
Intro to LLMs

LLM Parameter: Temperature

  • Large Language Models (LLMs) have many parameters, like temperature and top_p, that allow users to tweak probabilities and alter their outputs. Temperature controls how deterministic the LLM outputs are.
  • Increasing temperature widens the probability distribution of outputs that an LLM can draw from given a prompt. This means that it can choose less likely outcomes (see the sketch after this list).
  • High temperature thus corresponds to less deterministic outcomes - i.e., multiple runs with the same prompt will give different outputs!
  • While low temperature gives us outcomes that are more likely, there is no guarantee that they are more accurate.
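A minimal sketch of how temperature reshapes the next-token distribution, assuming made-up logits for four hypothetical candidate tokens (the values are purely illustrative):

```python
import numpy as np

# Hypothetical logits for four candidate next tokens (values made up for illustration).
tokens = ["cat", "dog", "car", "banana"]
logits = np.array([2.0, 1.5, 0.5, -1.0])

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature rescales the logits first."""
    scaled = logits / temperature
    exps = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exps / exps.sum()

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    # Low temperature concentrates probability on the top token (more deterministic);
    # high temperature flattens the distribution, so less likely tokens get sampled more often.
    print(f"temperature={t}: {dict(zip(tokens, np.round(probs, 3)))}")
```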

History of LLMs and AI

Three defining moments in the history of Artificial Intelligence are:

  • the Turing Test (1950) which is a test of a machine’s ability to imitate human intelligence
  • the Dartmouth Summer Research Project (1956) which sought to define what AI is
  • the creation of ELIZA (1964), the first digital chatbot by Joseph Weizenbaum.

NLP Tasks

The field of Natural Language Processing (NLP) involves finding mathematical representations of language to capture statistical regularities in text. Some well-known language-related tasks that NLP algorithms address include:

  • translation (like in Google Translate)
  • completing letter sequences (autocomplete, for instance)
  • question answering (customer service chatbots, for example)
  • sentiment analysis (used in content filters)

Tokens in LLMs

There are different ways of mathematically representing text, depending on the smallest unit of a sequence one chooses to model. This unit can be a letter, a word, or a sequence of words; these units are known as “tokens”.
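As a toy illustration (not a real tokenizer such as BPE), here is one way the same sentence can be split into different units; the example text and the adjacent-word pairing scheme are made up for demonstration:

```python
# A toy illustration of choosing different units ("tokens") for the same text.
text = "Language models predict text"

char_tokens = list(text)      # letters as the unit
word_tokens = text.split()    # words as the unit
# Multi-word (or subword) units usually come from a trained tokenizer;
# here we simply pair adjacent words to mimic multi-word tokens.
pair_tokens = [" ".join(word_tokens[i:i + 2]) for i in range(0, len(word_tokens), 2)]

print(char_tokens[:10])  # ['L', 'a', 'n', 'g', 'u', 'a', 'g', 'e', ' ', 'm']
print(word_tokens)       # ['Language', 'models', 'predict', 'text']
print(pair_tokens)       # ['Language models', 'predict text']
```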

Some terms associated with Language Models

Some definitions around language models:

  • Autoregressive language models are models that are trained on a corpus of text and use word representations to predict the next best thing to say based on the underlying distribution of words.

  • A count-based language model is the simplest approach to building an autoregressive language model. It involves creating a giant lookup table of words from a text with their location and frequency stored in it. This lookup table is then used to calculate the probabilities of the next best thing to say (a minimal sketch follows this list).

  • Neural language models map text onto a mathematical representation using neural networks such that text that occurs together or has similar meaning is encoded to representations that exist nearby.
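A minimal sketch of a count-based (bigram) language model, assuming a tiny made-up corpus; the helper name next_word_probs is hypothetical and used only for illustration:

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# The "lookup table": how often each word follows each other word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_probs(prev):
    """Estimate the probability of each candidate next word from the counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probs("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
print(next_word_probs("sat"))  # {'on': 1.0}
```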

Limitations of count-based language models

The “counting words” approach to language models runs into two issues: the curse of dimensionality and lack of generalizability.

  • The curse of dimensionality refers to how hard it is for the model to scale: the lookup table grows explosively as the vocabulary and sequence length grow.
  • Lack of generalizability refers to the inability to create sequences that haven’t appeared verbatim in the training text.
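A self-contained sketch of the same toy bigram idea, showing both limitations: an unseen but plausible continuation gets zero probability, and the count table blows up with vocabulary size (the corpus and the vocabulary size are made up for illustration):

```python
from collections import Counter, defaultdict

# A count table built from a tiny made-up corpus.
corpus = "the cat sat on the mat .".split()
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

# Lack of generalizability: "cat ran" never occurs in the corpus, so its count (and
# therefore its estimated probability) is zero, even though it is plausible English.
print(bigram_counts["cat"]["ran"])  # 0

# Curse of dimensionality: with a vocabulary of size V, a bigram table has up to V**2
# entries, a trigram table up to V**3, and so on, so the approach scales very poorly.
V = 50_000  # an illustrative word-level vocabulary size
print(f"possible bigram entries: {V ** 2:,}")  # 2,500,000,000
```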

Neural Language Models

Language models today use neural networks and do not attempt to learn the exact distribution of words in a corpus of text. Rather, they learn an approximate distribution in a computationally efficient manner.

  • Language models that use neural networks are able to generalize to unseen instances in the text. Because they rely on the underlying semantic representations, they can assign non-zero probabilities to text they haven’t been exposed to.

  • Language models that use neural networks are effective in mitigating the curse of dimensionality as they compress the text they’re trained on to a smaller set of parameters. This means that they can at times assign vanishingly small (effectively zero) probabilities to text that exists in the training corpus.
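A rough sketch of the intuition, using hand-picked toy “embeddings” rather than vectors from any real model: words used in similar contexts end up with nearby representations, which is what lets a neural language model assign non-zero probability to combinations it has not seen.

```python
import numpy as np

# Hand-picked toy "embeddings" (made up for illustration; real models learn these).
embeddings = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.85, 0.15, 0.05]),
    "car": np.array([0.0, 0.9, 0.1]),
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 when two vectors point in the same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "cat" and "dog" sit close together in the representation space; "cat" and "car" do not.
print(round(cosine(embeddings["cat"], embeddings["dog"]), 3))  # ~0.996
print(round(cosine(embeddings["cat"], embeddings["car"]), 3))  # ~0.110

# Because next-word scores are computed from these dense vectors (e.g. via a softmax)
# rather than looked up in an exact count table, a context seen only with "cat" can
# still yield a non-zero probability for "dog" - the model generalizes to unseen text.
```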

Generalization and Compression

  • Generalization and compression go hand-in-hand in language models: compression is what makes them computationally efficient and enables them to generalize.
  • The downside to compression is that there is information loss, i.e., language models can miss details in a text corpus.
  • The downside to generalization is that language models can sometimes make up unverifiable text that has no factual grounding, something that is referred to as “hallucinations”.
