Foundation models are models trained on text, images, code, and other data formats that power AI systems. They are pre-trained on large amounts of unlabeled data, much of it from the internet, and can then be fine-tuned to perform various downstream tasks, such as generating content, which is referred to as generative AI (GenAI).
Foundation models are capable of powering downstream tasks that involve language, vision, robotics, reasoning, and human interactions.
Five key attributes of foundation models are:
- Expressivity: Can capture information and present it with realism.
- Scalability: Can handle large amounts of high-dimensional data.
- Multimodality: Can consume, process, and create content from many sources and domains.
- Memory: Can store and access knowledge.
- Compositionality: Can generalize to new contexts and tasks.
Foundation models are trained on large amounts of data to improve their ability to perform downstream tasks.
Due to the nature of the data that models train on, biases and stereotypes pose a serious risk. Malicious actors may use AI systems to generate exaggerated content and spread disinformation. Misinformation is also possible due to artificial hallucinations, where a model produces plausible-sounding but incorrect output. Therefore, it’s important to train models on data from trusted sources to minimize these risks.
Foundation models use various architecture types, one of which is the transformer architecture (for example, the vanilla transformer or the evolved transformer).
Foundation models are based on deep learning and transfer learning.
Pre-training models on data is vital in transfer learning, which applies knowledge learned from one task to another task. As computer hardware gains speed and memory, the amount of training data that can be processed increases as well.
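
The transfer learning idea above can be illustrated with a toy sketch. It is not how real foundation models are trained: the linear model, the synthetic source and target tasks, and the names `pretrain`, `feature`, and `fine_tune_head` are all assumptions made for illustration. A feature is first learned on a large source dataset, then frozen and reused on a related target task that has only five examples.

```python
import random

def pretrain(samples, epochs=50, lr=0.05):
    """Learn weights w of a linear feature f(x) = w . x on the source task (plain SGD)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def feature(w, x):
    """Apply the pre-trained (frozen) feature extractor."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fine_tune_head(w, samples):
    """Keep w frozen and fit only a small head y = a * f(x) + b by least squares."""
    feats = [feature(w, x) for x, _ in samples]
    ys = [y for _, y in samples]
    n = len(samples)
    mean_f, mean_y = sum(feats) / n, sum(ys) / n
    var = sum((f - mean_f) ** 2 for f in feats)
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(feats, ys))
    a = cov / var
    return a, mean_y - a * mean_f

random.seed(0)

def rand_x():
    return [random.uniform(-1, 1), random.uniform(-1, 1)]

# Hypothetical source task with plenty of data: y = 2*x0 - x1.
source = [(x, 2 * x[0] - x[1]) for x in (rand_x() for _ in range(200))]
# Hypothetical related target task with only five examples: y = 3*(2*x0 - x1) + 5.
target = [(x, 3 * (2 * x[0] - x[1]) + 5) for x in (rand_x() for _ in range(5))]

w = pretrain(source)            # knowledge learned on the source task
a, b = fine_tune_head(w, target)  # reused on the target task
print("pre-trained weights:", w)
print("fine-tuned head (a, b):", (a, b))
```

Because the target task depends on the same underlying feature as the source task, only the two head parameters need to be fitted, which is why a handful of target examples is enough in this sketch.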
Some examples of existing foundation models include:
- BERT (Bidirectional Encoder Representations from Transformers) is a model introduced by Google in 2018.
- LaMDA (Language Model for Dialogue Applications) is a model introduced by Google in 2021. It powers Bard, which was released in March 2023.
- GPT (Generative Pre-trained Transformer) is a model first introduced by OpenAI in 2018. It is trained on text and code, and powers ChatGPT, which was released in November 2022.
- DALL-E is a model introduced in 2021 by OpenAI. It is based on GPT-3 and is used to produce images.
Foundation models are a fundamental part of an AI system. Below are some of the types of models:
- Diffusion Models
  - Diffusion Models are generative models, which means they are used to generate data similar to what they were trained on. The models work by destroying training data through the addition of Gaussian noise, and then learning to recover that data.
- Large Language Models (LLMs)
  - Large Language Models are artificial intelligence systems that are designed to process and generate human language on a massive scale.
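
The "destroying training data" step described for diffusion models can be sketched in a few lines. This is a minimal illustration that assumes the closed-form forward process and linear noise schedule used in DDPM-style models, which the text above does not specify. The function names and the toy data vector are hypothetical, and the learned recovery step (a neural network trained to predict the added noise) is omitted.

```python
import math
import random

def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): how much of the signal survives up to step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def add_noise(x0, t, abar, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, noise = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    xt = [signal * xi + noise * ei for xi, ei in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
abar = alpha_bars(linear_beta_schedule(1000))

x0 = [1.0, -0.5, 0.25, 0.0]                       # toy "training example"
x_early, eps_early = add_noise(x0, 10, abar, rng)  # lightly corrupted
x_late, eps_late = add_noise(x0, 999, abar, rng)   # almost pure Gaussian noise

print("signal kept at step 10: ", math.sqrt(abar[10]))
print("signal kept at step 999:", math.sqrt(abar[999]))
```

At early steps most of the original signal survives, while at the final step the sample is nearly indistinguishable from Gaussian noise. A trained model reverses this process step by step to generate new data.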