GPT stands for “Generative Pre-trained Transformer”. The transformer is a neural network architecture introduced in 2017, and GPT models use it as the backbone for both pretraining and finetuning.
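As a rough, hedged sketch (not the original GPT training code), the snippet below uses the Hugging Face transformers library to show the next-token prediction objective that both pretraining and finetuning optimize; the gpt2 checkpoint and the sample sentence are illustrative assumptions.

```python
# Minimal sketch of the causal language-modeling objective shared by
# pretraining and finetuning; the checkpoint and text are illustrative only.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "Transformers were introduced in 2017."
inputs = tokenizer(text, return_tensors="pt")

# With labels equal to the inputs, the model returns the next-token
# prediction loss: the same objective is minimized on a huge generic
# corpus during pretraining and on a smaller task corpus during finetuning.
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"next-token prediction loss: {outputs.loss.item():.3f}")
```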
Transformer models are data agnostic, i.e., they can be trained on different data modalities such as text, images, video, audio and protein sequences.
The size of transformer models has grown exponentially since their introduction in 2017, because their performance scales with both model size and the amount of training data. Early models were on the order of a few billion parameters; current models are roughly 100 times larger.
Transformer models can be grouped into three main categories based on their architecture:
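Assuming the standard split into encoder-only (e.g., BERT), decoder-only (e.g., GPT), and encoder-decoder (e.g., T5) models, the hedged sketch below loads one representative of each family using the Hugging Face transformers Auto classes; the specific checkpoint names are illustrative.

```python
# Illustrative only: one representative checkpoint per architectural family,
# loaded through the Hugging Face Auto classes.
from transformers import (
    AutoModelForMaskedLM,    # encoder-only, e.g. BERT
    AutoModelForCausalLM,    # decoder-only, e.g. GPT
    AutoModelForSeq2SeqLM,   # encoder-decoder, e.g. T5
)

encoder_only = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

for name, model in [("encoder-only", encoder_only),
                    ("decoder-only", decoder_only),
                    ("encoder-decoder", encoder_decoder)]:
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```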
Before transformers, traditional language models were count-based (e.g., n-gram models) and struggled to capture long-range dependencies in text. Neural language models based on RNNs and LSTMs were the first step toward addressing this limitation.
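To make that limitation concrete, here is a minimal sketch of a count-based bigram language model; the toy corpus and whitespace tokenization are assumptions for illustration. Each next-word probability depends only on the immediately preceding word, so any longer-range context is invisible to the model.

```python
# Toy count-based bigram language model; the corpus and tokenization are
# illustrative assumptions, not from the original text.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each context word.
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def next_word_probability(prev, word):
    # P(word | prev) estimated purely from counts: the model only ever
    # sees the immediately preceding word, so long-range context is lost.
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

print(next_word_probability("the", "cat"))  # 0.25: "the" is followed by cat/mat/dog/rug
```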