LSTMs are pretty extraordinary, but they’re only the tip of the iceberg when it comes to actually setting up and running a neural language model for text generation. In fact, an LSTM is usually just a single component in a larger network.
One of the most common neural models used for text generation is the sequence-to-sequence model, commonly referred to as seq2seq (pronounced “seek-to-seek”). A type of encoder-decoder model, seq2seq uses recurrent neural networks (RNNs) such as LSTMs to generate output, token by token or character by character.
So, where does seq2seq show up?
- Machine translation software like Google Translate
- Text summarization
- Named Entity Recognition (NER)
- Speech recognition
seq2seq networks have two parts:
- An encoder that accepts language (or audio or video) input. The encoder’s per-step outputs are discarded; only its final state is preserved as a vector, which serves as an encoding of the entire input.
- A decoder that takes the encoder’s final state (or “memory”) as its initial state. Using a technique called “teacher forcing,” the decoder is trained to predict the next text (characters or words) in a target sequence given the previous text.
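The two parts above can be sketched in a few lines of code. This is a minimal toy illustration in NumPy, not a real seq2seq model: it uses a simple tanh RNN cell instead of an LSTM, and the weights, vocabulary size, and token sequences are all made-up placeholders. It only shows the flow of data: the encoder’s outputs are thrown away, its final state seeds the decoder, and the decoder is fed the true previous target token at each step (teacher forcing).

```python
# Toy encoder-decoder sketch (hypothetical sizes/weights, not a trained model).
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 10, 8

# Randomly initialized weights stand in for learned parameters.
W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_size))

def one_hot(token):
    v = np.zeros(vocab_size)
    v[token] = 1.0
    return v

def encode(source_tokens):
    """Run the encoder; keep only the final hidden state."""
    h = np.zeros(hidden_size)
    for t in source_tokens:
        h = np.tanh(W_xh @ one_hot(t) + W_hh @ h)
    return h  # the per-step outputs are discarded

def decode_teacher_forced(state, target_tokens):
    """Predict each next token, feeding in the TRUE previous token."""
    h = state  # decoder starts from the encoder's final state
    logits = []
    for t in target_tokens[:-1]:  # ground-truth inputs (teacher forcing)
        h = np.tanh(W_xh @ one_hot(t) + W_hh @ h)
        logits.append(W_hy @ h)   # scores for the NEXT token
    return np.array(logits)

source = [1, 4, 2]      # e.g. a tokenized source sentence (placeholder IDs)
target = [0, 5, 3, 9]   # e.g. start token + tokenized translation
state = encode(source)
scores = decode_teacher_forced(state, target)
print(scores.shape)     # one score vector per predicted position
```

At inference time there is no target sequence to force, so the decoder instead feeds its own previous prediction back in as the next input. A real model would also use an LSTM (which carries both a hidden state and a cell state) and learned embeddings rather than one-hot vectors.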
Take a look at the GIF as “Knowledge is power” is translated from Hindi to English. Watch as state is passed through the encoder and on to each timestep of the decoder.