It’s time for some deep learning!
Deep learning models in Keras are built in layers, where each layer is a step in the model.
Our encoder requires two layer types from Keras:
- An input layer, which defines a matrix to hold all the one-hot vectors that we’ll feed to the model.
- An LSTM layer, with some output dimensionality.
We can import these layers as well as the model we need like so:
from keras.layers import Input, LSTM
from keras.models import Model
Next, we set up the input layer, which requires the shape of the data we're providing. In this case, we know that we're passing in all the encoder tokens, but we don't necessarily know our sequence length (how long each chocolate chip cookie sentence we feed the model will be). Fortunately, we can say None because the code is written to handle sequences of varying length, so we don't need to specify that dimension. (The batch size, how many sentences we feed the model at a time, isn't part of the shape at all; Keras leaves it flexible by default.)
# the shape specifies the input matrix sizes
encoder_inputs = Input(shape=(None, num_encoder_tokens))
For the LSTM layer, we need to select the dimensionality (the size of the LSTM’s hidden states, which helps determine how closely the model molds itself to the training data — something we can play around with) and whether to return the state (in this case we do):
encoder_lstm = LSTM(100, return_state=True)
# we're using a dimensionality of 100,
# so any LSTM output matrix will have
# shape [batch_size, 100]
Remember, the only thing we want from the encoder is its final states. We can get these by linking our LSTM layer with our input layer:
encoder_outputs, state_hidden, state_cell = encoder_lstm(encoder_inputs)
encoder_outputs isn't really important for us, so we can just discard it. The states, however, we'll save in a list:
encoder_states = [state_hidden, state_cell]
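To see how these pieces fit together, here's the whole encoder in one place, with a quick shape check at the end. This is only a sketch: num_encoder_tokens is given a placeholder value here (in practice it comes from your training data), and the throwaway Model exists just so we can call summary().

from keras.layers import Input, LSTM
from keras.models import Model

num_encoder_tokens = 52  # placeholder vocabulary size, just for this sketch

encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(100, return_state=True)
encoder_outputs, state_hidden, state_cell = encoder_lstm(encoder_inputs)
encoder_states = [state_hidden, state_cell]

# a throwaway model so we can inspect the shapes
check = Model(encoder_inputs, [encoder_outputs] + encoder_states)
check.summary()
# each state has shape (None, 100): one 100-dimensional
# vector per sentence in the batch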
There is a lot to take in here, but there’s no need to memorize any of this — you got this.💪
Instructions
We’ve moved the code from the previous exercises into another file to give you some room (and to speed things up a bit for you).
The necessary modules are imported, so now it’s up to you to set up the encoder layers and retrieve the states. Ready?
First, define an input layer, encoder_inputs. Give its shape:
- a sequence length of None (the batch size is left unspecified for us)
- a number of tokens set to num_encoder_tokens
Build the LSTM layer called encoder_lstm with a dimensionality of 256, set to return the state.
Call encoder_lstm on encoder_inputs to retrieve the following return values:
- encoder_outputs
- state_hidden
- state_cell
Now, create a list of the two states and assign it to a new variable: encoder_states.
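When you're done, one possible way to check your work might look like this (assuming num_encoder_tokens is defined in the setup file mentioned above):

from keras.layers import Input, LSTM

encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(256, return_state=True)
encoder_outputs, state_hidden, state_cell = encoder_lstm(encoder_inputs)
encoder_states = [state_hidden, state_cell]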