Alright! Let’s get model-building!

First, we define the seq2seq model using the Model class we imported from Keras. To make it a seq2seq model, we pass it the encoder and decoder inputs as a list, followed by the decoder output:

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
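The encoder_inputs, decoder_inputs, and decoder_outputs passed to Model() come from layers built earlier in the lesson, which aren't shown here. Here is a minimal sketch of how they might be wired together. It assumes one-hot encoded sequences and made-up sizes: num_encoder_tokens, num_decoder_tokens, and latent_dim are hypothetical placeholders, not values from the lesson.

```python
# Minimal sketch of a seq2seq training model in Keras.
# The three sizes below are hypothetical placeholders.
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_encoder_tokens = 20  # hypothetical: distinct tokens in the source language
num_decoder_tokens = 30  # hypothetical: distinct tokens in the target language
latent_dim = 64          # hypothetical: size of the LSTM hidden state

# Encoder: read the one-hot source sequence and keep only its final states
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: start from the encoder states and emit a prediction at every timestep
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)

# Dense + softmax turns each decoder timestep into a probability over target tokens
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Two inputs in, one output out
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
```

Calling model.summary() on this sketch lists two Input layers, two LSTM layers, and one Dense layer.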

Our model is now ready to train, and the first step is to compile it. Compiling a Keras model takes two key arguments:

  • An optimizer (we’re using RMSprop, an adaptive variant of the widely used gradient descent algorithm) to minimize the model’s loss, that is, how bad the model is at guessing the true next word given the previous words in a sentence.
  • A loss function (we’re using categorical cross-entropy, which is based on the logarithm of the probability the model assigns to the correct word) to measure that loss.
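To see what these two pieces do, here is a small NumPy sketch. It re-implements the textbook formulas for illustration only; Keras’ own implementations differ in details such as where the small epsilon is added. The function names are hypothetical, and lr=0.001, rho=0.9 mirror what we believe are Keras’ RMSprop defaults.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One textbook RMSprop update."""
    cache = rho * cache + (1.0 - rho) * grad ** 2  # running average of squared gradients
    w = w - lr * grad / (np.sqrt(cache) + eps)     # normalize the step by the running RMS
    return w, cache

# Cross-entropy: the true next word is index 1 in a 3-word vocabulary
y_true = np.array([[0.0, 1.0, 0.0]])
confident = categorical_crossentropy(y_true, np.array([[0.05, 0.90, 0.05]]))  # -log(0.9)
unsure = categorical_crossentropy(y_true, np.array([[0.40, 0.30, 0.30]]))     # -log(0.3)

# RMSprop: minimize f(w) = w**2, whose gradient is 2*w
w, cache = 5.0, 0.0
for _ in range(1000):
    w, cache = rmsprop_step(w, 2 * w, cache, lr=0.01)
```

The loss is small when the model puts high probability on the correct word and grows quickly as that probability drops, which is the error signal RMSprop then follows downhill.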

Because we care about accuracy, we also add it to the metrics Keras reports during training. Here’s what the compiling code looks like:

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
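When the loss is categorical_crossentropy, Keras interprets the 'accuracy' string as categorical accuracy. The NumPy sketch below is a re-implementation for illustration, not Keras’ code. It shows what that number means for seq2seq output: at each timestep, it checks whether the word the model considers most likely is the true next word.

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    """Share of timesteps where the most probable predicted token is the true token."""
    true_ids = np.argmax(y_true, axis=-1)
    pred_ids = np.argmax(y_pred, axis=-1)
    return float(np.mean(true_ids == pred_ids))

# One sentence of three timesteps over a 3-token vocabulary (one-hot targets)
y_true = np.array([[[1, 0, 0], [0, 1, 0], [0, 0, 1]]], dtype=float)
y_pred = np.array([[[0.7, 0.2, 0.1],    # correct: predicts token 0
                    [0.1, 0.6, 0.3],    # correct: predicts token 1
                    [0.5, 0.3, 0.2]]])  # wrong: predicts token 0, truth is token 2
acc = categorical_accuracy(y_true, y_pred)  # 2 of 3 timesteps correct
```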

Next, we need to fit the compiled model. To do this, we pass the .fit() method the encoder and decoder input data (what we feed into the model), the decoder target data (what we expect the model to return given that input), and some numbers we can adjust as needed:

  • batch size (how many samples the model processes before each weight update; smaller batch sizes mean more updates per epoch and usually more time, and while some problems benefit from smaller batches, others do better with larger ones)
  • the number of epochs, or full passes over the training data (more epochs train the model more thoroughly on the dataset, but also take more time)
  • validation split (the fraction of the data, such as 0.2 for 20%, set aside for validation rather than training; how the model performs on this held-out data helps you decide when to stop training)
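To make these settings concrete, here is a small pure-Python sketch of the bookkeeping fit() does. It assumes a hypothetical dataset of 1,000 sentence pairs, and the helper name split_for_fit is made up. Keras’ documentation states that validation_split takes the validation data from the last samples, before any shuffling, and the sketch mirrors that.

```python
import math

def split_for_fit(num_samples, batch_size, validation_split):
    """Approximate how fit() divides data when validation_split is set (hypothetical helper)."""
    split_at = int(num_samples * (1 - validation_split))  # validation = the *last* samples
    train_idx = list(range(split_at))
    val_idx = list(range(split_at, num_samples))
    steps_per_epoch = math.ceil(len(train_idx) / batch_size)  # the last batch may be partial
    return train_idx, val_idx, steps_per_epoch

# Hypothetical: 1,000 sentence pairs, batches of 10, 20% held out
train_idx, val_idx, steps = split_for_fit(1000, batch_size=10, validation_split=0.2)
print(len(train_idx), len(val_idx), steps)
```

The validation rows come from the end of the arrays. If your data is sorted, for example by sentence length, the last 20% may not be representative, so it is worth shuffling before calling fit().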

Keras will take it from here to get you a (hopefully) nicely trained seq2seq model:

model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=10, epochs=100, validation_split=0.2)
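The validation split is what makes it possible to stop training early. Keras provides a built-in callback for this, tf.keras.callbacks.EarlyStopping, which can watch monitor='val_loss' with a patience value and is passed to fit() through callbacks=[...]. The pure-Python function below is a simplified sketch of that patience rule, not Keras’ implementation. Its name and the loss values in the usage lines are illustrative only.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never does.

    Training stops once `patience` consecutive epochs fail to improve the best
    validation loss by more than `min_delta`.
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss  # new best: reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Illustrative validation losses, not real training output
stop_at = early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.75], patience=3)
never = early_stopping_epoch([1.0, 0.9, 0.8], patience=3)
```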



Run the code as is to see a summary of the training model. You may notice the following layers included:

  • two input layers (one for the encoder, one for the decoder)
  • two LSTM layers (one for the encoder, one for the decoder)
  • one Dense layer (for the decoder)
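The summary also reports a parameter count for each layer. With the default Keras settings (biases enabled), a Keras LSTM layer has an input kernel, a recurrent kernel, and a bias for each of its four gates, and a Dense layer has one weight per input per unit plus one bias per unit. Input layers have no parameters. The vocabulary sizes and latent dimension below are hypothetical, so substitute your own to check your summary.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias vector
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # one weight per input per unit, plus one bias per unit
    return input_dim * units + units

num_encoder_tokens = 20  # hypothetical
num_decoder_tokens = 30  # hypothetical
latent_dim = 64          # hypothetical

encoder_lstm = lstm_params(num_encoder_tokens, latent_dim)
decoder_lstm = lstm_params(num_decoder_tokens, latent_dim)
decoder_dense = dense_params(latent_dim, num_decoder_tokens)
total = encoder_lstm + decoder_lstm + decoder_dense
print(encoder_lstm, decoder_lstm, decoder_dense, total)
```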

Next up, we need to compile the model. Below the training setup, where the model is built, compile training_model using the RMSprop optimizer, categorical cross-entropy loss, and accuracy as a metric.

Please note: for the remainder of this lesson, running the code may take a bit longer! Keras needs more time and computing resources than the other libraries you have worked with here.


Fitting time! To keep this exercise from crashing or timing out, we’ll use a larger batch size and fewer epochs. (Note that larger batch sizes use more memory and are generally more prone to crashing a deep learning program, but in our case we care about time.)

We’ve provided the batch_size and epochs variables in script.py. Update them as follows:

  • batch_size to 50
  • epochs to 50

Fit training_model with epochs set to epochs, batch_size set to batch_size, and a validation_split of 0.2.
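One quick way to see why these settings save time is to count gradient updates. fit() performs one update per batch per epoch. The sketch below compares the lesson’s example call (batch size 10, 100 epochs) with the exercise settings (batch size 50, 50 epochs). It assumes a hypothetical 1,000 sentence pairs, with 20% held out for validation in both cases. The helper name total_updates is made up for illustration.

```python
import math

def total_updates(num_samples, batch_size, epochs, validation_split):
    """Number of gradient updates fit() performs: one per batch per epoch (hypothetical helper)."""
    num_train = int(num_samples * (1 - validation_split))
    return math.ceil(num_train / batch_size) * epochs

num_samples = 1000  # hypothetical dataset size
lesson_updates = total_updates(num_samples, batch_size=10, epochs=100, validation_split=0.2)
exercise_updates = total_updates(num_samples, batch_size=50, epochs=50, validation_split=0.2)
print(lesson_updates, exercise_updates)
```

Each update on a batch of 50 does more work than one on a batch of 10, so the real speedup is smaller than the tenfold gap in update count. Still, the exercise settings make half as many passes over the training data and far fewer optimizer steps.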
