When working with nominal categorical variables in Python, it can be useful to apply One-Hot Encoding, a technique that creates a binary variable for each of the nominal categories. This encodes the variable without creating an order among the categories. To one-hot encode a variable in a pandas DataFrame, we can use the .get_dummies() method.
df = pd.get_dummies(data=df, columns=['column1', 'column2'])
Before diving into deep learning, it is best practice to investigate your dataset to get acquainted with the features, size, and structure of the information you are working with. You can investigate your data with pandas, using properties such as .shape and methods like .describe().
Neural networks cannot work with string data. Therefore, if upon inspection you find that your data contains strings, you can use one-hot encoding to convert categorical features into numerical features. An example of this is pictured below. To do this in Python, you can use the .get_dummies() pandas method.
import pandas as pd

# load the dataset
dataset = pd.read_csv('dataset.csv')
# choose the first six columns as features
features = dataset.iloc[:, 0:6]
# choose the final column for prediction
labels = dataset.iloc[:, -1]
# see useful summary statistics for numeric features
print(features.describe())
# shape and summary statistics of labels
print(labels.shape)
print(labels.describe())
# use one-hot encoding to convert categorical features into numerical ones
numerical_features = pd.get_dummies(features)
When training a deep learning model (or any other machine learning model), split your data into train and test sets. The train set is used during the learning process, while the test set is used to evaluate the results of your model.
To perform this in Python, we use the train_test_split() function from the scikit-learn library.
from sklearn.model_selection import train_test_split

# Here we chose the test size to be 33% of the total data;
# random_state controls the shuffling applied to the data before applying the split.
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.33, random_state=42)
When preprocessing our data, we want to make sure all our features have similar scales. This is because deep learning models (like all learning models) perform better if all our features are weighted equally. Standardization and normalization are both common scaling methods.
Standardization scales all the features to have a mean of zero and unit variance (equal to one). Normalization scales all the features to a fixed range, normally between 0 and 1. Both are viable options when preparing your data for the learning process.
# Standardization can be implemented in the following way with scikit-learn:
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([("scale", StandardScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train = ct.fit_transform(features_train)
features_test = ct.transform(features_test)
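Normalization can be implemented in much the same way. The sketch below is an alternative to the standardization step above and uses scikit-learn's MinMaxScaler; the 'age', 'bmi', and 'children' column names are only illustrative.

# Normalization (min-max scaling to the [0, 1] range) as an alternative to standardization:
from sklearn.preprocessing import MinMaxScaler
from sklearn.compose import ColumnTransformer

# scale only the listed numeric columns; pass the remaining columns through unchanged
ct_norm = ColumnTransformer([("normalize", MinMaxScaler(), ['age', 'bmi', 'children'])], remainder='passthrough')
features_train = ct_norm.fit_transform(features_train)
features_test = ct_norm.transform(features_test)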
A sequential deep learning model is a linear stack of layers with one input layer, where data enters the neural network, and one output layer, where data exits the neural network. These stacked layers each contain at least one neuron, and they are the building blocks of our neural networks.
Here is an example layer diagram with three neurons. The W and b labels in the diagram represent weights and biases.
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

# initializing a sequential model
model = Sequential()
# creating a layer with 3 neurons
layer = layers.Dense(3)
When compiling a deep learning model, loss is measured to evaluate the success of the results. A lower loss means better performance. Since the goal is to achieve the best performance possible (without overfitting or underfitting), optimizers are used to continuously update the weights and parameters and improve loss metrics.
In the case of regression, the most commonly used loss function is the mean squared error, mse (the average squared difference between the estimated values and the actual values).
Additionally, we want to observe the progress of the mean absolute error (mae) while training the model, because MAE can give us a better idea than MSE of how far off we are from the true values, in the units we are predicting.
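To make the distinction concrete, here is a small illustration of how the two metrics are computed (the numbers are made up and not taken from any dataset in this article):

import numpy as np

# hypothetical true values and predictions (illustrative only)
y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])

mse = np.mean((y_true - y_pred) ** 2)   # average squared difference, in squared units
mae = np.mean(np.abs(y_true - y_pred))  # average absolute difference, in the original units

print(mse)  # 5.666...
print(mae)  # 2.333...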
# compiling our deep learning model with the following parameters:
# mean squared error as the loss function
# mean absolute error as the metric
# Adam as the optimizer -- a widely used one
from tensorflow.keras.optimizers import Adam

opt = Adam(learning_rate=0.01)
my_model.compile(loss='mse', metrics=['mae'], optimizer=opt)
Once a deep learning model is compiled, it is time to fit it to the training data and evaluate it on the test data. Using the Keras .fit() method on the training data, we specify the following parameters:
- epochs, which is the number of cycles through the full training dataset
- batch_size, which is the number of data points to work through before updating the model parameters

After we fit the model, we evaluate it using the Keras .evaluate() method on the test set of data.
# fitting our model
my_model.fit(train_data, train_labels, epochs=50, batch_size=3, verbose=1)
# evaluating our model
val_mse, val_mae = my_model.evaluate(test_data, test_labels, verbose=0)
In a sequential deep learning model, we have three different types of layers:
- an input layer, where data enters the network
- hidden layers, which sit between the input and output layers
- an output layer, where the network's predictions exit
There is always only one input and output layer, while there can be as many hidden layers as desired (even zero). Together, all these layers create neural networks like the one shown here:
from tensorflow.keras.layers import InputLayer
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

my_model = Sequential()
# adding an input layer for a dataframe with 15 columns
my_model.add(InputLayer(input_shape=(15,)))
# hidden layer with 64 neurons and relu activation function
my_model.add(Dense(64, activation='relu'))
# adding an output layer to our model
my_model.add(Dense(1))
After training and evaluating a neural network model, one must start the process of hyperparameter tuning, which involves tweaking hyperparameter values to continuously improve results.
In the image, you’ll see how we use the three datasets and our hyperparameters to adjust and evaluate our model’s performance:
When going through the process of hyperparameter tuning, there are several common hyperparameters to adjust:
- learning rate
- batch size
- number of epochs
- number of hidden layers and neurons per layer
Tuning these hyperparameters is key to strong model performance. Making slight changes to them can alter performance in major ways, so hyperparameter tuning is often the longest process of building a model.
While in the process of hyperparameter tuning for a deep learning model, a good rule of thumb is to start by adding one hidden layer with as many neurons as there are features in the dataset.
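As a rough sketch of that rule of thumb (the feature count of 15 below is only illustrative):

from tensorflow.keras.layers import InputLayer, Dense
from tensorflow.keras.models import Sequential

num_features = 15  # hypothetical number of features in the dataset

model = Sequential()
model.add(InputLayer(input_shape=(num_features,)))
# start with one hidden layer with as many neurons as there are features
model.add(Dense(num_features, activation='relu'))
model.add(Dense(1))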
To avoid overfitting in a deep learning model, one can specify early stopping in TensorFlow with Keras by creating an EarlyStopping callback and adding it as a parameter when we fit our model. An implementation of EarlyStopping is shown below with the following parameters:
- monitor = val_loss, which means we are monitoring the validation loss to decide when to stop the training
- mode = min, which means we seek minimal loss
- patience = 40, which means that if the learning reaches a plateau, it will continue for 40 more epochs in case the plateau leads to improved performance

from tensorflow.keras.callbacks import EarlyStopping

stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=40)

history = model.fit(features_train, labels_train, epochs=num_epochs, batch_size=16, verbose=0, validation_split=0.2, callbacks=[stop])
When tuning a deep learning model, one can use grid search, also called exhaustive search, to try every combination of desired hyperparameter values.
If, for example, we want to try learning rates of 0.01 and 0.001 and batch sizes of 10, 30, and 50, grid search will try six combinations of parameters (0.01 and 10, 0.01 and 30, 0.01 and 50, 0.001 and 10, and so on).
To implement this in Python, we use GridSearchCV from scikit-learn. For regression, we first need to wrap our neural network model in a KerasRegressor. Then, we need to set up the desired hyperparameter grid (we don't use many values for the sake of speed). Finally, we initialize a GridSearchCV object and fit our model to the data. The implementation of this is shown in the code snippet.
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, mean_squared_error

# wrapping the model-building function (defined elsewhere) into a KerasRegressor
model = KerasRegressor(build_fn=design_model)
# batch sizes and epochs to test
batch_size = [4, 8, 16, 64]
epochs = [10, 50, 100, 200]
# setting up our grid of parameters
param_grid = dict(batch_size=batch_size, epochs=epochs)
# initializing a grid search
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring=make_scorer(mean_squared_error, greater_is_better=False))
# fitting the results
grid_result = grid.fit(features_train, labels_train, verbose=0)
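Once the search has finished, the best score and hyperparameter combination can be read off the fitted GridSearchCV object (a brief usage note; best_score_ and best_params_ are standard scikit-learn attributes):

# best score and hyperparameter combination found by the grid search
print(grid_result.best_score_)
print(grid_result.best_params_)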
When tuning a deep learning model, one can use random search to go through random combinations of hyperparameters over a specific interval.
Randomized search will sample values for batch_size and nb_epoch from uniform distributions on specified intervals. For example, in the code snippet shown, we sample random batch sizes in the interval [2, 16] and random numbers of epochs in the interval [10, 100] for a fixed number of iterations (in our case, 12):
from scipy.stats import randint as sp_randint
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import make_scorer, mean_squared_error

# parameter grid with batch sizes between 2 and 16, and epochs between 10 and 100
param_grid = {'batch_size': sp_randint(2, 16), 'nb_epoch': sp_randint(10, 100)}
# initializing random search
# scoring uses mse as the metric and looks for lower scores
# 12 iterations
grid = RandomizedSearchCV(estimator=model, param_distributions=param_grid, scoring=make_scorer(mean_squared_error, greater_is_better=False), n_iter=12)
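As with grid search, the randomized search still needs to be fit before any results are available. A minimal usage sketch, reusing the features_train and labels_train variables from earlier:

# running the randomized search and inspecting the best combination found
grid_result = grid.fit(features_train, labels_train, verbose=0)
print(grid_result.best_params_)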
Regularization is a set of techniques that helps avoid overfitting by preventing the learning process from fitting the model too closely to the training data.
Dropout is a regularization technique that randomly ignores, or "drops out", a number of outputs of a layer by setting them to zero.
The dropout rate is the percentage of layer outputs set to zero (usually between 20% and 50%). In Keras, we can add dropout by introducing a Dropout layer.
# A model with two dropout layers
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

# setting up model and input layer
model = Sequential()
my_input = tf.keras.Input(shape=(20,))
model.add(my_input)
model.add(layers.Dense(128, activation='relu'))
# dropout layer with dropout rate of 0.1
model.add(layers.Dropout(0.1))
model.add(layers.Dense(64, activation='relu'))
# dropout layer with dropout rate of 0.2
model.add(layers.Dropout(0.2))
model.add(layers.Dense(24, activation='relu'))