Image classification involves finding the complex patterns in pixels necessary to map an image to its label and is a common application of deep learning.
To preprocess image data, we can use an ImageDataGenerator() from the TensorFlow library. We can augment our image data using parameters such as zoom_range and width_shift_range, among others.
To load in image data, we can use the flow_from_directory() method from the TensorFlow library to import our image directory and specify parameters, such as class_mode and color_mode.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

#Preprocessing image data
training_data_generator = ImageDataGenerator(
    rescale=1.0/255,
    #Randomly increase or decrease the size of the image by up to 20%
    zoom_range=0.2)

#Loading in image data
#Creates a DirectoryIterator object using the above parameters:
training_iterator = training_data_generator.flow_from_directory(
    "data/train",
    class_mode="categorical",
    color_mode="rgb",
    target_size=(256, 256),
    batch_size=8)
In deep learning models, convolutional neural networks (CNNs) use layers specifically designed for image data that capture local relationships between nearby features in an image.
When we use a convolutional layer, we learn a set of smaller weight tensors, called filters (also known as kernels). We move each of these filters (i.e. convolve them) across the height and width of our input, to generate a new “image” of features. Each new “pixel” results from applying the filter to that location in the original image.
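As a minimal sketch of this idea (the image values and the edge-style kernel below are made up purely for illustration), we can convolve a single 3x3 filter across a small input with tf.nn.conv2d and see the new, smaller "image" of features it produces:

import tensorflow as tf

#A tiny 5x5 single-channel "image" (batch of 1); values are illustrative
image = tf.reshape(tf.range(25, dtype=tf.float32), (1, 5, 5, 1))

#One 3x3 filter with shape (height, width, in_channels, out_channels)
kernel = tf.constant([[-1.0, -1.0, -1.0],
                      [ 0.0,  0.0,  0.0],
                      [ 1.0,  1.0,  1.0]])
kernel = tf.reshape(kernel, (3, 3, 1, 1))

#Convolve the filter across the image: each output "pixel" is the
#filter applied at one location of the input
features = tf.nn.conv2d(image, kernel, strides=1, padding="VALID")
print(features.shape)  #(1, 3, 3, 1): a new, smaller "image" of features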
Convolution-based approaches work well for image data for a few reasons: filters compute new features by combining information from nearby pixels, the same small set of filter weights is reused across the entire image, and as a result far fewer parameters are needed than in a fully connected layer.
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(256, 256, 1)))

#Adds a Conv2D layer with 8 filters, each size 3x3:
model.add(tf.keras.layers.Conv2D(8, 3, activation="relu"))
model.summary()
The stride hyperparameter in a convolutional layer is how much we move the filter each time we apply it. With strides=2, for example, the filter moves across two columns (or rows) each time it is applied.
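As a quick sketch (the layer sizes here are arbitrary), we can see how a larger stride shrinks the output by comparing the output shapes of two otherwise identical layers:

import tensorflow as tf

#Same filters and kernel size, two different strides
stride_1 = tf.keras.layers.Conv2D(8, 3, strides=1)
stride_2 = tf.keras.layers.Conv2D(8, 3, strides=2)

x = tf.random.normal((1, 256, 256, 1))
print(stride_1(x).shape)  #(1, 254, 254, 8): filter moves one pixel at a time
print(stride_2(x).shape)  #(1, 127, 127, 8): filter skips every other position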
The padding hyperparameter defines what is done once the filter gets to the end of a row/column. In other words: “what happens when we run out of image?” There are two main options: valid padding, where we stop as soon as the filter runs out of image (so the output shrinks), and same padding, where we pad the edges of the input with zeros so that the output keeps the same height and width as the input.
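As a minimal sketch, these two options correspond to the padding argument of Conv2D (shapes are for illustration):

import tensorflow as tf

x = tf.random.normal((1, 256, 256, 1))

#"valid": no padding, so the output is smaller than the input
valid = tf.keras.layers.Conv2D(8, 3, padding="valid")
print(valid(x).shape)  #(1, 254, 254, 8)

#"same": zero-pads the edges so height and width are preserved
same = tf.keras.layers.Conv2D(8, 3, padding="same")
print(same(x).shape)  #(1, 256, 256, 8)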
When building out a convolutional neural network, pooling layers are often used because they reduce the dimensionality of intermediate convolutional outputs. There are many different types of pooling layer, but the most common is max pooling, which slides a window across the input and keeps only the maximum value within each window.
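As a brief sketch (the window size is chosen for illustration), a MaxPooling2D layer with a 2x2 window halves each spatial dimension:

import tensorflow as tf

x = tf.random.normal((1, 256, 256, 8))

#Keep the maximum value in each non-overlapping 2x2 window
pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
print(pooled.shape)  #(1, 128, 128, 8)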
In a convolutional neural network, feature maps are the result of convolving a single filter across our input, and they provide a way to visualize a model’s internal workings. They allow us to see how our network responds to a particular image in ways that are not always apparent when we only examine the raw filter weights.
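One common way to inspect feature maps (a sketch, assuming a model like the one defined above; the layer name "conv1" is ours) is to build a second Model that maps an input image to a convolutional layer's activations:

import tensorflow as tf

#A model like the earlier example, with the conv layer named for lookup
model = tf.keras.Sequential([
    tf.keras.Input(shape=(256, 256, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu", name="conv1"),
])

#A helper model that outputs the conv layer's activations
feature_extractor = tf.keras.Model(
    inputs=model.inputs,
    outputs=model.get_layer("conv1").output)

image = tf.random.normal((1, 256, 256, 1))
feature_maps = feature_extractor(image)
print(feature_maps.shape)  #(1, 254, 254, 8): one feature map per filter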
Two example filters are displayed. Darker squares correspond to more negative weights, while lighter squares correspond to more positive ones.