Image classification, a common application of deep learning, involves finding the complex patterns in pixels needed to map an image to its label.
To preprocess image data, we can use an `ImageDataGenerator()` from the TensorFlow library. We can augment our image data using parameters such as `width_shift_range`, among others.
To load image data, we can use the generator's `flow_from_directory()` method to import our image directory and specify parameters such as `class_mode`, `color_mode`, `target_size`, and `batch_size`:
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Preprocessing image data
training_data_generator = ImageDataGenerator(
    # Scale pixel values from [0, 255] to [0, 1]
    rescale=1.0/255,
    # Randomly increase or decrease the size of the image by up to 20%
    zoom_range=0.2)

# Loading in image data: creates a DirectoryIterator object using the above parameters
training_iterator = training_data_generator.flow_from_directory(
    "data/train",
    class_mode="categorical",
    color_mode="rgb",
    target_size=(256, 256),
    batch_size=8)
```
Convolutional Neural Network
In deep learning models, convolutional neural networks (CNNs) use layers designed specifically for image data; these layers capture local relationships between nearby features in an image.
When we use a convolutional layer, we learn a set of smaller weight tensors called filters (also known as kernels). We move each of these filters (i.e., convolve them) across the height and width of our input to generate a new “image” of features. Each new “pixel” results from applying the filter to the patch of the original image at that location.
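To make the sliding-filter idea concrete, here is a minimal NumPy sketch that applies one filter with no padding and a stride of 1. It is an illustration, not how TensorFlow implements `Conv2D`; like Keras, it computes a cross-correlation (the filter is not flipped). The function name `convolve2d` and the toy image and filter values are hypothetical.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1).

    Each output "pixel" is the sum of the element-wise product between
    the kernel and the image patch underneath it.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 grayscale "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                   # toy 2x2 filter
features = convolve2d(image, kernel)
print(features.shape)  # (4, 4)
print(features[0, 0])  # -6.0
```

A 5x5 input and a 2x2 filter give a 4x4 feature map, because the filter fits in only 4 positions along each axis. In a real layer the filter weights are learned rather than hand-picked.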
Convolution-based approaches work well for image data for the following reasons:
- Convolution can reduce the size of an input image using only a few parameters.
- Filters compute new features by only combining features that are near each other in the image. This operation encourages the model to look for local patterns (e.g., edges and objects).
- Convolutional layers will produce similar outputs even when the objects in an image are translated (for example, if there were a giraffe at the bottom or top of the frame). This is because the same filters are applied across the entire image.
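The translation point can be checked numerically. The sketch below, a toy example with a hypothetical `conv2d` helper built on NumPy's `sliding_window_view`, moves a small bright "object" three pixels to the right and shows that the feature map moves along with it: the response is shifted, not changed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernel):
    """Valid (no padding), stride-1 convolution of one filter over a 2D image."""
    windows = sliding_window_view(image, kernel.shape)  # shape (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# A 2x2 bright "object" near the top-left of an 8x8 image...
image_a = np.zeros((8, 8))
image_a[1:3, 1:3] = 1.0
# ...and the same object moved 3 pixels to the right
image_b = np.roll(image_a, shift=3, axis=1)

kernel = np.ones((2, 2))  # the same filter is applied at every location
out_a = conv2d(image_a, kernel)
out_b = conv2d(image_b, kernel)

def peak(feature_map):
    """(row, col) of the strongest response in a feature map."""
    idx = np.unravel_index(np.argmax(feature_map), feature_map.shape)
    return tuple(int(v) for v in idx)

print(peak(out_a), peak(out_b))  # (1, 1) (1, 4)
```

Because the filter weights are shared across positions, the strongest response simply follows the object from column 1 to column 4.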
```python
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(256, 256, 1)))
# Adds a Conv2D layer with 16 filters, each of size 7x7:
model.add(tf.keras.layers.Conv2D(16, 7, activation="relu"))
model.summary()
```
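As a sanity check on what `model.summary()` should report for a `Conv2D(16, 7)` layer on a 256x256x1 input, the arithmetic can be done by hand. This sketch assumes the Keras defaults of `strides=1`, `padding="valid"`, and a bias term per filter; the helper name `conv2d_summary` is hypothetical. It also shows why convolution needs "only a few parameters" compared with a fully connected layer.

```python
def conv2d_summary(height, width, in_channels, filters, kernel_size):
    """Output shape and parameter count of a Conv2D layer, assuming
    strides=1, padding="valid", and one bias per filter."""
    out_h = height - kernel_size + 1
    out_w = width - kernel_size + 1
    weights = kernel_size * kernel_size * in_channels * filters
    biases = filters
    return (out_h, out_w, filters), weights + biases

shape, params = conv2d_summary(256, 256, in_channels=1, filters=16, kernel_size=7)
print(shape, params)  # (250, 250, 16) 800

# For comparison: a fully connected layer producing the same number of outputs
n_in = 256 * 256
n_out = 250 * 250 * 16
dense_params = n_in * n_out + n_out
print(dense_params)  # 65537000000
```

The summary table prints the output shape with a leading `None` batch dimension, i.e. `(None, 250, 250, 16)`.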
The stride hyperparameter in a convolutional layer is how far we move the filter each time we apply it. For example, with a stride of 2 (`strides=2` in Keras), the filter moves across two columns each time.
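A stride larger than 1 is equivalent to computing the stride-1 result and keeping only every `stride`-th position, which shrinks the output. The NumPy sketch below illustrates this for valid padding, together with the usual output-size formula floor((n - k) / stride) + 1. Both helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernel, stride=1):
    """Valid convolution of one filter, moving `stride` pixels per step."""
    windows = sliding_window_view(image, kernel.shape)  # every patch, stride 1
    windows = windows[::stride, ::stride]               # keep every `stride`-th patch
    return np.einsum("ijkl,kl->ij", windows, kernel)

def output_size(n, k, stride):
    """Output length along one axis for valid padding."""
    return (n - k) // stride + 1

image = np.random.default_rng(0).random((7, 7))
kernel = np.ones((3, 3))
print(conv2d(image, kernel, stride=1).shape)  # (5, 5)
print(conv2d(image, kernel, stride=2).shape)  # (3, 3)
print(output_size(7, 3, 2))                   # 3
```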
The padding hyperparameter defines what is done once the filter gets to the end of a row/column. In other words: “what happens when we run out of image?” There are two main methods for what to do here:
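- Valid Padding: The default option (`padding="valid"` in Keras) is to stop when the filter would move off the image, so the output is smaller than the input.
- Same Padding: Another option (`padding="same"`) is to pad the input by surrounding it with zeros, so that with a stride of 1 the output has the same height and width as the input.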
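The difference between the two options can be seen by padding with zeros by hand. This sketch assumes an odd-sized filter and a stride of 1, where "same" padding adds (k - 1) // 2 zeros on each side; Keras also handles even-sized filters and larger strides, which this toy helper (a hypothetical `conv2d`) does not.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernel, padding="valid"):
    """Stride-1 convolution of one filter with "valid" or "same" padding."""
    if padding == "same":
        # Assumes an odd-sized kernel: pad (k - 1) // 2 zeros on every side
        ph = (kernel.shape[0] - 1) // 2
        pw = (kernel.shape[1] - 1) // 2
        image = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

image = np.ones((6, 6))
kernel = np.ones((3, 3))
valid = conv2d(image, kernel, "valid")
same = conv2d(image, kernel, "same")
print(valid.shape, same.shape)  # (4, 4) (6, 6)
print(same[0, 0], same[2, 2])   # 4.0 9.0
```

The corner value is 4 rather than 9 because five of the nine positions under the filter fall on the zero padding.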
When building a convolutional neural network, pooling layers are often used because they reduce the dimensionality of intermediate convolutional outputs. There are many different types of pooling layer, but the most common is max pooling:
- As in convolution, we move a window of a specified size across our input. Stride and padding can also be specified in a max pooling layer.
- However, instead of multiplying each image patch by a filter, the patch is replaced with its maximum value.
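Here is a minimal NumPy sketch of max pooling with a 2x2 window, a stride of 2, and no padding; each non-overlapping 2x2 patch is replaced by its largest value, halving the height and width. The helper name `max_pool2d` is hypothetical; in Keras the corresponding layer is `tf.keras.layers.MaxPooling2D`, whose default pool size is 2x2 with the stride equal to the pool size.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool2d(x, pool_size=2, stride=2):
    """Max pooling with valid padding: keep the maximum of each window."""
    windows = sliding_window_view(x, (pool_size, pool_size))[::stride, ::stride]
    return windows.max(axis=(2, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 2, 1, 6]])
print(max_pool2d(x))
# [[4 5]
#  [3 7]]
```

Unlike a convolutional layer, the pooling layer has no weights to learn; it only summarizes each patch.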
In a convolutional neural network, feature maps are the result of convolving a single filter across our input, and they provide a way to visualize a model’s internal workings. They allow us to see how our network responds to a particular image in ways that are not always apparent when we only examine the raw filter weights.
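To see what a feature map shows, the toy example below convolves a hand-made vertical-edge filter over an image whose left half is dark and right half is bright. The filter values are an assumption chosen for illustration; a trained CNN learns its own. The resulting feature map is nonzero only where the edge is.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def feature_map(image, kernel):
    """Valid, stride-1 convolution of a single filter over a 2D image."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Dark left half, bright right half: a single vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-made vertical-edge filter (hypothetical weights, not learned)
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

fmap = feature_map(image, vertical_edge)
print(fmap.shape)  # (4, 4)
# Every row of fmap is [0, 3, 3, 0]: the filter responds only around the edge
```

With a trained Keras model, one common approach to obtaining real feature maps is to build a second model whose output is an intermediate layer, for example `tf.keras.Model(inputs=model.inputs, outputs=model.layers[0].output)`, and call it on an image.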
Two example filters are displayed. Darker squares correspond to more negative weights, while whiter squares are more positive ones.