Image Classification

Image classification involves finding the complex patterns in pixels necessary to map an image to its label and is a common application of deep learning.

To preprocess image data, we can use an ImageDataGenerator() from the TensorFlow library. We can augment our image data using parameters such as zoom_range and width_shift_range, among others.

To load in image data, we can use the flow_from_directory() method of our ImageDataGenerator to import our image directory and specify parameters, such as class_mode and color_mode.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

#Preprocessing image data
training_data_generator = ImageDataGenerator(
    #Rescale pixel values from [0, 255] to [0, 1]
    rescale=1.0/255,
    #Randomly increase or decrease the size of the image by up to 20%
    zoom_range=0.2)

#Loading in image data
#Creates a DirectoryIterator object using the above parameters:
training_iterator = training_data_generator.flow_from_directory(
    "data/train",
    class_mode="categorical",
    color_mode="rgb",
    target_size=(256, 256),
    batch_size=8)

Convolutional Neural Network

In deep learning models, convolutional neural networks (CNNs) use layers specifically designed for image data that capture local relationships between nearby features in an image.

When we use a convolutional layer, we learn a set of smaller weight tensors, called filters (also known as kernels). We move each of these filters (i.e. convolve them) across the height and width of our input, to generate a new “image” of features. Each new “pixel” results from applying the filter to that location in the original image.

Convolution-based approaches work well for image data for the following reasons:

  • Convolution can reduce the size of an input image using only a few parameters.
  • Filters compute new features by only combining features that are near each other in the image. This operation encourages the model to look for local patterns (e.g., edges and objects).
  • Convolutional layers will produce similar outputs even when the objects in an image are translated (for example, if there were a giraffe in the bottom or top of the frame). This is because the same filters are applied across the entire image.
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(256, 256, 1)))
#Adds a Conv2D layer with 8 filters, each of size 3x3:
model.add(tf.keras.layers.Conv2D(8, 3, activation="relu"))
model.summary()

Stride Hyperparameter

The stride hyperparameter in a convolutional layer is how far we move the filter each time we apply it. In the example below, stride=2, so the filter moves over two columns at a time.

Convolution with a stride of two: a stick figure image is convolved with a 3x3 filter that has ones in its four corners and center. The filter moves two columns over each time; when it reaches the right edge of the image, it moves two rows down.
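
As a minimal sketch of the same idea in code, Keras exposes this through the strides argument of Conv2D (the filter count and sizes here are illustrative):

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(256, 256, 1)))
#With strides=2, the 3x3 filter moves two pixels at a time, so the
#output is roughly half the input's height and width (127x127 here):
model.add(tf.keras.layers.Conv2D(8, 3, strides=2, activation="relu"))
model.summary()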

Padding Hyperparameter

The padding hyperparameter defines what is done once the filter gets to the end of a row/column. In other words: “what happens when we run out of image?” There are two main methods for what to do here:

  • Valid Padding: The default option is to just stop when our kernel moves off the image.
  • Same Padding: Another option is to pad the input by surrounding it with zeros.
Image with padding: a 7x7 grid of numbers is surrounded by a border of zeros to make it 9x9. The 3x3 filter can now be applied across the entire image, and the resulting output has the same height and width as the input.
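
A minimal sketch of both options in Keras, via the padding argument of Conv2D (the layer sizes are illustrative):

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(256, 256, 1)))
#padding="valid" (the default) stops at the edge, so the output
#shrinks from 256x256 to 254x254:
model.add(tf.keras.layers.Conv2D(8, 3, padding="valid", activation="relu"))
#padding="same" surrounds the input with zeros, so height and width
#are preserved (254x254 in, 254x254 out):
model.add(tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu"))
model.summary()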

Max Pooling

When building out a convolutional neural network, pooling layers are often used because they reduce the dimensionality of intermediate convolutional outputs. There are many different types of pooling layers, but the most common is called max pooling:

  • Like in convolution, we move windows of specified size across our input. Stride and padding can be specified in a max pooling layer.
  • However, instead of multiplying each image patch by a filter, the patch is replaced with its maximum value.
A sample output of a convolutional layer is passed to a max pooling layer. A 2x2 box is moved over the image with a stride of two, and the maximum pixel in the box is selected for the output. An arrow points to a new image containing the max of each patch to which the max pooling layer was applied.
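
A minimal sketch in Keras, using the MaxPooling2D layer (the surrounding layers are illustrative):

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(256, 256, 1)))
model.add(tf.keras.layers.Conv2D(8, 3, activation="relu"))
#Replaces each 2x2 patch with its maximum value; with a stride of 2,
#this halves the height and width (254x254 -> 127x127):
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.summary()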

Feature Maps

In a convolutional neural network, feature maps are the result of convolving a single filter across our input, and they provide a way to visualize a model’s internal workings. They allow us to see how our network responds to a particular image in ways that are not always apparent when we only examine the raw filter weights.

Two example filters, displayed side by side. Darker squares correspond to more negative weights, while whiter squares correspond to more positive ones; the images are very pixelated, with darker patches on the right.
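
One common way to extract feature maps is to build a second model that outputs a convolutional layer's activations. A sketch, assuming model is a trained CNN like the one above and image is a single preprocessed input of shape (1, 256, 256, 1):

import tensorflow as tf

#Assumes `model` is a trained CNN (as above) and `image` is a single
#preprocessed input of shape (1, 256, 256, 1):
conv_layer = model.layers[0]
#A second model whose output is the convolutional layer's activations:
feature_map_model = tf.keras.Model(
    inputs=model.inputs,
    outputs=conv_layer.output)
feature_maps = feature_map_model(image)
#One feature map per filter, e.g. shape (1, 254, 254, 8) for 8 filters.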
