Rectified Linear Unit (ReLU) Function in Deep Learning

Learn how the rectified linear unit (ReLU) function works, how to implement it in Python, and its variations, advantages, and disadvantages.

We use activation functions in deep learning to capture non-linearity in the training data. Depending on the use case, we use different activation functions, such as sigmoid, softmax, tanh, and rectified linear unit (ReLU). Among these functions, ReLU is the most popular activation function for building deep neural networks due to its simplicity, faster training, and ability to avoid the vanishing gradient problem.

In this article, we will discuss the definition, Python implementation, and applications of the ReLU activation function. We will also discuss different variations of the ReLU function and their advantages and disadvantages.

What is the rectified linear unit (ReLU) function?

ReLU is a piecewise linear function that converts its inputs to non-negative outputs. It returns the input without any change for non-negative input values. For negative values, ReLU returns 0. Mathematically, we can define the ReLU function as shown below:

f(x) = \begin{cases} x, & \text{if } x \geq 0 \\ 0, & \text{otherwise} \end{cases}

As the ReLU function returns the maximum value between the input and zero, you can also write the function definition as follows:

f(x) = \max(0, x)

When we pass a positive value, say 7.2, to the ReLU function, it returns 7.2. On the other hand, if we pass a negative value, say -5.5, to ReLU, we get zero as the output. If we plot the rectified linear unit function for values ranging from -10 to 10, we get the following graph:

ReLU Graph

As you can see, the output from the ReLU function is always zero for negative inputs. Otherwise, it remains the same as the input.

Now that we know the mathematical definition, let’s implement the ReLU function in Python.

Implementing the ReLU activation function in Python

To implement ReLU in Python, we need to write a function that takes a list or a numpy array as its input. After execution, the function should return a modified list or numpy array where all the negative values are replaced by zero. For this, we will first write a function that takes a single value as its input and returns the output according to the mathematical definition of the ReLU function.

def relu_function(x):
    if x <= 0:
        return 0.
    return x

print("ReLU output for input 7.2 is:", relu_function(7.2))
print("ReLU output for input -5.5 is:", relu_function(-5.5))

Output:

ReLU output for input 7.2 is: 7.2
ReLU output for input -5.5 is: 0.0

As we need the ReLU function to take a list or numpy array as its input and process its elements, we will vectorize the above function using the vectorize() function defined in the numpy module. Vectorization lets us apply the ReLU operation to every element of the input list or array without writing an explicit loop, although np.vectorize() still calls the Python function once per element under the hood, so it is a convenience rather than a performance optimization.

import numpy as np
relu = np.vectorize(relu_function)
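
The np.vectorize() approach mirrors our scalar definition, but if you only need ReLU itself, numpy's built-in np.maximum() computes the same element-wise maximum entirely in compiled code. A minimal sketch (relu_numpy is just an illustrative helper name):

import numpy as np

def relu_numpy(x):
    # np.maximum broadcasts 0 against every element, so no Python-level loop is needed
    return np.maximum(0, np.asarray(x, dtype=float))

print(relu_numpy([0, 4.1, -1, 6.5]))  # negative values become 0., the rest pass through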

As we now have the vectorized relu function, we can apply the ReLU operation on 1D and 2D Python lists:

import numpy as np

def relu_function(x):
    if x <= 0:
        return 0.
    return x

relu = np.vectorize(relu_function)

list_1d = [0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8]
list_2d = [[0, 4.1, 2.2], [5, -1, 6.5], [-2.8, 5, -7.3]]
relu_output_1d = relu(list_1d)
relu_output_2d = relu(list_2d)
print("The input 1D list is:", list_1d)
print("The output 1D list is:", relu_output_1d)
print("The input 2D list is:\n", list_2d)
print("The output 2D list is:\n", relu_output_2d)

Output:

The input 1D list is: [0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8]
The output 1D list is: [0. 4.1 2.2 5. 0. 6.5 0. 5. 0. 8. ]
The input 2D list is:
[[0, 4.1, 2.2], [5, -1, 6.5], [-2.8, 5, -7.3]]
The output 2D list is:
[[0. 4.1 2.2]
[5. 0. 6.5]
[0. 5. 0. ]]

In the outputs, you can observe that the negative values in the lists are replaced by zero, whereas the positive values aren’t modified.

We can also apply the relu() function on 1D and 2D numpy arrays, as shown below:

array_1d = np.array([0,4.1,2.2,5,-1,6.5,-2.8,5,-7.3,8])
array_2d = np.array([[0,4.1,2.2],[5,-1,6.5],[-2.8,5,-7.3]])
relu_output_1d = relu(array_1d)
relu_output_2d = relu(array_2d)
print("The input 1D array is:",array_1d)
print("The output 1D array is:",relu_output_1d)
print("The input 2D array is:\n",array_2d)
print("The output 2D array is:\n",relu_output_2d)

Output:

The input 1D array is: [ 0. 4.1 2.2 5. -1. 6.5 -2.8 5. -7.3 8. ]
The output 1D array is: [0. 4.1 2.2 5. 0. 6.5 0. 5. 0. 8. ]
The input 2D array is:
[[ 0. 4.1 2.2]
[ 5. -1. 6.5]
[-2.8 5. -7.3]]
The output 2D array is:
[[0. 4.1 2.2]
[5. 0. 6.5]
[0. 5. 0. ]]

ReLU function in PyTorch and TensorFlow

Deep learning frameworks like PyTorch and TensorFlow also provide built-in implementations for the ReLU function. You can apply the ReLU function to PyTorch tensors, as shown in the following example:

from torch import nn, from_numpy
relu=nn.ReLU()
# Create PyTorch tensors from numpy arrays
pytorch_tensor_1d=from_numpy(array_1d)
pytorch_tensor_2d=from_numpy(array_2d)
# Apply ReLU function on the tensors
relu_output_1d = relu(pytorch_tensor_1d)
relu_output_2d = relu(pytorch_tensor_2d)
print("The input 1D tensor is:",pytorch_tensor_1d)
print("The output 1D tensor is:",relu_output_1d)
print("The input 2D tensor is:\n",pytorch_tensor_2d)
print("The output 2D tensor is:\n",relu_output_2d)

Output:

The input 1D tensor is: tensor([ 0.0000, 4.1000, 2.2000, 5.0000, -1.0000, 6.5000, -2.8000, 5.0000,
-7.3000, 8.0000], dtype=torch.float64)
The output 1D tensor is: tensor([0.0000, 4.1000, 2.2000, 5.0000, 0.0000, 6.5000, 0.0000, 5.0000, 0.0000,
8.0000], dtype=torch.float64)
The input 2D tensor is:
tensor([[ 0.0000, 4.1000, 2.2000],
[ 5.0000, -1.0000, 6.5000],
[-2.8000, 5.0000, -7.3000]], dtype=torch.float64)
The output 2D tensor is:
tensor([[0.0000, 4.1000, 2.2000],
[5.0000, 0.0000, 6.5000],
[0.0000, 5.0000, 0.0000]], dtype=torch.float64)

Similarly, you can use the relu() function from the tensorflow.nn module with TensorFlow tensors.

from tensorflow.nn import relu
from tensorflow import convert_to_tensor
#Create TensorFlow tensors from numpy arrays
tf_tensor_1d=convert_to_tensor(array_1d)
tf_tensor_2d=convert_to_tensor(array_2d)
# Apply ReLU function on the tensors
relu_output_1d = relu(tf_tensor_1d)
relu_output_2d = relu(tf_tensor_2d)
print("The input 1D tensor is:",tf_tensor_1d)
print("The output 1D tensor is:",relu_output_1d)
print("The input 2D tensor is:\n",tf_tensor_2d)
print("The output 2D tensor is:\n",relu_output_2d)

Output:

The input 1D tensor is: tf.Tensor([ 0. 4.1 2.2 5. -1. 6.5 -2.8 5. -7.3 8. ], shape=(10,), dtype=float64)
The output 1D tensor is: tf.Tensor([0. 4.1 2.2 5. 0. 6.5 0. 5. 0. 8. ], shape=(10,), dtype=float64)
The input 2D tensor is:
tf.Tensor(
[[ 0. 4.1 2.2]
[ 5. -1. 6.5]
[-2.8 5. -7.3]], shape=(3, 3), dtype=float64)
The output 2D tensor is:
tf.Tensor(
[[0. 4.1 2.2]
[5. 0. 6.5]
[0. 5. 0. ]], shape=(3, 3), dtype=float64)

Applications of the ReLU function in deep learning

In deep learning applications, we use the ReLU activation function to define neural network layers. Let’s discuss using ReLU in TensorFlow and PyTorch to define neural network layers.

Create neural network layers in TensorFlow Keras using the ReLU activation function

In TensorFlow 2.x, we can define neural network layers with the ReLU activation function by setting the activation parameter to 'relu' in layer constructors like Dense(), Conv2D(), SimpleRNN(), and LSTM().

from tensorflow.keras import layers
# Dense layer with relu activation
dense_layer = layers.Dense(64, activation='relu')
# Convolutional layer using relu activation
cnn_layer = layers.Conv2D(32, (2, 2), activation='relu', input_shape=(28, 28, 1))
# RNN using relu activation
rnn_layer = layers.SimpleRNN(64, activation='relu', input_shape=(10, 50))
# LSTM using relu activation
lstm_layer = layers.LSTM(64, activation='relu', input_shape=(10, 50))

Create neural network layers in PyTorch using the ReLU activation function

In PyTorch, we use the forward() method to define how inputs flow through a neural network model. To apply the ReLU activation to a layer, you can pass that layer's output to the torch.nn.functional.relu() function inside the forward() method.

from torch import nn
from torch.nn.functional import relu

class ReluModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(100, 50)

    def forward(self, x):
        # Apply relu on the output of layer_1
        x = relu(self.layer_1(x))
        return x

Instead of the torch.nn.functional.relu function, we can also use the nn.ReLU class to define the ReLU activation function and use it in the forward() method, as shown below:

from torch import nn

class ReluModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Define a fully connected layer
        self.layer_1 = nn.Linear(100, 50)
        # Define ReLU layer
        self.relu = nn.ReLU()

    def forward(self, x):
        # Pass output of layer_1 to relu
        x = self.relu(self.layer_1(x))
        return x

When the model has multiple layers, you can pass each layer's output through the ReLU activation to apply it after that layer, as shown below:

from torch import nn

class ReluModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Define convolutional layers
        self.conv_layer1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
        self.conv_layer2 = nn.Conv2d(32, 64, 3, padding=1)
        # Define fully connected layers
        self.layer_1 = nn.Linear(64 * 7 * 7, 128)
        self.layer_2 = nn.Linear(128, 10)
        # Define ReLU layer
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv_layer1(x))  # Apply ReLU on the first convolutional layer
        x = self.relu(self.conv_layer2(x))  # Apply ReLU on the second convolutional layer
        x = x.view(x.size(0), -1)  # Flatten the output before the fully connected layers
        x = self.relu(self.layer_1(x))  # Apply ReLU on the output of the first fully connected layer
        x = self.layer_2(x)  # Output layer without ReLU
        return x

Now that we know the implementation and applications of the ReLU activation function, let’s discuss the advantages and disadvantages of ReLU.

Advantages of ReLU

Due to its simple definition and implementation, the ReLU activation function has several advantages.

Computational efficiency

ReLU returns the input itself for positive inputs and zero otherwise. Unlike activation functions such as sigmoid and tanh, it doesn't require exponential or logarithmic calculations, which makes it computationally efficient.

Avoids vanishing gradient problem

When using sigmoid and hyperbolic tangent (tanh) activation functions, deep learning models often run into the vanishing gradient problem: the derivatives of these functions are small (at most 0.25 for sigmoid and at most 1 for tanh), so repeatedly multiplying them during backpropagation shrinks the gradients toward zero as they travel through many layers.

The ReLU function's gradient is 1 for all positive inputs and zero otherwise. Because of this, a neural network with ReLU activations lets the gradient of the loss function flow backward through active neurons without being scaled down, avoiding the vanishing gradient problem.
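
As a quick illustration, here is a minimal sketch comparing the ReLU derivative with the sigmoid derivative (relu_grad and sigmoid_grad are illustrative helper functions, not library calls):

import numpy as np

def relu_grad(x):
    # 1 wherever the input is positive, 0 elsewhere (0 at x = 0 by convention)
    return (np.asarray(x) > 0).astype(float)

def sigmoid_grad(x):
    # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), which never exceeds 0.25
    s = 1 / (1 + np.exp(-np.asarray(x)))
    return s * (1 - s)

print(relu_grad([-2.0, 3.0]))     # active inputs pass the gradient through at full strength
print(sigmoid_grad([-2.0, 3.0]))  # always well below 1, so repeated multiplication shrinks gradients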

Allows deep neural networks

Because ReLU doesn't shrink the gradient during backpropagation, we don't have to worry about the vanishing gradient problem. This lets us train deep learning models with many layers and solve complex problems.

Sparse activation

The ReLU activation function shuts down a neuron if its input is negative or zero. With fewer active neurons, less computation is needed during the forward and backward passes, leading to faster training and inference. Sparse activation also acts as implicit regularization and helps avoid overfitting.
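
For intuition, assume the pre-activations are roughly zero-centered (for example, drawn from a standard normal distribution); then about half of them fall below zero and are zeroed out. A minimal sketch:

import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(10_000)
activations = np.maximum(0, pre_activations)

# Roughly 50% of the outputs are exactly zero for zero-centered inputs
print("Fraction of inactive neurons:", np.mean(activations == 0))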

Captures non-linearity in input data

Although ReLU is a piecewise linear function, it successfully captures the non-linear trends in the data when used as an activation function in deep learning models.
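
For a tiny example of how piecewise linear units combine into a non-linear function, the absolute value function can be written exactly as relu(x) + relu(-x):

import numpy as np

def relu(x):
    return np.maximum(0, x)

x = np.array([-3.0, -1.0, 0.0, 2.0, 5.0])
# Two ReLU "neurons" together reproduce the non-linear function |x| exactly
print(relu(x) + relu(-x))  # [3. 1. 0. 2. 5.]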

Disadvantages of ReLU

ReLU is popular and effective for building deep neural networks. However, it also comes with some disadvantages:

Dying ReLU problem

During backpropagation, the ReLU activation function allows the gradient to travel backward without any change, which helps the neural network avoid the vanishing gradient problem.

However, a large gradient, poor weight initialization, or a high learning rate might update a neuron's weights in such a way that the neuron starts outputting zero for every input. In this situation, the neuron "dies". Because the gradient of the ReLU function is zero for negative inputs, subsequent backpropagation steps cannot alter the weights, so the neuron is likely to remain dead forever.
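
To see how this can happen, consider a single ReLU neuron whose bias has been pushed far into the negative range by a bad update. The numbers below are made up for illustration, and neuron() is just a toy helper:

import numpy as np

weights, bias = np.array([0.5, -0.3]), -100.0  # bias driven far negative by a bad update

def neuron(x):
    return np.maximum(0, weights @ x + bias)

# For any realistic input, the pre-activation stays negative, so the output is always 0.
# The gradient through ReLU is then also 0, so the weights can no longer be corrected.
print(neuron(np.array([1.0, 2.0])))   # 0.0
print(neuron(np.array([5.0, -4.0])))  # 0.0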

Unbounded output

The ReLU function has unbounded output for positive values. Also, the output of a neuron becomes input for the neurons in the next layer. If the weights in the neurons are greater than 1, the values will increase after passing through each layer during feedforward. During backpropagation, we compute gradients layer by layer using the chain rule.

Hence, if the gradients or neuron outputs at each step are very large, their products can grow exponentially, causing the exploding gradient problem. This leads to unstable training and poor model performance.

ReLU outputs aren’t zero-centered

ReLU always produces a non-negative output. Because of this, the gradient updates for a layer's weights tend to move in the same direction (all positive or all negative), which leads to inefficient, zigzagging weight updates and slows down model convergence during training.

Proper weight initialization, batch normalization, and gradient clipping can help us avoid these disadvantages. We can also slightly modify the ReLU function itself to work around them.
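
For example, in PyTorch, you can rescale gradients with torch.nn.utils.clip_grad_norm_() just before the optimizer step. The tiny model and random batch below are placeholders for your own training setup:

import torch
from torch import nn

# A small model and a dummy batch, just to show where clipping fits in a training step
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
x, y = torch.randn(4, 10), torch.randn(4, 1)

loss = criterion(model(x), y)
loss.backward()
# Rescale the gradients so their total norm is at most 1.0 before updating the weights
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()

With these mitigations in mind, let's discuss some of the variations of the ReLU activation function.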

Variations of the ReLU function

To avoid the dying ReLU problem, we can use different variations of the ReLU activation function, including leaky ReLU, parametric ReLU, and the exponential linear unit (ELU). Let's discuss these functions one by one.

Leaky ReLU function

For non-negative inputs, leaky ReLU behaves the same way as ReLU. For negative inputs, it uses a small multiplying factor α and returns the product of the input and α as output. We can define the leaky ReLU function mathematically as follows:

f(x) = \begin{cases} x, & \text{if } x \geq 0 \\ \alpha x, & \text{otherwise} \end{cases}

If we plot the leaky ReLU function with α=0.01, we get the following chart:

Leaky ReLU chart

Here, you can see that the output of the leaky ReLU function isn't exactly zero for negative inputs, and its gradient doesn't become zero either. Because of this, leaky ReLU avoids the dying ReLU problem.

In Python, you can implement the leaky ReLU function as shown below:

import numpy as np

# Define leaky ReLU function
def leaky_relu_function(x, alpha=0.01):
    if x <= 0:
        return alpha * x
    return x

# Vectorize the function
leaky_relu = np.vectorize(leaky_relu_function)

array_1d = np.array([0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8])
array_2d = np.array([[0, 4.1, 2.2], [5, -1, 6.5], [-2.8, 5, -7.3]])
leaky_relu_output_1d = leaky_relu(array_1d)
leaky_relu_output_2d = leaky_relu(array_2d)
print("The input 1D array is:", array_1d)
print("The output 1D array is:", leaky_relu_output_1d)
print("The input 2D array is:\n", array_2d)
print("The output 2D array is:\n", leaky_relu_output_2d)

Output:

The input 1D array is: [ 0. 4.1 2.2 5. -1. 6.5 -2.8 5. -7.3 8. ]
The output 1D array is: [ 0. 4.1 2.2 5. -0.01 6.5 -0.028 5. -0.073 8. ]
The input 2D array is:
[[ 0. 4.1 2.2]
[ 5. -1. 6.5]
[-2.8 5. -7.3]]
The output 2D array is:
[[ 0. 4.1 2.2 ]
[ 5. -0.01 6.5 ]
[-0.028 5. -0.073]]

In the output, you can see that leaky ReLU returns positive values without any change. For negative values in the input, it returns αx, where α=0.01 is the multiplication factor and x is the input.
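
In practice, you rarely need to hand-roll leaky ReLU: PyTorch provides the nn.LeakyReLU module, and TensorFlow provides tf.nn.leaky_relu with an alpha argument. A minimal PyTorch sketch:

import torch
from torch import nn

leaky_relu = nn.LeakyReLU(negative_slope=0.01)  # slope applied to negative inputs
x = torch.tensor([0.0, 4.1, -1.0, -7.3])
print(leaky_relu(x))  # negative values are scaled by 0.01 instead of being zeroed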

Parametric ReLU (PReLU)

Parametric ReLU works like the leaky ReLU function, except that α is a learnable parameter whose value is determined during model training. This makes PReLU more adaptable, and often more accurate, than leaky ReLU, at the cost of the extra computation needed to learn α.
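
In PyTorch, this learnable slope is available as the nn.PReLU module, whose α is trained along with the rest of the model. A minimal sketch:

import torch
from torch import nn

prelu = nn.PReLU(num_parameters=1)  # one learnable slope, initialized to 0.25 by default
x = torch.tensor([0.0, 4.1, -1.0, -7.3])
print(prelu(x))      # negative inputs are scaled by the current value of the slope
print(prelu.weight)  # the learnable slope parameter itself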

Exponential linear unit (ELU)

Instead of multiplying negative inputs by a fixed factor α, we can apply an exponential transformation to them, which gives us the exponential linear unit (ELU) function. Mathematically, we can define ELU as follows:

f(x) = \begin{cases} x, & \text{if } x \geq 0 \\ \alpha (e^x - 1), & \text{otherwise} \end{cases}

In this definition, α is set to 1 by default. If we plot the above function, we get the following plot:

Exponential linear unit plot

Here, you can see that the outputs for negative inputs change non-linearly. Also, the term (e^x - 1) always evaluates to a value between -1 and 0 for negative x, so the output is bounded in the negative region. You can implement ELU in Python as shown in the following example:

import numpy as np

# Define exponential linear unit (ELU) function
def elu_function(x, alpha=1):
    if x <= 0:
        return alpha * (np.exp(x) - 1)
    return x

# Vectorize the function
elu = np.vectorize(elu_function)

array_1d = np.array([0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8])
array_2d = np.array([[0, 4.1, 2.2], [5, -1, 6.5], [-2.8, 5, -7.3]])
elu_output_1d = elu(array_1d)
elu_output_2d = elu(array_2d)
print("The input 1D array is:", array_1d)
print("The output 1D array is:", elu_output_1d)
print("The input 2D array is:\n", array_2d)
print("The output 2D array is:\n", elu_output_2d)

Output:

The input 1D array is: [ 0. 4.1 2.2 5. -1. 6.5 -2.8 5. -7.3 8. ]
The output 1D array is: [ 0. 4.1 2.2 5. -0.63212056 6.5
-0.93918994 5. -0.99932446 8. ]
The input 2D array is:
[[ 0. 4.1 2.2]
[ 5. -1. 6.5]
[-2.8 5. -7.3]]
The output 2D array is:
[[ 0. 4.1 2.2 ]
[ 5. -0.63212056 6.5 ]
[-0.93918994 5. -0.99932446]]
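
As with leaky ReLU, the deep learning frameworks also ship ELU as a built-in: PyTorch provides the nn.ELU module and TensorFlow provides tf.nn.elu. A minimal PyTorch sketch:

import torch
from torch import nn

elu = nn.ELU(alpha=1.0)  # alpha controls the saturation value for large negative inputs
x = torch.tensor([0.0, 4.1, -1.0, -7.3])
print(elu(x))  # negative inputs smoothly approach -alpha instead of being cut off at zero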

Conclusion

ReLU and its variations are among the most popular and efficient activation functions for building deep learning models. In this article, we discussed the definition, implementation, and applications of the ReLU activation function. We also discussed the definition and implementation of ReLU variations like leaky ReLU and ELU to avoid the dying ReLU problem.

To learn more about how to use activation functions to build neural networks, you can go through the Deep Learning with TensorFlow skill path. If you prefer PyTorch over TensorFlow, you might like the Intro to PyTorch and Neural Networks course.

FAQs

What does rectified mean in ReLU?

The term "rectified" in ReLU comes from signal processing, where a rectifier removes or converts the negative part of a signal. ReLU does the same thing to its inputs: it turns all negative values into zero and leaves non-negative values unchanged.

Why is ReLU used in deep learning?

The ReLU activation function is used in deep learning because it avoids the vanishing gradient problem and has a simple and efficient implementation. It also uses sparse activation, which results in reduced computation during training, introduces implicit regularization, and helps avoid overfitting.

Why use ReLU over linear?

A linear activation function cannot capture the non-linear patterns in the data. ReLU, despite being piecewise linear, successfully captures these patterns and helps build accurate deep-learning models.

Is ReLU used in CNN?

Yes, ReLU is used in convolutional neural networks (CNNs) due to its efficient computation and its use of sparse activation to introduce implicit regularization and avoid overfitting.

Codecademy Team

The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.
