Rectified Linear Unit (ReLU) Function in Deep Learning
We use activation functions in deep learning to capture non-linearity in the training data. Depending on the use case, we use different activation functions, such as sigmoid, softmax, tanh, and rectified linear unit (ReLU). Among these functions, ReLU is the most popular activation function for building deep neural networks due to its simplicity, faster training, and ability to avoid the vanishing gradient problem.
In this article, we will discuss the definition, Python implementation, and applications of the ReLU activation function. We will also discuss different variations of the ReLU function and their advantages and disadvantages.
What is the rectified linear unit (ReLU) function?
ReLU is a piecewise linear function that converts its inputs to non-negative outputs. It returns the input unchanged for non-negative input values and returns 0 for negative values. Mathematically, we can define the ReLU function as shown below:

ReLU(x) = x, if x > 0
ReLU(x) = 0, if x ≤ 0

As the ReLU function returns the maximum of the input and zero, you can also write the function definition as follows:

ReLU(x) = max(0, x)
When we pass a positive value, say 7.2, to the ReLU function, it returns 7.2. On the other hand, if we pass a negative value, say -5.5, to ReLU, we get zero as the output. If we plot the rectified linear unit function for values ranging from -10 to 10, we get the following graph:
As you can see, the output from the ReLU function is always zero for negative inputs. Otherwise, it remains the same as the input.
Now that we know the mathematical definition, let’s implement the ReLU function in Python.
Implementing the ReLU activation function in Python
To implement ReLU in Python, we need to write a function that takes a list or a numpy array as its input. After execution, the function should return a modified list or numpy array where all the negative values are replaced by zero. For this, we will first write a function that takes a single value as its input and returns the output according to the mathematical definition of the ReLU function.
def relu_function(x):
    if x <= 0:
        return 0.
    return x

print("ReLU output for input 7.2 is:", relu_function(7.2))
print("ReLU output for input -5.5 is:", relu_function(-5.5))
Output:
ReLU output for input 7.2 is: 7.2
ReLU output for input -5.5 is: 0.0
As we need the ReLU function to take a list or numpy array as its input and process its elements, we will vectorize the above function using the vectorize() function defined in the numpy module. Vectorization allows us to apply the ReLU operation to all the elements in the input list or array without iterating over the input.
import numpy as np

relu = np.vectorize(relu_function)
As we now have the vectorized relu function, we can apply the ReLU operation on 1D and 2D Python lists:
import numpy as np

def relu_function(x):
    if x <= 0:
        return 0.
    return x

relu = np.vectorize(relu_function)

list_1d = [0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8]
list_2d = [[0, 4.1, 2.2], [5, -1, 6.5], [-2.8, 5, -7.3]]

relu_output_1d = relu(list_1d)
relu_output_2d = relu(list_2d)

print("The input 1D list is:", list_1d)
print("The output 1D list is:", relu_output_1d)
print("The input 2D list is:\n", list_2d)
print("The output 2D list is:\n", relu_output_2d)
Output:
The input 1D list is: [0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8]
The output 1D list is: [0.  4.1 2.2 5.  0.  6.5 0.  5.  0.  8. ]
The input 2D list is:
 [[0, 4.1, 2.2], [5, -1, 6.5], [-2.8, 5, -7.3]]
The output 2D list is:
 [[0.  4.1 2.2]
 [5.  0.  6.5]
 [0.  5.  0. ]]
In the outputs, you can observe that the negative values in the lists are replaced by zero, whereas the positive values aren’t modified.
We can also apply the relu() function on 1D and 2D numpy arrays, as shown below:
array_1d = np.array([0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8])
array_2d = np.array([[0, 4.1, 2.2], [5, -1, 6.5], [-2.8, 5, -7.3]])

relu_output_1d = relu(array_1d)
relu_output_2d = relu(array_2d)

print("The input 1D array is:", array_1d)
print("The output 1D array is:", relu_output_1d)
print("The input 2D array is:\n", array_2d)
print("The output 2D array is:\n", relu_output_2d)
Output:
The input 1D array is: [ 0.   4.1  2.2  5.  -1.   6.5 -2.8  5.  -7.3  8. ]
The output 1D array is: [0.  4.1 2.2 5.  0.  6.5 0.  5.  0.  8. ]
The input 2D array is:
 [[ 0.   4.1  2.2]
 [ 5.  -1.   6.5]
 [-2.8  5.  -7.3]]
The output 2D array is:
 [[0.  4.1 2.2]
 [5.  0.  6.5]
 [0.  5.  0. ]]
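If you only need element-wise ReLU on numpy arrays, numpy's built-in maximum() function gives the same result without defining and vectorizing a custom function. Here is a minimal equivalent sketch:

import numpy as np

array_1d = np.array([0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8])
# Element-wise max(0, x), i.e., the ReLU operation
print(np.maximum(0, array_1d))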
ReLU function in PyTorch and TensorFlow
Deep learning frameworks like PyTorch and TensorFlow also provide built-in implementations for the ReLU function. You can apply the ReLU function to PyTorch tensors, as shown in the following example:
from torch import nn, from_numpy

relu = nn.ReLU()

# Create PyTorch tensors from numpy arrays
pytorch_tensor_1d = from_numpy(array_1d)
pytorch_tensor_2d = from_numpy(array_2d)

# Apply ReLU function on the tensors
relu_output_1d = relu(pytorch_tensor_1d)
relu_output_2d = relu(pytorch_tensor_2d)

print("The input 1D tensor is:", pytorch_tensor_1d)
print("The output 1D tensor is:", relu_output_1d)
print("The input 2D tensor is:\n", pytorch_tensor_2d)
print("The output 2D tensor is:\n", relu_output_2d)
Output:
The input 1D tensor is: tensor([ 0.0000,  4.1000,  2.2000,  5.0000, -1.0000,  6.5000, -2.8000,  5.0000,
        -7.3000,  8.0000], dtype=torch.float64)
The output 1D tensor is: tensor([0.0000, 4.1000, 2.2000, 5.0000, 0.0000, 6.5000, 0.0000, 5.0000, 0.0000,
        8.0000], dtype=torch.float64)
The input 2D tensor is:
 tensor([[ 0.0000,  4.1000,  2.2000],
        [ 5.0000, -1.0000,  6.5000],
        [-2.8000,  5.0000, -7.3000]], dtype=torch.float64)
The output 2D tensor is:
 tensor([[0.0000, 4.1000, 2.2000],
        [5.0000, 0.0000, 6.5000],
        [0.0000, 5.0000, 0.0000]], dtype=torch.float64)
Similarly, you can use the relu() function from the tensorflow.nn module with TensorFlow tensors:
from tensorflow.nn import relu
from tensorflow import convert_to_tensor

# Create TensorFlow tensors from numpy arrays
tf_tensor_1d = convert_to_tensor(array_1d)
tf_tensor_2d = convert_to_tensor(array_2d)

# Apply ReLU function on the tensors
relu_output_1d = relu(tf_tensor_1d)
relu_output_2d = relu(tf_tensor_2d)

print("The input 1D tensor is:", tf_tensor_1d)
print("The output 1D tensor is:", relu_output_1d)
print("The input 2D tensor is:\n", tf_tensor_2d)
print("The output 2D tensor is:\n", relu_output_2d)
Output:
The input 1D tensor is: tf.Tensor([ 0.   4.1  2.2  5.  -1.   6.5 -2.8  5.  -7.3  8. ], shape=(10,), dtype=float64)
The output 1D tensor is: tf.Tensor([0.  4.1 2.2 5.  0.  6.5 0.  5.  0.  8. ], shape=(10,), dtype=float64)
The input 2D tensor is:
 tf.Tensor(
[[ 0.   4.1  2.2]
 [ 5.  -1.   6.5]
 [-2.8  5.  -7.3]], shape=(3, 3), dtype=float64)
The output 2D tensor is:
 tf.Tensor(
[[0.  4.1 2.2]
 [5.  0.  6.5]
 [0.  5.  0. ]], shape=(3, 3), dtype=float64)
Applications of the ReLU function in deep learning
In deep learning applications, we use the ReLU activation function when defining neural network layers. Let's look at how to do this in TensorFlow and PyTorch.
Create neural network layers in TensorFlow Keras using the ReLU activation function
In TensorFlow 2.x, we can define neural network layers with the ReLU activation function by setting the activation parameter to 'relu' in functions like Dense(), Conv2D(), SimpleRNN(), and LSTM().
from tensorflow.keras import layers

# Dense layer with relu activation
dense_layer = layers.Dense(64, activation='relu')

# Convolutional layer using relu activation
cnn_layer = layers.Conv2D(32, (2, 2), activation='relu', input_shape=(28, 28, 1))

# RNN using relu activation
rnn_layer = layers.SimpleRNN(64, activation='relu', input_shape=(10, 50))

# LSTM using relu activation
lstm_layer = layers.LSTM(64, activation='relu', input_shape=(10, 50))
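To see such layers in context, here is a minimal sketch of a small fully connected classifier that stacks Dense layers with ReLU activations. The layer sizes, the 784-feature input, and the 10-class output are illustrative assumptions, not requirements:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),               # e.g., flattened 28x28 images (assumed input size)
    layers.Dense(64, activation='relu'),     # hidden layer with ReLU
    layers.Dense(32, activation='relu'),     # hidden layer with ReLU
    layers.Dense(10, activation='softmax'),  # output layer for 10 classes
])
model.summary()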
Create neural network layers in PyTorch using the ReLU activation function
In PyTorch, we use the forward() method to define the flow of inputs through a neural network model. To apply the ReLU activation function to a layer, you can pass the layer's output to the torch.nn.functional.relu function inside the forward() method.
from torch import nn
from torch.nn.functional import relu

class ReluModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(100, 50)

    def forward(self, x):
        # Apply relu on the output of layer_1
        x = relu(self.layer_1(x))
        return x
Instead of the torch.nn.functional.relu function, we can also use the nn.ReLU class to define the ReLU activation function and use it in the forward() method, as shown below:
from torch import nn

class ReluModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Define a fully connected layer
        self.layer_1 = nn.Linear(100, 50)
        # Define ReLU layer
        self.relu = nn.ReLU()

    def forward(self, x):
        # Pass output of layer_1 to relu
        x = self.relu(self.layer_1(x))
        return x
When there are multiple layers in the model, you can pass the output of each layer to the relu() function to apply ReLU activation to the respective layer in the model:
from torch import nn

class ReluModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Define convolutional layers
        self.conv_layer1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
        self.conv_layer2 = nn.Conv2d(32, 64, 3, padding=1)
        # Pooling layer that halves the spatial dimensions (28 -> 14 -> 7),
        # so the flattened size matches 64 * 7 * 7 below
        self.pool = nn.MaxPool2d(2)
        # Define fully connected layers
        self.layer_1 = nn.Linear(64 * 7 * 7, 128)
        self.layer_2 = nn.Linear(128, 10)
        # Define ReLU layer
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv_layer1(x)))  # Apply ReLU, then pool the first convolutional layer
        x = self.pool(self.relu(self.conv_layer2(x)))  # Apply ReLU, then pool the second convolutional layer
        x = x.view(x.size(0), -1)                      # Flatten the output for the fully connected layers
        x = self.relu(self.layer_1(x))                 # Apply ReLU on the output of the fully connected layer
        x = self.layer_2(x)                            # Output layer without ReLU
        return x
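As a quick sanity check, you could run a dummy batch of single-channel 28x28 images through this model and confirm the output shape. This is only a sketch; the batch size of 4 is arbitrary:

import torch

model = ReluModel()
dummy_batch = torch.randn(4, 1, 28, 28)   # 4 single-channel 28x28 images
output = model(dummy_batch)
print(output.shape)                        # torch.Size([4, 10])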
Now that we know the implementation and applications of the ReLU activation function, let’s discuss the advantages and disadvantages of ReLU.
Advantages of ReLU
Due to its simple definition and implementation, the ReLU activation function has several advantages.
Computational efficiency
ReLU returns the input itself for positive inputs and zero otherwise. It doesn’t require complex logarithmic or exponential calculations like other activation functions, which makes it computationally efficient.
Avoids vanishing gradient problem
When using sigmoid and hyperbolic tangent (tanh) activation functions, deep learning models often run into the vanishing gradient problem: the gradients of these functions are at most 1 and usually much smaller, so multiplying them repeatedly during backpropagation shrinks the gradient toward zero in the early layers.
The ReLU function’s gradient is 1 for all positive values and zero otherwise. Due to this, a neural network model with ReLU activation allows the gradient of the loss function to travel backward without any change, avoiding the vanishing gradient problem.
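You can verify this gradient behavior with PyTorch's autograd in a minimal sketch:

import torch

x = torch.tensor([7.2, -5.5, 3.0], requires_grad=True)
y = torch.relu(x).sum()
y.backward()
print(x.grad)  # tensor([1., 0., 1.]) - gradient is 1 for positive inputs, 0 otherwise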
Allows deep neural networks
ReLU doesn't shrink the gradient during backpropagation, so we don't have to worry about the vanishing gradient problem. Thus, we can train deep learning models with many layers and solve complex problems.
Sparse activation
The ReLU activation function shuts down a neuron if the input is negative or zero. Due to fewer active neurons, the computation reduces during feedforward and backpropagation, leading to faster training and inference. Sparse activation also acts like implicit regularization and helps avoid overfitting.
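For example, passing zero-centered random pre-activations through ReLU typically zeroes out roughly half of them, as this quick sketch shows:

import numpy as np

pre_activations = np.random.randn(10000)          # zero-centered random pre-activations
activations = np.maximum(0, pre_activations)      # apply ReLU
print("Fraction of inactive neurons:", np.mean(activations == 0))  # roughly 0.5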
Captures non-linearity in input data
Although ReLU is a piecewise linear function, it successfully captures the non-linear trends in the data when used as an activation function in deep learning models.
Disadvantages of ReLU
ReLU is popular and effective for building deep neural networks. However, it also comes with some disadvantages:
Dying ReLU problem
During backpropagation, the ReLU activation function allows the gradient to travel backward without any change, which helps the neural network avoid the vanishing gradient problem.
However, a large gradient, poor weight initialization, or a high learning rate might update the weights of a neuron in such a way that the neuron starts giving zero as output for all input values. In this situation, the neuron dies. As the gradient of the ReLU function is zero for all non-positive inputs, subsequent backpropagation steps will not be able to alter the weights. Thus, there is a high chance of the neuron remaining dead forever.
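The following sketch forces a linear layer into this state to illustrate the effect. The weight and bias values are artificial, chosen only so that the pre-activation is negative for every input:

import torch
from torch import nn

layer = nn.Linear(3, 1)
with torch.no_grad():
    layer.weight.fill_(-1.0)   # artificial weights that make the pre-activation
    layer.bias.fill_(-1.0)     # negative for every non-negative input

x = torch.rand(8, 3)           # inputs in [0, 1)
loss = torch.relu(layer(x)).sum()
loss.backward()
print(layer.weight.grad)       # all zeros: the "dead" neuron receives no gradient updates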
Unbounded output
The ReLU function has unbounded output for positive values, and the output of a neuron becomes the input for the neurons in the next layer. If the weights are large (greater than 1 in magnitude), the activations can keep growing as they pass through successive layers during the forward pass. During backpropagation, we compute gradients layer by layer using the chain rule.
Hence, if the gradients or outputs of the neurons at each step are very large, their products explode exponentially, causing the exploding gradient problem. This leads to unstable training and bad model performance.
ReLU outputs aren’t zero-centered
ReLU always provides a non-negative output. Due to this, the gradient updates for weights tend to go in the same direction (all positive or all negative). This leads to inefficient weight updates and slows down model convergence while training.
Proper weight initialization, batch normalization, and gradient clipping can help us avoid these disadvantages. We can also slightly change the ReLU function to overcome these disadvantages. Let’s discuss some of the variations of the ReLU activation function.
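For instance, He (Kaiming) initialization and gradient clipping are commonly combined with ReLU layers in PyTorch. Here is a minimal sketch; the max_norm value of 1.0 is an illustrative choice:

import torch
from torch import nn

layer = nn.Linear(100, 50)
# He (Kaiming) initialization is designed for layers followed by ReLU
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')

x = torch.randn(4, 100)
loss = torch.relu(layer(x)).sum()
loss.backward()

# Clip the gradient norm before the optimizer step to guard against exploding gradients
torch.nn.utils.clip_grad_norm_(layer.parameters(), max_norm=1.0)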
Variations of the ReLU function
To avoid the dying ReLU problem, we can use different variations of the ReLU activation function. These variations include leaky ReLU, parametric ReLU, and exponential linear unit. Let’s discuss these functions one by one.
Leaky ReLU function
For non-negative inputs, leaky ReLU behaves the same way as ReLU. For negative inputs, it uses a small multiplying factor α and returns the product of the input and α as output. We can define the leaky ReLU function mathematically as follows:

LeakyReLU(x) = x, if x > 0
LeakyReLU(x) = αx, if x ≤ 0

Here, α is a small constant, such as 0.01.
If we plot the leaky ReLU function with α = 0.01, we get the following chart:

Here, you can see that the output of the leaky ReLU function isn't exactly zero for negative inputs, and the gradient also doesn't become zero. Due to this, leaky ReLU avoids the dying ReLU problem.
In Python, you can implement the leaky ReLU function as shown below:
import numpy as np

# Define leaky ReLU function
def leaky_relu_function(x, alpha=0.01):
    if x <= 0:
        return alpha * x
    return x

# Vectorize the function
leaky_relu = np.vectorize(leaky_relu_function)

array_1d = np.array([0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8])
array_2d = np.array([[0, 4.1, 2.2], [5, -1, 6.5], [-2.8, 5, -7.3]])

leaky_relu_output_1d = leaky_relu(array_1d)
leaky_relu_output_2d = leaky_relu(array_2d)

print("The input 1D array is:", array_1d)
print("The output 1D array is:", leaky_relu_output_1d)
print("The input 2D array is:\n", array_2d)
print("The output 2D array is:\n", leaky_relu_output_2d)
Output:
The input 1D array is: [ 0.   4.1  2.2  5.  -1.   6.5 -2.8  5.  -7.3  8. ]
The output 1D array is: [ 0.     4.1    2.2    5.    -0.01   6.5   -0.028  5.    -0.073  8.   ]
The input 2D array is:
 [[ 0.   4.1  2.2]
 [ 5.  -1.   6.5]
 [-2.8  5.  -7.3]]
The output 2D array is:
 [[ 0.     4.1    2.2  ]
 [ 5.    -0.01   6.5  ]
 [-0.028  5.    -0.073]]
In the output, you can see that leaky ReLU returns positive values without any change. For negative values in the input, it returns αx, where α = 0.01 is the multiplication factor and x is the input.
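PyTorch and TensorFlow also provide built-in leaky ReLU implementations, such as nn.LeakyReLU in PyTorch and tf.nn.leaky_relu in TensorFlow, so you rarely need a hand-rolled version. A brief sketch using the PyTorch module:

import numpy as np
from torch import nn, from_numpy

leaky_relu = nn.LeakyReLU(negative_slope=0.01)    # slope used for negative inputs
x = from_numpy(np.array([0, 4.1, -1.0, -7.3]))
print(leaky_relu(x))                              # negative values are scaled by 0.01 instead of being zeroed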
Parametric ReLU (PReLU)
Parametric ReLU works the same way as the leaky ReLU function. However, the parameter α in parametric ReLU is a learnable parameter whose value is determined during model training. PReLU offers more accuracy and adaptability than leaky ReLU, but it comes with an additional computational cost because α must be learned during training.
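In PyTorch, this corresponds to the nn.PReLU module, whose α shows up as a trainable parameter. A brief sketch:

import torch
from torch import nn

prelu = nn.PReLU()                 # one learnable alpha, initialized to 0.25 by default
x = torch.tensor([4.1, -1.0, -7.3])
print(prelu(x))                    # negative inputs are scaled by the current alpha
print(list(prelu.parameters()))    # alpha is a trainable parameter updated during training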
Exponential linear unit (ELU)
Instead of multiplying by the factor α, we can introduce an exponential operation for negative values in the ReLU function, which gives us the exponential linear unit (ELU) function. Mathematically, we can define ELU as follows:

ELU(x) = x, if x > 0
ELU(x) = α(e^x - 1), if x ≤ 0

In this definition, α is set to 1 by default. If we plot the above function, we get the following plot:
Here, you can see that the outputs for negative values change non-linearly. Also, the term (e^x - 1) always evaluates to a value between -1 and 0 for negative x, so the output is bounded in the negative region. You can implement ELU in Python as shown in the following example:
import numpy as np

# Define exponential linear unit (ELU) function
def elu_function(x, alpha=1):
    if x <= 0:
        return alpha * (np.exp(x) - 1)
    return x

# Vectorize the function
elu = np.vectorize(elu_function)

array_1d = np.array([0, 4.1, 2.2, 5, -1, 6.5, -2.8, 5, -7.3, 8])
array_2d = np.array([[0, 4.1, 2.2], [5, -1, 6.5], [-2.8, 5, -7.3]])

elu_output_1d = elu(array_1d)
elu_output_2d = elu(array_2d)

print("The input 1D array is:", array_1d)
print("The output 1D array is:", elu_output_1d)
print("The input 2D array is:\n", array_2d)
print("The output 2D array is:\n", elu_output_2d)
Output:
The input 1D array is: [ 0.   4.1  2.2  5.  -1.   6.5 -2.8  5.  -7.3  8. ]
The output 1D array is: [ 0.          4.1         2.2         5.         -0.63212056  6.5
 -0.93918994  5.         -0.99932446  8.        ]
The input 2D array is:
 [[ 0.   4.1  2.2]
 [ 5.  -1.   6.5]
 [-2.8  5.  -7.3]]
The output 2D array is:
 [[ 0.          4.1         2.2       ]
 [ 5.         -0.63212056  6.5       ]
 [-0.93918994  5.         -0.99932446]]
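Like leaky ReLU, ELU is available out of the box in both frameworks, for example as nn.ELU in PyTorch and tf.nn.elu in TensorFlow. A brief sketch using the PyTorch module:

import torch
from torch import nn

elu = nn.ELU(alpha=1.0)
x = torch.tensor([4.1, -1.0, -7.3])
print(elu(x))   # negative inputs are mapped smoothly into the range (-alpha, 0)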
Conclusion
ReLU and its variations are among the most popular and efficient activation functions for building deep learning models. In this article, we discussed the definition, implementation, and applications of the ReLU activation function. We also discussed the definition and implementation of ReLU variations like leaky ReLU and ELU to avoid the dying ReLU problem.
To learn more about how to use activation functions to build neural networks, you can go through the Deep Learning with Tensorflow skill path. If you prefer PyTorch over TensorFlow, you might like this intro to PyTorch and Neural Networks course.
FAQs
What does rectified mean in ReLU?
The term rectified in ReLU comes from signal processing, where a rectifier converts negative voltages to zero or positive voltages. Similarly, ReLU rectifies its input by turning all negative values into zero while passing positive values through unchanged.
Why is ReLU used in deep learning?
The ReLU activation function is used in deep learning because it avoids the vanishing gradient problem and has a simple and efficient implementation. It also uses sparse activation, which results in reduced computation during training, introduces implicit regularization, and helps avoid overfitting.
Why use ReLU over linear?
A linear activation function cannot capture non-linear patterns in the data. ReLU, despite being piecewise linear, successfully captures these patterns and helps build accurate deep learning models.
Is ReLU used in CNN?
Yes, ReLU is used in convolutional neural networks (CNNs) due to its efficient computation and its use of sparse activation to introduce implicit regularization and avoid overfitting.