One Hot Encoding
One hot encoding is a method of encoding categorical variables as binary vectors that can be more readily used by machine learning algorithms. These algorithms work with numbers such as 1, not categorical values like "bicycle". However, the data they are asked to analyze often contain values that are encoded categorically. One hot encoding translates those categorical values into numbers that the algorithms can interpret.
Encoding Categorical Values
The process of one hot encoding is as follows:
- Each possible value in the data being encoded is assigned a unique sequential integer value.
- Each of those values is represented by a binary vector with a position for each integer value.
- Each vector has a value of 1 in the position for its corresponding integer value, and a 0 in every other position.
- The categorical values in the data are replaced by the corresponding vector.
Depending on the implementation, the encoded values may be represented by actual vector data types, or they may be expanded as additional columns in the data.
Assigning integer values to category values:
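As a minimal sketch of this step in Python, assume a hypothetical list of vehicle categories (the values and the name `category_to_index` are illustrative, not taken from the original data). Each distinct value receives the next unused integer in order of first appearance:

```python
# Hypothetical category values, used only for illustration.
values = ["bicycle", "car", "bus", "car", "bicycle"]

# Assign each distinct value a unique sequential integer,
# in order of first appearance.
category_to_index = {}
for value in values:
    if value not in category_to_index:
        category_to_index[value] = len(category_to_index)

print(category_to_index)  # {'bicycle': 0, 'car': 1, 'bus': 2}
```

The ordering of the integers carries no meaning here; any assignment works as long as each value gets its own integer.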
Assigning vectors to values:
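One way to sketch this step, assuming a hypothetical mapping of three category values to the integers 0 through 2 (the helper `one_hot` and the mapping are illustrative names, not part of any library):

```python
# Hypothetical integer assignments for three category values (illustrative only).
category_to_index = {"bicycle": 0, "car": 1, "bus": 2}


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


# Build one binary vector per category; the vector length equals
# the number of distinct categories.
category_to_vector = {
    category: one_hot(index, len(category_to_index))
    for category, index in category_to_index.items()
}

for category, vector in category_to_vector.items():
    print(category, vector)
# bicycle [1, 0, 0]
# car [0, 1, 0]
# bus [0, 0, 1]
```

Every vector contains exactly one 1, which is where the name "one hot" comes from.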
Applying the vector values as columns to the original data:
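The column-expansion form can be sketched in plain Python as follows. The records, the field name `vehicle`, and the `vehicle_<category>` column naming are all assumptions made for illustration; real implementations may name and order the columns differently.

```python
# Hypothetical records with one categorical field, "vehicle" (illustrative only).
rows = [
    {"id": 1, "vehicle": "bicycle"},
    {"id": 2, "vehicle": "car"},
    {"id": 3, "vehicle": "bus"},
    {"id": 4, "vehicle": "car"},
]

# Distinct categories in order of first appearance; each becomes one new column.
categories = list(dict.fromkeys(row["vehicle"] for row in rows))

encoded_rows = []
for row in rows:
    encoded = {"id": row["id"]}
    for category in categories:
        # 1 in the column matching this row's category, 0 in every other column.
        encoded[f"vehicle_{category}"] = 1 if row["vehicle"] == category else 0
    encoded_rows.append(encoded)

for encoded in encoded_rows:
    print(encoded)
# {'id': 1, 'vehicle_bicycle': 1, 'vehicle_car': 0, 'vehicle_bus': 0}
# {'id': 2, 'vehicle_bicycle': 0, 'vehicle_car': 1, 'vehicle_bus': 0}
# {'id': 3, 'vehicle_bicycle': 0, 'vehicle_car': 0, 'vehicle_bus': 1}
# {'id': 4, 'vehicle_bicycle': 0, 'vehicle_car': 1, 'vehicle_bus': 0}
```

The original categorical column is dropped in this sketch, since the new columns carry the same information in numeric form.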