.get_dummies()

Author: dkr2109
Published Apr 19, 2023

The pandas .get_dummies() function converts categorical variables into dummy (indicator) variables. Each level of a categorical variable gets its own column, called a dummy variable. A dummy variable is 1 (True) in rows where the original variable has that level and 0 (False) otherwise. By default, the values are stored as booleans.
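As a minimal illustration, the sketch below encodes a small Series with two levels. The Series and its values ("cat", "dog") are made up for this example. The second call passes dtype=int so the indicators appear as 0/1 integers rather than booleans.

```python
import pandas as pd

# A small categorical Series with two levels, "cat" and "dog"
pets = pd.Series(["cat", "dog", "cat"])

# One column per level; each row is True under the level it had
dummies = pd.get_dummies(pets)
print(dummies)

# Same encoding, but with integer 0/1 values instead of booleans
dummies_int = pd.get_dummies(pets, dtype=int)
print(dummies_int)
```

Each row has exactly one True (or 1), in the column matching its original value.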

Syntax

pd.get_dummies(data)

The data parameter holds the data to convert into dummy variables and is the only required parameter. It can be any array-like object, including a list, a pandas Series, or a DataFrame.
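The sketch below passes two kinds of input to data: a plain list and a DataFrame. The values and column names are invented for the example. When a DataFrame is passed, only its object, string, or category columns are encoded by default. Numeric columns are kept as they are and appear before the new dummy columns.

```python
import pandas as pd

# A plain Python list is array-like, so it is accepted directly
from_list = pd.get_dummies(["red", "green", "red"])
print(from_list)

# In a DataFrame, the string column "color" is encoded,
# while the numeric column "size" passes through unchanged
df = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})
from_frame = pd.get_dummies(df)
print(from_frame)
```

For the DataFrame, the original column name becomes the prefix of each dummy column, for example color_red.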

All other parameters are optional and have default values. They are listed below.

Parameter name (data type; default): usage

prefix (str, list of str, or dict of str; default None): String to prepend to the names of the new dummy columns. If a list is passed, its length must equal the number of columns being encoded. A dictionary that maps column names to prefixes can also be passed. When a DataFrame is encoded and prefix is None, the original column names are used as prefixes.
prefix_sep (str; default '_'): Separator inserted between the prefix and the category level in each new column name, for example color_red.
dummy_na (bool; default False): If True, adds a column that indicates NaN values. If False, NaNs are ignored.
columns (list-like; default None): When data is a DataFrame, the names of the columns to encode as dummy variables. If columns is None, all columns with object, string, or category dtype are converted.
sparse (bool; default False): If True, the dummy-encoded columns are backed by a SparseArray. If False, they are backed by a NumPy array.
drop_first (bool; default False): If True, drops the first level of each encoded variable, which leaves k-1 dummy columns for a variable with k levels.
dtype (dtype; default bool): The data type of the new dummy columns. Only a single dtype is permitted.
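The sketch below combines several of the optional parameters on one small DataFrame. The column names, prefixes, and the "=" separator were chosen only for illustration. columns limits encoding to "color" and "shape", so the string column "id" stays as it is. dummy_na adds a "nan" column for each encoded variable, and a separate call shows drop_first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", np.nan, "red"],
    "shape": ["circle", "square", "circle", "square"],
    "id": ["a", "b", "c", "d"],  # string column we do NOT want encoded
})

encoded = pd.get_dummies(
    df,
    columns=["color", "shape"],        # encode only these two columns
    prefix={"color": "c", "shape": "s"},  # custom prefix per column
    prefix_sep="=",                    # "c=red" instead of "c_red"
    dummy_na=True,                     # extra column flagging NaN values
    dtype=int,                         # 0/1 integers instead of booleans
)
print(encoded)

# drop_first: "color" has two levels (blue, red), so only one column remains;
# the NaN row is 0 because dummy_na is False here
reduced = pd.get_dummies(df[["color"]], drop_first=True, dtype=int)
print(reduced)
```

Dropping the first level is a common way to avoid perfectly collinear dummy columns in regression models, because the dropped level can be inferred from the others.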

Codebyte Example

The code below creates a list, abcs, made of letters of the alphabet, converts abcs to a Series, and prints the Series. Then the .get_dummies() function is applied to the Series to create a DataFrame whose columns are indicator variables for each level of the single categorical variable in the Series. Finally, the DataFrame is printed.
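The original interactive code is not reproduced here, so the following is a sketch of the steps described above. The contents of abcs are an assumption: a few letters, with some repeated so that levels occur more than once.

```python
import pandas as pd

# Assumed input: letters of the alphabet, with "a" and "b" repeated
abcs = ["a", "b", "c", "d", "a", "b"]

# Convert the list to a Series and print it
abcs_series = pd.Series(abcs)
print(abcs_series)

# One indicator column per letter (level) found in the Series
abcs_dummies = pd.get_dummies(abcs_series)
print(abcs_dummies)
```

The resulting DataFrame has one row per element of abcs and one column per distinct letter.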
