Convert Categorical Features to Integers with Scikit Learn

Most machine learning algorithms expect numeric input, so categorical text features usually have to be converted to numbers before training. One simple way to do this is to map each distinct category to an integer, which we can do with LabelEncoder from the scikit-learn Python package.
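As a minimal sketch of what LabelEncoder does to a single column: it collects the distinct values, sorts them, and encodes each value as its position in that sorted list. The sample list below is made up for illustration only.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values, used only for illustration
genders = ['Male', 'Male', 'Female', 'Female', 'Male']

le = LabelEncoder()
codes = le.fit_transform(genders)

# classes_ holds the distinct values in sorted order; each value's
# position in this array is the integer it is encoded as
print(le.classes_.tolist())  # ['Female', 'Male']
print(codes.tolist())        # [1, 1, 0, 0, 1]
```

Because the integers follow sorted order, they carry no meaning beyond identifying the category.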

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"Name":['Alfred','Steve','Ally','Jane','Tony'],
                   "Gender":['Male','Male','Female','Female','Male'],
                   "Race":['Chinese','Malay','Chinese','Chinese','Malay'],
                   "Height": [170,172,153,161,180]})
df
     Name  Gender     Race  Height
0  Alfred    Male  Chinese     170
1   Steve    Male    Malay     172
2    Ally  Female  Chinese     153
3    Jane  Female  Chinese     161
4    Tony    Male    Malay     180

We can apply LabelEncoder to convert the Gender and Race columns to integers as follows:

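from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

# Columns that hold categorical text
categorical_cols = ['Gender','Race']

# Refit the encoder on each column in turn and replace its text with integer codes
df[categorical_cols] = df[categorical_cols].apply(le.fit_transform)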
df
     Name  Gender  Race  Height
0  Alfred       1     0     170
1   Steve       1     1     172
2    Ally       0     0     153
3    Jane       0     0     161
4    Tony       1     1     180
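One thing to keep in mind: the code above reuses a single encoder for every column, so afterwards le only remembers the categories of the last column it was fitted on (Race). If you later need to turn the integers back into text, one option is to keep a separate encoder per column. The following is a sketch under that assumption, not the only way to do it; the encoders dictionary and the original_gender variable are names introduced here for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Name": ['Alfred', 'Steve', 'Ally', 'Jane', 'Tony'],
                   "Gender": ['Male', 'Male', 'Female', 'Female', 'Male'],
                   "Race": ['Chinese', 'Malay', 'Chinese', 'Chinese', 'Malay'],
                   "Height": [170, 172, 153, 161, 180]})

categorical_cols = ['Gender', 'Race']

# `encoders` is an illustrative name: one fitted LabelEncoder per column
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Recover the original text for a column from its integer codes
original_gender = encoders['Gender'].inverse_transform(df['Gender'])
print(original_gender.tolist())  # ['Male', 'Male', 'Female', 'Female', 'Male']
```

The scikit-learn documentation describes LabelEncoder as intended for target labels; for feature columns it points to OrdinalEncoder, which may be worth considering if you encode several feature columns at once.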

July 15, 2021