How to get one hot encoding in Python?
I've got a dataset with a couple of features and want to convert it to one hot vectors.
What is the best way to do one hot encoding in Python?
I'm using pandas or another library to load the data from a database.
First let's see why we need one hot encoding and how it's done.
If your data has features that are not numbers, you have to convert them to numbers and create a mapping between the vector indices and the feature values.
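To make the idea of a mapping between vector indices and feature values concrete, here is a minimal sketch in plain Python. The `colors` list, the `index_of` dictionary and the `one_hot` helper are made-up names for illustration only, not part of any library.

```python
# Hypothetical sample values; any list of categorical values works the same way.
colors = ['green', 'blue', 'black', 'blue']

# Assign every distinct value a vector index (sorted so the order is stable).
index_of = {value: i for i, value in enumerate(sorted(set(colors)))}

def one_hot(value):
    """Return a list of zeros with a single 1 at the index assigned to `value`."""
    vector = [0] * len(index_of)
    vector[index_of[value]] = 1
    return vector

# One vector per input value, each as long as the number of distinct values.
encoded = [one_hot(c) for c in colors]
```

Every vector has exactly one 1, and `index_of` tells you which position stands for which value. The libraries used in this answer do the same bookkeeping for you.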
If you are using pandas, let's assume you've got some simple data in your script.
import pandas as pd
df = pd.DataFrame([
    ['ferrari', 'green', 2017],
    ['maserati', 'blue', 2015],
    ['bmw', 'black', 2018],
])
df.columns = ['model', 'color', 'year']
As you can see, the first two features are string values, and we have to convert them to numbers somehow.
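One option is to convert each of them to integer labels with sklearn's LabelEncoder. Note that this gives one integer code per value (label encoding), not a one hot vector yet.
from sklearn.preprocessing import LabelEncoder
# One encoder per column, so each keeps its own value-to-integer mapping
le_color = LabelEncoder()
le_model = LabelEncoder()
# Classes are sorted, so for color: black -> 0, blue -> 1, green -> 2
df['color_label'] = le_color.fit_transform(df.color)
df['model_label'] = le_model.fit_transform(df.model)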
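Or you can get the encoded values as an array with OneHotEncoder. Something like this:
from sklearn.preprocessing import OneHotEncoder
# Create the one hot encoder
encoder = OneHotEncoder()
# Fit on the categorical columns and convert the sparse result to a dense array
# (passing string columns directly needs scikit-learn 0.20 or later)
data = encoder.fit_transform(df[['model', 'color']]).toarray()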
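To check which column of the encoded array belongs to which value, you can look at the fitted encoder's `categories_` attribute. This is a minimal sketch that rebuilds the sample DataFrame from this answer and assumes scikit-learn 0.20 or later; encoding only the `model` and `color` columns is my assumption about what should be encoded.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame(
    [
        ['ferrari', 'green', 2017],
        ['maserati', 'blue', 2015],
        ['bmw', 'black', 2018],
    ],
    columns=['model', 'color', 'year'],
)

categorical = ['model', 'color']  # assumption: only the string columns are encoded
encoder = OneHotEncoder()
data = encoder.fit_transform(df[categorical]).toarray()

# categories_ holds one array per input column; the order of the values
# matches the order of the output columns produced for that input column.
for column, values in zip(categorical, encoder.categories_):
    print(column, list(values))
```

The output columns are laid out input column by input column, so here the first three positions belong to `model` and the last three to `color`.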
And that's it. Now you've got your one hot vectors for each feature.
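Since you mention pandas, it may be worth knowing that pandas can also do this on its own with `pd.get_dummies`, a different route from the sklearn encoders. A minimal sketch on the same sample data; which columns to encode is again my assumption:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['ferrari', 'green', 2017],
        ['maserati', 'blue', 2015],
        ['bmw', 'black', 2018],
    ],
    columns=['model', 'color', 'year'],
)

# One indicator column per distinct value of 'model' and 'color';
# the numeric 'year' column is passed through unchanged.
encoded = pd.get_dummies(df, columns=['model', 'color'], dtype=int)
print(encoded)
```

The new columns are named after the original column and the value (for example `color_green`), which makes the result easy to read. The trade-off is that `get_dummies` does not remember the categories, so if you need to encode new data later in exactly the same way, a fitted OneHotEncoder is usually the safer choice.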
Great answer. Thank you