SMOTE

Emilia Orellana
2 min read · Dec 10, 2020


SMOTE is one technique for overcoming class imbalance. It is my favorite because, in my opinion, it is the easiest to apply. As I explained in my first blog post about class imbalance, which can be found here, SMOTE is useful when a class has very few data points. As I have mentioned before, it is important to fix class imbalance in the dataset: a model trained on imbalanced data tends to favor the majority class, so balancing the classes gives the model a fair chance of predicting each one.

Synthetic Minority Oversampling Technique (SMOTE) can be understood from the visual above. At first there are a few green circles and many more blue squares. SMOTE then generates synthetic samples for the minority class: it uses the feature values of the existing minority points, placing each new point between a minority point and one of its nearest minority neighbors. Finally, in the last image there are more green circles, and the two classes look more evenly balanced.
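To make the idea of generating samples more concrete, here is a minimal sketch of the interpolation step in plain Python. It is a simplified illustration, not the code a library actually runs: the function name `smote_sketch`, its parameters, and the four sample points are all made up for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points between minority points and their neighbors.

    Hypothetical helper for illustration only; real implementations differ.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority point to start from.
        base = rng.choice(minority)
        # Find its k nearest minority neighbors (excluding the point itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Place the new point a random fraction of the way toward the neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Made-up minority class with two features per point.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (2.5, 2.2)]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

Because every new point sits on a line between two existing minority points, the synthetic samples stay inside the region the minority class already occupies instead of being exact copies.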

The code for SMOTE is also very simple. The attached visual shows the code from an old project where I had to apply SMOTE to my data.

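Line 7 is the main line of code needed. It is important to first define what X (the features) and y (the target) are, and then pass them to SMOTE. In current versions of imbalanced-learn the resampling call is SMOTE().fit_resample(X, y) (older releases named it fit_sample); fit(X, y) on its own does not return the resampled data.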

After running those lines of code, we can check the class balance again with the same code as before, .value_counts()

My GitHub can be found here. I use SMOTE in the majority of my projects. Please feel free to let me know if you have any questions!
