Byte-Sized ML/AI Series: Handling Imbalanced Datasets
- Created By ebrahim khaja
- Posted on July 18th, 2024
Description:
Quite often, when performing classification, we must deal with a target (label) variable whose classes are lopsided: for example, out of 100 credit card transactions, 99 might be legitimate and only 1 fraudulent.
Duration: 90 minutes
Course Code: BDT363
Prerequisites:
- Must be familiar with the pandas library and with using Jupyter Notebook/Lab
- Nice to have: an understanding of how to build classification models with the scikit-learn library
Audience:
- This session is designed for anyone familiar with performing classification using the scikit-learn library
Course Outline:
Quite often, when performing classification, we must deal with a target (label) variable whose classes are lopsided: for example, out of 100 credit card transactions, 99 might be legitimate and only 1 fraudulent. A model built on such data can report high accuracy (always predicting "not fraudulent" already scores 99%) while never detecting a single fraudulent transaction, so is that model actually any good? How can we handle such an imbalanced dataset? In this session, we will learn different techniques for generating synthetic data to overcome such imbalances.
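To make the accuracy question concrete, here is a minimal sketch using hypothetical labels that mirror the 99-to-1 credit card example (0 = legitimate, 1 = fraudulent). The "model" is a stand-in that always predicts the majority class; the metrics are computed by hand so the arithmetic is visible, though `sklearn.metrics.accuracy_score` and `sklearn.metrics.recall_score` would give the same values.

```python
# Hypothetical labels mirroring the example: 99 legitimate (0), 1 fraudulent (1)
y_true = [0] * 99 + [1]

# A stand-in "model" that ignores its input and always predicts the majority class
y_pred = [0] * len(y_true)

# Accuracy: fraction of all predictions that match the true label
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: fraction of fraudulent cases actually caught
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall_fraud = true_pos / sum(y_true)

print(f"accuracy={accuracy:.2f}, fraud recall={recall_fraud:.2f}")
# prints "accuracy=0.99, fraud recall=0.00"
```

The 99% accuracy comes entirely from the class ratio, which is why metrics focused on the minority class (recall, precision, F1) are usually examined alongside accuracy on imbalanced data.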
We will perform hands-on labs to understand the following:
- Generate synthetic data for imbalanced datasets
- Visualize the synthetic data
- Apply machine learning algorithms to build classification models using the synthetic data
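The outline does not name a specific technique, so as an assumption, here is a minimal sketch of one widely used approach, SMOTE (Synthetic Minority Over-sampling Technique): each synthetic row is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function `smote_sketch` and the toy data are hypothetical and for illustration only; in practice the imbalanced-learn library provides `imblearn.over_sampling.SMOTE` with a `fit_resample(X, y)` method.

```python
import numpy as np

def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Hypothetical SMOTE-style helper: create n_new synthetic rows by
    interpolating between a minority sample and one of its k nearest
    minority neighbours (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    k = min(k, n - 1)  # a point cannot have more neighbours than other minority points

    # Pairwise Euclidean distances between all minority samples
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each point as its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                  # pick a random minority sample
        nb = neighbours[base, rng.integers(k)]  # pick one of its k nearest neighbours
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic[i] = X[base] + gap * (X[nb] - X[base])
    return synthetic

# Hypothetical toy data: 95 majority points and 5 minority points in 2-D
rng = np.random.default_rng(42)
X_major = rng.normal(loc=0.0, scale=1.0, size=(95, 2))
X_minor = rng.normal(loc=3.0, scale=0.5, size=(5, 2))

# Generate enough synthetic minority rows to balance the two classes
X_synth = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_balanced = np.vstack([X_major, X_minor, X_synth])
y_balanced = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_synth)))
print(np.bincount(y_balanced))  # prints "[95 95]"
```

Because every synthetic point is an interpolation between real minority points, plotting `X_synth` over the original data shows the new rows filling in the region the minority class already occupies. Oversampling like this is normally applied only to the training split, so the test set still reflects the real class ratio.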
Training material provided: Yes (digital format)
Hands-on Lab: Labs will be performed using Google Colaboratory (or on a personal machine)