Overview:
Data science and data analysis depend on clean, well-prepared data. Indeed, most of the effort required to extract insight from data goes into cleaning it. This course is a comprehensive guide to using Python data cleaning tools and techniques effectively. We discuss how to apply them to data ingestion, imputation of missing values, detection of unreliable data and statistical anomalies, and feature engineering. The course walks through the essential steps of data analysis and data science pipelines and gives you a firm understanding of the data cleaning process required for real-world data analysis, data science and machine learning tasks.
Course Code/Duration:
BDT119 / Half Day (3 hours)
Learning Objectives:
After this course, you will be able to:
- Think carefully about your data and what you really want to know
- Ask the right questions to gain the desired insights from your data
- Detect problems from the shape of your data
- Appropriately clean your data so that you are saying what you mean
- Reasonably and reliably impute missing values
- Prepare data for analytic and machine learning tasks
- Transform your data into the numerical values that machine learning algorithms require
- Create better features (independent variables) so that a model can better capture the problem you want it to help you solve
Prerequisites:
- Basic programming
Audience:
- Anyone interested in programming and collaborating with teams on projects
Course Outline:
Overview of Data and Data Types
- Numerical Values
- Categorical Values
- Ordinal Values
Asking Clear and Precise Questions
Feature Selection
Normal Distribution
- Skew
- Outliers
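For illustration only, a minimal pandas sketch of checking skew and flagging outliers could look like the following. The sample values are invented. The 1.5 × IQR rule (Tukey's fences) is one common heuristic assumed here and is not necessarily the method taught in the course.

```python
import pandas as pd

# Hypothetical sample: mostly modest incomes plus one extreme value
incomes = pd.Series([32_000, 35_000, 38_000, 40_000, 41_000, 45_000, 250_000])

# Sample skewness: a large positive value indicates a long right tail
print(f"skew: {incomes.skew():.2f}")

# Assumed heuristic: flag points more than 1.5 * IQR outside the quartiles
q1, q3 = incomes.quantile(0.25), incomes.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = incomes[(incomes < lower) | (incomes > upper)]
print(outliers)
```

Here only the 250,000 entry falls outside the fences. Whether to drop, cap or keep such a value depends on whether it is an error or a genuine observation.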
Data Cleaning
- Imputing Missing Values
- Dropping Rows
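The sketch below shows the two cleaning strategies listed above, applied with pandas to an invented data frame. Median and mode imputation are simple choices assumed for this example. Other imputation strategies may suit your data better.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data with missing entries
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31, np.nan],
    "city": ["Leeds", "York", None, "Leeds", "Hull"],
    "score": [88, 92, 75, np.nan, 60],
})

# Strategy 1: impute. Fill numeric gaps with the median (robust to outliers)
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
# Fill the categorical gap with the most frequent value (the mode)
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

# Strategy 2: drop. Remove every row of the original that has any missing value
dropped = df.dropna()

print(imputed)
print(dropped)
```

On this small example, dropping rows keeps only one of five records. This trade-off is why imputation is often preferred when data is scarce.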
Data Transformation
- One-Hot Encoding
- Ordinal Transformation
- Discretization
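The three transformations above can be sketched in pandas as follows. The column names, the ordering of sizes and the price bin edges are hypothetical choices made for this example.

```python
import pandas as pd

# Hypothetical product data
df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "size": ["small", "large", "medium", "small"],
    "price": [4.0, 18.5, 9.0, 25.0],
})

# One-hot encoding: one 0/1 column per category of a nominal variable
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Ordinal transformation: map ordered categories to ranked integers
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_rank"] = df["size"].map(size_order)

# Discretization: bin a continuous variable into labelled intervals (0, 10], (10, 20], (20, 30]
df["price_band"] = pd.cut(df["price"], bins=[0, 10, 20, 30], labels=["low", "mid", "high"])

print(pd.concat([df, one_hot], axis=1))
```

One-hot encoding suits categories with no natural order, such as colour. An ordinal mapping preserves a meaningful order, such as size.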
Feature Engineering
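As an example of feature engineering, the sketch below derives new columns from invented order records. Which derived features actually help depends on the problem, so treat these as illustrations rather than recommendations.

```python
import pandas as pd

# Hypothetical order records
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2023-01-06", "2023-01-07", "2023-02-14"]),
    "total": [120.0, 45.0, 300.0],
    "items": [4, 1, 6],
})

# Ratio feature: average price per item in each order
orders["avg_item_price"] = orders["total"] / orders["items"]
# Date-part features: calendar month and a weekend flag (Saturday=5, Sunday=6)
orders["month"] = orders["order_date"].dt.month
orders["is_weekend"] = orders["order_date"].dt.dayofweek >= 5

print(orders)
```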
Introduction to the Python Programming Language
- Installing Anaconda
- Python Essentials
- Introduction to Pandas
Milestone 1: Learn how to use Jupyter Notebooks
Applied Data Cleaning and Preparation
- Using Pandas to clean data
- Using Pandas to prepare and transform data for analysis
Milestone 2: Perform data cleaning and preparation for data analysis
Structured Activity/Exercises/Case Studies:
- Milestone 1: Learn how to use Jupyter Notebooks
- Milestone 2: Perform data cleaning and preparation for data analysis
Training material provided:
Yes (Digital format)