Data Wrangling, Modeling, And Model Maintenance In Python
- Created By raju2006
- Last Updated December 29th, 2023
Description:
This Data Science Quick Start course will expose you to real-world applications of data science. We will discuss the data science process and the various tools used to analyze data sets and to apply machine learning to well-defined data problems. We will begin by installing Anaconda, which provides all of the tools that we’ll use to explore data and apply data science to real-world use cases. We will perform data exploration, analysis, modeling and visualization to learn the data science process and to understand the importance of data science in countless industries. This Data Science Quick Start course will give you an understanding of data science and its importance in business, industry and technology.
Course Code/Duration:
BDT112 / 3 Days
Learning Objectives:
After this course, you will be able to:
- Install Anaconda on a personal computer.
- Understand the Data Science Field.
- Become familiar with Descriptive and Inferential Statistics and statistical analysis.
- Learn the primary toolkit for data science in Python including NumPy, Pandas, Matplotlib and Scikit-learn.
- Learn how to perform exploratory data analysis.
- Learn how to clean data effectively.
- Utilize common Machine Learning algorithms such as Linear and Logistic Regression.
- Learn how to evaluate models and choose the most effective one.
- Understand how to interpret a Confusion Matrix.
- Solidify understanding by completing hands-on exercises and milestones.
- Understand the big picture and the importance of data science in business, industry, and technology.
Prerequisites:
- Python programming knowledge
Audience:
- Anyone interested in learning the basics of Data Science
Course Outline:
Day 1:
- Course Introduction
- Installing Anaconda
- Overview of Data Science
- The Difference Between Business Analytics, Data Analytics and Data Science
- The Data Science Process
  - Define the Problem
  - Get the Data
  - Explore the Data
  - Clean the Data
  - Model the Data
  - Communicate the Findings
- Descriptive Statistics Fundamentals
  - Central Tendency
    - Mean
    - Median
    - Mode
  - Spread of the Data
    - Variance
    - Standard Deviation
    - Range
  - Relative Standing
    - Percentile
    - Quartile
    - Interquartile Range
- Data Libraries
  - NumPy
  - Pandas
- Accessing Data
  - CSV, TSV, JSON
  - Importing data from a MySQL database into Pandas
- Data Exploration
  - Describe
  - Grouping
  - Feature Selection
  - Feature Engineering
- Milestone 1: Use Pandas to perform exploratory data analysis.
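The Day 1 descriptive statistics topics (central tendency, spread, and relative standing) correspond to a handful of pandas calls. The sketch below is an illustration rather than course material, and the sample values are invented.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
scores = pd.Series([4, 8, 15, 16, 23, 42, 8])

# Central tendency
mean = scores.mean()
median = scores.median()
mode = scores.mode().tolist()  # mode() returns a Series because ties are possible

# Spread of the data (pandas uses the sample variance, ddof=1, by default)
variance = scores.var()
std_dev = scores.std()
value_range = scores.max() - scores.min()

# Relative standing (quantiles use linear interpolation by default)
q1, q3 = scores.quantile([0.25, 0.75])
iqr = q3 - q1
p90 = scores.quantile(0.90)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std={std_dev:.2f} range={value_range}")
print(f"Q1={q1} Q3={q3} IQR={iqr} 90th percentile={p90:.1f}")

# describe() reports count, mean, std, min, quartiles, and max in one call.
print(scores.describe())
```

`describe()` is usually the first call in exploratory analysis; the individual methods are useful when a single statistic is needed for a report or a feature.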
Day 2:
- Data Preparation
  - Data Cleaning
    - Dropping Rows
    - Imputing Missing Values
  - Feature Selection
  - Data Transformation
    - One-Hot Encoding
    - Standardization
    - Normalization
  - Feature Engineering
- Inferential Statistics Fundamentals
  - Normal Distribution
  - Central Limit Theorem
  - Standard Error
  - Confidence Intervals
  - Samples
  - Hypothesis Testing
- Data Visualization
  - Pandas
  - Matplotlib
  - Seaborn
- Machine Learning Overview
- Machine Learning Algorithms
  - Linear Regression
  - Logistic Regression
  - Support Vector Machine
  - Decision Tree
  - Random Forest
  - K-Nearest Neighbors
  - K-Means Clustering
- Dimensionality and Sparsity
  - Principal Component Analysis
  - Singular Value Decomposition
  - Factor Analysis
- Milestone 2: Use Pandas to clean, impute and engineer features.
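The Day 2 data preparation steps can be sketched in pandas as follows. This is a minimal illustration under stated assumptions: the toy DataFrame and its column names (`age`, `income`, `city`) are hypothetical, missing numbers are imputed with the column median (one common choice among several), and standardization and normalization are written out by hand rather than with scikit-learn's scalers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with missing values, invented for illustration.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 35, 52],
    "income": [40_000, 52_000, np.nan, 61_000, 75_000],
    "city": ["Austin", "Boston", "Austin", None, "Denver"],
})

# Dropping rows: discard any row whose "city" is missing.
cleaned = df.dropna(subset=["city"]).copy()

# Imputing missing values: fill numeric gaps with each column's median.
for col in ["age", "income"]:
    cleaned[col] = cleaned[col].fillna(cleaned[col].median())

# One-hot encoding: one 0/1 indicator column per city.
encoded = pd.get_dummies(cleaned, columns=["city"], dtype=int)

# Standardization (z-score): rescale to mean 0 and standard deviation 1.
age = encoded["age"]
encoded["age_std"] = (age - age.mean()) / age.std()

# Normalization (min-max): rescale to the [0, 1] range.
inc = encoded["income"]
encoded["income_norm"] = (inc - inc.min()) / (inc.max() - inc.min())

print(encoded)
```

In a real project the imputation medians and scaling parameters would normally be computed on the training split only and then reused on the test split, so that no information leaks from the test data.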
Day 3:
- Model Development
  - Scikit-learn
  - Test/Train Split
  - Boosting and Ensembles
  - Stacking
- Model Evaluation
  - K-Fold Cross Validation
  - Grid Search
  - Feature Importance
  - Confusion Matrix
  - Accuracy vs. Precision vs. Recall
  - Learning Curve
  - Overfitting vs. Underfitting
- Recommendation Systems
- Natural Language Processing
  - Processing Text
  - Encoding Text
  - Building an NLP pipeline
- Model Maintenance
  - Roles and Responsibilities
  - Documentation
  - Close-out Activities
- Milestone 3: Build, evaluate, and refine a machine learning model.
- Conclusion: Data Science in the real world and next steps.
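Interpreting a confusion matrix and contrasting accuracy, precision, and recall (Day 3) can be illustrated with scikit-learn's metrics module. The label lists below are hypothetical, chosen so that the three metrics disagree; the hand-computed values are checked against scikit-learn's helpers.

```python
import math

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical labels for a binary classifier (1 = positive class),
# invented for illustration; not results from the course.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

# With labels [0, 1], scikit-learn orders the matrix as
# [[TN, FP],
#  [FN, TP]]   (rows = actual class, columns = predicted class)
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Each metric answers a different question about the same four cells.
accuracy = (tp + tn) / cm.sum()   # share of all predictions that were correct
precision = tp / (tp + fp)        # of the predicted positives, share that were real
recall = tp / (tp + fn)           # of the real positives, share the model found

# The hand-computed values match scikit-learn's metric helpers.
assert math.isclose(accuracy, accuracy_score(y_true, y_pred))
assert math.isclose(precision, precision_score(y_true, y_pred))
assert math.isclose(recall, recall_score(y_true, y_pred))

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here accuracy is 0.60, but recall is only 0.40: the model finds just two of the five real positives. This is why accuracy alone can be misleading, especially when missing a positive case is costly.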
Structured Activity/Exercises/Case Studies:
- Milestone Project 1: Use Pandas to perform exploratory data analysis.
- Milestone Project 2: Use Pandas to clean, impute and engineer features.
- Milestone Project 3: Build, evaluate, and refine a machine learning model.
Training material provided: Yes (Digital format)