Applied Machine Learning using Python and Apache Spark
- Created By raju2006
- Last Updated December 21st, 2023
Description:
This course gives you a thorough understanding of Machine Learning concepts, terminology and usage. It enables you to perform Machine Learning in two ways: using Python libraries and using Apache Spark.
Long Description:
Master the art of Applied Machine Learning using Python and Apache Spark. This comprehensive course equips you with a deep understanding of Machine Learning concepts and practical usage. You'll harness the power of Python libraries like NumPy, Pandas, Matplotlib, and Scikit-learn for data analysis and preparation. Plus, explore Apache Spark's distributed architecture to scale Machine Learning for large datasets using Spark MLlib. With hands-on projects and Databricks Notebooks, you'll gain valuable experience and insights into both Python-based and Apache Spark-driven Machine Learning. Elevate your skills and choose the best tools for your data analysis needs with "Applied Machine Learning using Python and Apache Spark" training.
Course Code/Duration:
BDT10 / 3 Days
Learning Objectives:
After this course, you will be able to:
- Have a basic understanding of Machine Learning
- Understand the differences between Supervised and Unsupervised Learning
- Understand how to use Python libraries to explore, clean and prepare data
- Describe the role of Machine Learning and where it fits into Information Technology strategies
- Explain the technical and business drivers that result from using Machine Learning
- Understand techniques like Classification, Clustering and Regression
- Discuss how to identify which technique to apply to a specific use case
- Understand popular Machine Learning offerings such as Amazon Machine Learning, TensorFlow, Azure Machine Learning, Google Cloud, Spark MLlib, Python and R
- Install and set up Anaconda
- Perform hands-on activities using Jupyter Notebooks
- Understand popular Machine Learning algorithms such as Linear Regression, Decision Tree, Logistic Regression, K-Nearest Neighbor and K-Means Clustering
- Perform hands-on activities with Python libraries such as NumPy, Pandas, Matplotlib and Scikit-learn
- Understand Apache Spark Processing Framework and distributed architecture
- Compare Machine learning using Python versus Apache Spark
- Perform hands-on activities on Databricks cloud using Apache Spark MLlib
Prerequisites:
- Familiarity with Python is required
- No machine learning knowledge is required
- Working knowledge of Spark is a plus
Audience:
- Data Analysts, Software Engineers, Data Engineers, Data Professionals, Business Intelligence Developers, Data Architects
Course Outline:
Day 1
- Course Introduction
- History and background of Machine Learning
- Compare Traditional Programming vs Machine Learning
- Supervised and Unsupervised Learning Overview
- Machine Learning patterns
- Classification
- Clustering
- Regression
- Gartner Hype Cycle for Emerging Technologies
- Machine Learning offerings in Industry
- Hands-on exercise 1 – Install and set up Anaconda
- Descriptive statistics
- Milestone 1: Learn how to use Jupyter Notebooks
- Essential libraries
- NumPy
- Pandas
- Matplotlib
- Milestone 2: Exploratory data analysis
Day 2
- Getting data
- Feature selection
- Essential libraries
- Scikit-learn
- Milestone 3: End-to-end project (Initiation)
- Transforming data
- Binary encoding
- One-hot encoding
- Feature Engineering
- Algorithms
- Linear Regression
- Naive Bayes
- Decision Tree
- Random Forest
- Logistic Regression
- Support Vector Machine
- K-Nearest Neighbor
- K-Means Clustering
- Milestone 4: Data modeling
Day 3
- Apache Spark Overview
- Spark Libraries
- Compare Machine Learning using Python vs Spark
- Milestone 5: Databricks Cloud Community Account Setup
- Measuring performance
- Confusion Matrix
- ROC curve, Area Under Curve (AUC)
- Refining the model
- Hyperparameter tuning
- Grid search
- Milestone 6: Spark MLlib Hands-on
- Milestone 7: End-to-end project (Completion)
- Next steps
Structured Activity/Exercises/Case Studies:
Day 1
- Milestone 1 – Learn how to use Jupyter Notebooks
- Milestone 2 – Exploratory data analysis
Day 2
- Milestone 3 – End-to-end project (Initiation)
- Milestone 4 – Data modeling
Day 3
- Milestone 5 – Model selection
- Milestone 6 – Spark MLlib Hands-on
- Milestone 7 – End-to-end project (Completion)
Training material provided:
Yes (Digital format)