- Overview
- Prerequisites
- Audience
- Curriculum
Description:
Empower your machine learning operations with our 'Apache Airflow Training for Machine Learning Operations' course. Tailored for machine learning engineers, this program equips you to create reproducible training sets, build and validate models, and deploy them with confidence. Explore the challenges of reproducible CI/CD pipelines in machine learning and how Apache Airflow simplifies batch training workflows using Directed Acyclic Graphs (DAGs). You'll gain a solid understanding of Airflow's foundations and apply them to real-world machine learning challenges, including sentiment prediction on tweet streams. The course takes a hands-on approach, focused on building reproducible pipelines with Airflow. Join us to elevate your machine learning operations with Apache Airflow.
Duration: 3 Days
Course Code: BDT291
Learning Objectives:
After this course, you will be able to:
- Migrate your Machine Learning training workflows to scalable pipelines in Apache Airflow
- Take a raw dataset and a model architecture through to the end of the project, deploying the trained model in the cloud
- Enforce reusability and modularization of pipelines for easy collaboration
Prerequisites:
- No background is required beyond basic Python knowledge or object-oriented programming experience; any prior knowledge of Machine Learning will help boost your learning.
Audience:
- People curious about data engineering.
- People who want to learn basic and advanced concepts of Apache Airflow.
- People who like a hands-on approach.
Course Outline:
The scalability problem of Machine Learning Pipelines
- What problems arise when trying to create a Machine Learning model?
- The components of a Machine Learning platform
- Introducing Apache Airflow
- Airflow architecture
- How do we represent a Machine Learning Pipeline?
- Demo: Our first DAG
- Tasks, TaskFlow, and Operators (see the sketch below)
- Demo: First Pipeline
- Capstone Lab: Creating the datasets for training
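
As a taste of the first demo, here is a minimal sketch of a DAG written with the TaskFlow API. The DAG name, task names, and toy extract/transform/load logic are illustrative placeholders rather than the course's lab code.

```python
# Minimal sketch of a first DAG with the TaskFlow API (Airflow 2.x).
# The DAG name, task names, and toy logic are illustrative placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule_interval="@daily", start_date=datetime(2023, 1, 1), catchup=False)
def first_ml_pipeline():
    @task
    def extract():
        # Stand-in for pulling raw records from a source system.
        return [1, 2, 3]

    @task
    def transform(records):
        # Stand-in for feature engineering.
        return [r * 10 for r in records]

    @task
    def load(features):
        print(f"Storing {len(features)} feature rows")

    # TaskFlow infers the task dependencies from these calls.
    load(transform(extract()))


first_ml_pipeline()
```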
Creating our Machine Learning Pipeline
- Using custom operators
- Demo: Creating a Train Operator
- Creating TaskGroups vs. SubDAGs
- Sharing data with XComs (see the sketch below)
- Branching and Triggers
- Sensors and Smart Sensors
- Demo: Adding a sensor to validate enough new data
- Capstone Lab: Adding training, validation and delivery steps to our pipeline
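
To give a flavour of the custom-operator demo, below is a hedged sketch of a minimal training operator; the attribute names and the hard-coded metric are placeholders, and the real lab wires in the course's own training code. Returning a value from execute() pushes it to XCom so downstream validation tasks can read it.

```python
# Hedged sketch of a custom training operator; the attribute names and the
# hard-coded metric are placeholders, not the course's lab code.
from airflow.models.baseoperator import BaseOperator


class TrainOperator(BaseOperator):
    def __init__(self, model_name, dataset_path, **kwargs):
        super().__init__(**kwargs)
        self.model_name = model_name
        self.dataset_path = dataset_path

    def execute(self, context):
        self.log.info("Training %s on %s", self.model_name, self.dataset_path)
        accuracy = 0.9  # stand-in for a metric produced by real training code
        # The return value is pushed to XCom, so a downstream validation
        # task can read it and decide whether to deliver the model.
        return accuracy
```

In a DAG file this would be instantiated like any built-in operator, for example TrainOperator(task_id="train", model_name="sentiment", dataset_path="/data/train.csv"), where the arguments shown are hypothetical.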
Mastering scheduling
- execution_date, start_date, and schedule_interval (illustrated in the sketch below)
- Handling non-default schedule_intervals
- Demo: Playing with time
- Capstone Lab: Using Sensors with a correct schedule_interval
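
Because the interaction of start_date, schedule_interval, and the logical execution_date trips up most newcomers, here is a minimal sketch; the dag_id and dates are arbitrary examples. With these settings, the run "for" 2023-01-01 is only created once that daily interval closes, shortly after midnight on 2023-01-02.

```python
# Minimal scheduling sketch: the dag_id and dates are arbitrary examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def report_interval(ds, **_):
    # `ds` is the logical (execution) date of the interval being processed,
    # not the wall-clock time at which the task actually runs.
    print(f"Processing data for the interval starting on {ds}")


with DAG(
    dag_id="scheduling_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=True,  # backfill one run per missed daily interval
) as dag:
    PythonOperator(task_id="report", python_callable=report_interval)
```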
Enabling concurrency and scalability
- Moving from SQLite to PostgreSQL
- Executors: Debug, Local, Celery
- Concurrency and parallelism (see the sketch below)
- Demo: Concurrency with Celery
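
As a preview of what changes once you leave the default sequential setup, here is a minimal sketch of a DAG whose tasks have no dependencies on each other; whether they actually run in parallel depends on the executor (Local or Celery) and a metadata database such as PostgreSQL, since the SequentialExecutor with SQLite runs tasks one at a time. The fold count and task body are placeholders.

```python
# Minimal sketch of independent tasks an executor can run in parallel;
# with the default SequentialExecutor and SQLite they run one at a time.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
    max_active_runs=1,  # one DAG run at a time; its tasks may still run in parallel
)
def parallel_training_example():
    @task
    def train_fold(fold):
        # Placeholder for training one cross-validation fold.
        return fold

    # Four calls create four independent tasks with auto-generated task ids;
    # a Local or Celery executor can schedule them concurrently.
    for fold in range(4):
        train_fold(fold)


parallel_training_example()
```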
Hackathon: Sentiment Prediction from Twitter
Software Required
This Apache Airflow for Machine Learning Operations course is taught using Python 3.5+, Apache Airflow 2.1+, scikit-learn 1.1+, and PyTorch 1.8+. On request, we can provide either a remote VM environment for the class or directions for configuring this environment on your local machines.
Training material provided: Yes (Digital format)