Description:
Unlock the full potential of the Apache Spark platform with our advanced course, a follow-up to the introductory course 'Introduction to Apache Spark version 3.x.' In this comprehensive program, you'll delve deep into Spark, mastering the art of building unified big data applications that blend batch and interactive analytics across all your data. Developers will gain the ability to create complex parallel applications that lead to quicker, better-informed decisions and real-time actions. The course also provides in-depth coverage of performance optimization and debugging techniques, as well as Spark Machine Learning. Elevate your data analytics skills with this essential training.
Course Code/Duration:
BDT168 / 2 Days, Lectures and Labs (Virtual)
Learning Objectives:
In this course, participants will:
- Master core Apache Spark 3.x APIs and fundamental platform mechanisms.
- Utilize PySpark, Python, and SQL to access and transform data effectively.
- Gain proficiency in working with essential tools such as Jupyter Notebooks and Anaconda.
- Learn to handle data in Apache Parquet format for comprehensive data analysis (a minimal sketch combining these pieces follows this list).
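As a rough illustration of how the PySpark DataFrame API, SQL, and Parquet fit together, the minimal sketch below reads a Parquet file and queries it both ways. The file path and the column names `region` and `amount` are invented placeholders, not course data.

```python
# Minimal PySpark sketch: read a Parquet file and query it with both the
# DataFrame API and SQL. Path and column names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-sql-demo").getOrCreate()

# Load a Parquet dataset into a DataFrame (hypothetical path)
sales = spark.read.parquet("/data/sales.parquet")

# Register the DataFrame as a temporary view so it can be queried with SQL
sales.createOrReplaceTempView("sales")

# Same data, two access styles
sales.groupBy("region").count().show()
spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()

spark.stop()
```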
Prerequisites:
- Introduction to Apache Spark version 3.x.
Audience:
- This course is designed for developers and data analysts.
Course Outline:
Day 1: Spark Libraries
- Spark DataFrames (continued)
  - Lookup tables and joins with Spark DataFrames
  - Understanding data partitioning of DataFrames
  - Data transformations with pipelines
- Understanding Machine Learning Use Cases and Techniques
  - Understand what machine learning is
  - Machine learning development vs. traditional software development
  - The importance of data in machine learning
- Machine Learning Development
  - Learn about the steps involved in machine learning development
  - Understand how a machine learns
  - Machine learning algorithms and tools supported in the Spark ML library
- Building Classification and Regression Models
  - Building regression models and evaluating model performance
  - Building classification models and evaluating model performance
  - Understanding feature engineering
  - Model persistence
- Multiple Hands-on Labs (a minimal PySpark sketch follows this outline)
  - Using PySpark on Spark DataFrames
  - Data transformations on DataFrames
- Extra Credits
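To show how several Day 1 topics (joins, pipelines, classification, evaluation, and model persistence) connect in code, here is a minimal, hedged PySpark sketch. The toy data, column names, and save path are invented for illustration and are not the course lab material.

```python
# Minimal Day 1 sketch: a DataFrame join feeding a small Spark ML
# classification pipeline, with evaluation and model persistence.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("day1-demo").getOrCreate()

# Two small DataFrames joined on a key column (lookup-table style join)
customers = spark.createDataFrame(
    [(1, 34.0), (2, 51.0), (3, 27.0), (4, 45.0)], ["id", "age"])
labels = spark.createDataFrame(
    [(1, 0.0), (2, 1.0), (3, 0.0), (4, 1.0)], ["id", "label"])
data = customers.join(labels, on="id")

# Feature engineering and model training expressed as a single pipeline
assembler = VectorAssembler(inputCols=["age"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

model = pipeline.fit(data)
predictions = model.transform(data)

# Evaluate with area under the ROC curve
evaluator = BinaryClassificationEvaluator(labelCol="label")
print("AUC:", evaluator.evaluate(predictions))

# Model persistence: save the fitted pipeline (hypothetical path)
model.write().overwrite().save("/tmp/day1_model")
```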
Day 2: Machine Learning Library and Streaming
- Data Clustering with the Spark Machine Learning Library
  - Perform data clustering using the Spark ML library
  - Understand how to find the optimal number of clusters for a dataset
- Streaming Data
  - Understand what streaming data is
  - Design challenges with streaming data
- Structured Streaming with Spark
  - History of Spark streaming libraries from Spark 1.x to Spark 3.x
  - Enhancements to the Structured Streaming library
  - Learn about streaming output modes
  - Perform aggregations on streaming data
- Spark ML Library and Streaming Library
  - Build a machine learning model and persist it
  - Load the model and use it to make predictions on streaming data
- Multiple Hands-on Labs
  - Multiple hands-on sessions on the above topics (a minimal streaming sketch follows this outline)
- Extra Credits Session
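To make the streaming topics above concrete, here is a minimal, hedged Structured Streaming sketch: a windowed aggregation over Spark's built-in rate source, written to the console sink. The source, window size, and run time are illustrative choices, not the course lab setup.

```python
# Minimal Day 2 sketch: a windowed aggregation on a streaming DataFrame,
# emitted in "complete" output mode to the console.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("day2-streaming-demo").getOrCreate()

# The built-in rate source generates (timestamp, value) rows, handy for demos
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Windowed aggregation on the streaming DataFrame
counts = (stream
          .groupBy(F.window("timestamp", "10 seconds"))
          .agg(F.count("value").alias("events")))

# "complete" output mode re-emits the full aggregated result on each trigger
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .option("truncate", "false")
         .start())

query.awaitTermination(30)  # let the demo run for ~30 seconds
query.stop()
spark.stop()

# A persisted Spark ML model can be applied to a streaming DataFrame the same
# way as to a batch one, for example:
#   from pyspark.ml import PipelineModel
#   model = PipelineModel.load("/tmp/day1_model")          # hypothetical path
#   scored = model.transform(stream_with_matching_columns)  # hypothetical input
```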
Lab Environment:
- Each student will be provided with a virtual machine for performing the hands-on labs and is expected to use this machine in class
- These machines will be configured with a Spark 3.x release
- Instructions will be provided for students who want to set up the environment on their own machines (there will be no support for debugging such environments)
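For reference, a quick sanity check of a local PySpark 3.x installation might look like the sketch below. This is not part of the official setup instructions; the class virtual machines come pre-configured and need no setup.

```python
# Quick sanity check for a local PySpark 3.x install (illustrative only)
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         .appName("env-check")
         .getOrCreate())

print("Spark version:", spark.version)  # expect a 3.x version string
spark.range(5).show()                   # tiny DataFrame to confirm jobs run
spark.stop()
```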