Session Description:
This course is an introduction to Apache Spark version 3.x. It covers the core APIs for using Spark and the fundamental mechanisms of the platform. Students will use PySpark, Python, and SQL to access and transform data. We also cover Jupyter Notebooks, Anaconda, and Apache Parquet.
Course Code/Duration:
BDT167 / 2 Days of Lectures/Labs (Virtual)
Pre-requisite:
- General knowledge of data stores and a working knowledge of the Python language.
Audience:
- This course is designed for developers and data analysts.
Learning Objectives:
In this course, participants will:
- Review Hadoop architecture
- Understand the fundamental architecture of Spark
- Use the core Spark APIs to operate on data
- Understand the Spark Ecosystem
- Use PySpark and Python
- Create RDDs
- Create DataFrames
- Use RDDs and DataFrames
- Use SQL on DataFrames (sketched after this list)
- Use Jupyter Notebooks
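A minimal sketch of the core APIs listed above, assuming a local Spark 3.x installation with PySpark available; the data, names, and app name are illustrative only, not course lab material:

```python
from pyspark.sql import SparkSession

# Start a local Spark session (entry point for the DataFrame and SQL APIs)
spark = SparkSession.builder.appName("bdt167-sketch").getOrCreate()

# Create an RDD from a Python collection and operate on it
rdd = spark.sparkContext.parallelize([("alice", 34), ("bob", 45)])
print(rdd.map(lambda pair: pair[1]).sum())  # 79

# Create a DataFrame from the same data
df = spark.createDataFrame(rdd, ["name", "age"])
df.filter(df.age > 40).show()

# Use SQL on the DataFrame through a temporary view
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 40").show()

spark.stop()
```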
Course Outline:
Day 1 Foundations
- Overview
- Lab Environment
- PySpark
- Spark Data
Day 2 Programming Techniques
- Spark Programming
- RDDs
- DataFrames
- Jupyter Notebook
Detailed Course Topics
Day 1 Foundations
- Big Data and Its History
- Big Data & Hadoop Deployment
- Hadoop
- Architecture
- Understanding Storage and Processing with Hadoop
- MapReduce and its limitations
- The Hadoop ecosystem
- Spark
- Why Spark?
- Spark Ecosystem
- Spark Architecture
- Spark Cluster
- Understanding RDD and its value
- Spark DataFrames and Spark SQL
- Understanding PySpark
- Multiple Hands-on Labs
- Using Jupyter Notebook
- The Spark command line
- RDDs and DataFrames using PySpark (see the sketch after this outline)
- Assignment
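The Day 1 hands-on work pairs Jupyter and the Spark command line with RDDs and DataFrames. As one hedged illustration of this kind of exercise, the sketch below round-trips a small DataFrame through Apache Parquet, the columnar format named in the session description; the file path is a hypothetical placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-sketch").getOrCreate()

# Build a small DataFrame; the rows are illustrative only
df = spark.createDataFrame([("alice", 34), ("bob", 45)], ["name", "age"])

# Write it out as Parquet, then read it back (path is a placeholder)
df.write.mode("overwrite").parquet("/tmp/people.parquet")
people = spark.read.parquet("/tmp/people.parquet")

people.printSchema()  # the schema survives the round-trip
people.show()

spark.stop()
```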
Day 2 Programming Techniques
- RDD and DataFrames programming
- Understanding Spark internals when using RDD and DataFrames
- Schema definition
- Partitioning
- User-Defined Functions
- Handling Corrupt Records
- Basic ETL task using PySpark (sketched after this outline)
- Handling data transformations for downstream processing
- Multiple Hands-on Labs
- Hands-on Jupyter Notebook exercises
- Assignment Review
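A hedged sketch tying the Day 2 topics together as a basic ETL task: an explicit schema definition, permissive handling of corrupt records, a user-defined function, and repartitioning for downstream processing. The CSV path, column names, and function are illustrative assumptions, not the course's lab code:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Explicit schema definition, with a column to capture malformed rows
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
    StructField("_corrupt_record", StringType(), True),
])

# PERMISSIVE mode keeps malformed rows and routes their raw text into
# the _corrupt_record column instead of failing the job
df = (spark.read
      .schema(schema)
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .csv("/tmp/people.csv"))  # hypothetical input path

# A simple user-defined function applied as a column expression
upper_name = udf(lambda s: s.upper() if s else None, StringType())
df = df.withColumn("name_upper", upper_name(df.name))

# Repartition before writing for downstream processing
df.repartition(4).write.mode("overwrite").parquet("/tmp/people_clean")

spark.stop()
```

PERMISSIVE is Spark's default read mode; note that the corrupt-record column must appear in the user-supplied schema for the captured raw text to show up.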
- Lab Environment
- Each student will be provided a virtual machine for performing the hands-on labs; students are expected to use these machines in class
- These machines will be configured with a Spark 3.x release
- Instructions will be provided to students for setting up the environment on their own machines (there will be no support for debugging their environments)
Training material provided: Yes (Digital format)