- Overview
- Prerequisites
- Audience
- Curriculum
Description:
Real-time data ingestion and analysis is becoming a vital need for enterprises, driving them to build data lakes that can handle high-velocity data. In this course, we explore the Big Data ecosystem and deep-dive into real-time processing. Open-source technologies like Apache Kafka and Apache Spark are used to build an end-to-end solution for real-time streaming applications.
Long Description:
Elevate your enterprise's data capabilities with our comprehensive training course on real-time data ingestion and analysis. As high-velocity data becomes increasingly vital, we'll guide you in building a robust data lake to meet these demands. Dive deep into the Big Data ecosystem and focus on real-time processing, leveraging open-source tools like Apache Kafka and Spark. This course offers a holistic view of the Big Data landscape and its applications in real-time streaming. We'll explore alternatives to Apache Kafka and Spark Streaming, empowering you with a distributed architecture to set up, ingest, and process real-time data. Join us to unlock the potential of real-time data solutions.
Course Code/Duration:
BDT31 / 3 Days
Learning Objectives:
After this course, you will be able to:
- Have a broad understanding of the Big Data ecosystem.
- Understand the differences between batch and real-time streaming scenarios.
- Understand how to use a distributed, clustered architecture to implement a real-time streaming system.
- Identify which technologies to apply for a specific use case.
- Explain the technical and business drivers behind adopting a streaming system.
- Understand the architecture and design of Apache Kafka.
- Compare Apache Kafka to alternatives such as Flume, Storm, and Amazon Kinesis.
- Understand the Big Data ecosystem before and after Apache Spark.
- Understand the Apache Spark processing framework and its distributed architecture.
- Install and set up a Big Data cluster.
- Perform hands-on activities using Twitter data.
Prerequisites:
- Familiarity with Java/Scala
- Familiarity with Big Data applications
- Working knowledge of Spark is a plus
Audience:
- Data Analysts, Software Engineers, Data Engineers, Data Professionals, Business Intelligence Developers, Data Architects
Course Outline:
Day 1
- Course Introduction
- History and background of Big Data
- Advantages of Distributed Architecture
- The Big Data ecosystem before Apache Spark
- The Big Data ecosystem after Apache Spark
- Spark data structures: RDDs, DataFrames, Datasets (see the first sketch after this day's outline)
- Primer on Spark libraries:
  - Spark SQL
  - Spark MLlib
  - Spark Streaming
  - Spark GraphX
  - Spark Deep Learning
- Writing Spark applications using the Spark APIs
- Spark Streaming
- Structured Streaming (see the second sketch after this day's outline)
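To ground the data-structure discussion, here is a minimal Scala sketch contrasting the three Spark abstractions covered on Day 1. The `Tweet` case class, app name, and local master are illustrative assumptions, not part of the course materials:

```scala
import org.apache.spark.sql.SparkSession

object SparkDataStructures {
  // Hypothetical case class for the Dataset example
  case class Tweet(user: String, text: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataStructuresDemo")
      .master("local[*]") // local mode for the classroom; use a cluster master in production
      .getOrCreate()
    import spark.implicits._

    // RDD: the low-level distributed collection, manipulated with functional operators
    val rdd = spark.sparkContext.parallelize(Seq(("alice", "hello spark"), ("bob", "hello kafka")))
    val wordCounts = rdd.flatMap { case (_, text) => text.split(" ") }
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    wordCounts.collect().foreach(println)

    // DataFrame: schema-aware rows, optimized by the Catalyst query planner
    val df = rdd.toDF("user", "text")
    df.groupBy("user").count().show()

    // Dataset: typed API combining RDD-style type safety with DataFrame optimization
    val ds = df.as[Tweet]
    ds.filter(_.text.contains("spark")).show()

    spark.stop()
  }
}
```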
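And a minimal Structured Streaming sketch showing how the batch DataFrame API carries over to streams. It assumes a local socket source fed by `nc -lk 9999`; the host, port, and app name are placeholder choices:

```scala
import org.apache.spark.sql.SparkSession

object StructuredWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StructuredWordCount")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read a stream of lines from a local socket (run `nc -lk 9999` to feed it)
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // The same DataFrame API as batch: split lines into words and count them
    val counts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    // Continuously print the updated counts to the console
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```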
Day 2
- Data Ingestion systems for structured and unstructured data
- Kafka design & architecture
- Comparing Kafka to Flume, Storm, and Amazon Kinesis
- Getting Kafka up and running
- Using Kafka utilities
- Reading & writing to Kafka using the Java API (see the sketch after this day's outline)
- Labs: all of the above sections
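A minimal sketch of writing to and reading from Kafka with the Java client API, called here from Scala to match the rest of the course examples. The broker address `localhost:9092`, topic name `tweets`, and consumer group `course-demo` are placeholder assumptions:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaRoundTrip {
  val Topic = "tweets" // hypothetical topic name

  def main(args: Array[String]): Unit = {
    val producerProps = new Properties()
    producerProps.put("bootstrap.servers", "localhost:9092") // placeholder broker
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    // Write a single record to the topic
    val producer = new KafkaProducer[String, String](producerProps)
    producer.send(new ProducerRecord[String, String](Topic, "key-1", "hello kafka"))
    producer.close()

    val consumerProps = new Properties()
    consumerProps.put("bootstrap.servers", "localhost:9092")
    consumerProps.put("group.id", "course-demo") // placeholder consumer group
    consumerProps.put("auto.offset.reset", "earliest")
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    // Read the record back and print it
    val consumer = new KafkaConsumer[String, String](consumerProps)
    consumer.subscribe(Collections.singletonList(Topic))
    val records = consumer.poll(Duration.ofSeconds(5))
    records.forEach(r => println(s"${r.key} -> ${r.value}"))
    consumer.close()
  }
}
```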
Day 3
- Implementing Spark and Kafka together
- Reading Kafka streams from Spark
- Saving streaming data from Spark into Cassandra (both shown in the sketch after this day's outline)
- Full end-to-end application
- Benchmarking
- Monitoring
- Tuning and Optimizing the system
- Labs: all of the above sections
- End-to-end Streaming project
- Next steps
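A minimal Scala sketch of the Day 3 pipeline: Spark Structured Streaming reads from Kafka and saves each micro-batch to Cassandra through the DataStax `spark-cassandra-connector` via `foreachBatch`. The broker address, topic, keyspace, and table names are placeholder assumptions, and the Kafka source and Cassandra connector packages are assumed to be on the classpath:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToCassandra")
      .master("local[*]")
      .config("spark.cassandra.connection.host", "localhost") // placeholder Cassandra host
      .getOrCreate()

    // Subscribe to a Kafka topic; Kafka rows arrive with binary key/value columns
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "tweets")                       // placeholder topic
      .load()
      .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS body")

    // Write each micro-batch to Cassandra using the connector's batch writer
    val saveToCassandra: (DataFrame, Long) => Unit = (batch, _) =>
      batch.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "demo", "table" -> "tweets")) // placeholder keyspace/table
        .mode("append")
        .save()

    stream.writeStream
      .foreachBatch(saveToCassandra)
      .start()
      .awaitTermination()
  }
}
```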
Structured Activity/Exercises/Case Studies:
Day 1
- Milestone 1 – Create account on Databricks Cloud
- Milestone 2 – Learn how to use Databricks Notebooks
- Milestone 3 – Spark RDD implementations
Day 2
- Milestone 4 – End-to-end project (Initiation)
- Milestone 5 – Kafka setup
- Milestone 6 – Kafka hands-on
Day 3
- Milestone 7 – Kafka with Structured Streaming
- Milestone 8 – End-to-end project (Completion)
Training material provided:
Yes (Digital format)