Description:
This hands-on training introduces developers and data engineers to Apache Beam, a unified programming model for building portable and scalable data processing pipelines, with deployment on Google Cloud Dataflow. Participants begin with the foundations of the Beam model, including its architecture, execution flow, and key abstractions, then move on to core and composite transforms such as ParDo, Map, Filter, and CoGroupByKey.
The course covers both batch and streaming use cases, showing how to connect Beam with Google Cloud Pub/Sub, set up streaming projects on GCP, and run real-time pipelines on Google Cloud Dataflow. Learners also explore how Beam handles type safety and data encoding, and work with advanced pipeline features such as side inputs and multiple outputs. A real-world case study on identifying defaulter customers reinforces the material through hands-on application.
Duration: 3 Days
Course Code: BDT 508
Learning Objectives:
By the end of this course, participants will be able to:
- Understand Apache Beam’s architecture and unified data processing model.
- Build batch and streaming pipelines using Beam's core and advanced transforms.
- Use Google Cloud Pub/Sub and Dataflow to run scalable real-time pipelines.
- Apply data encoding, type hints, and coders in Beam pipelines.
- Design modular, reusable pipelines using composite transforms and joins.
This course is ideal for:
- Data Engineers and Streaming Pipeline Developers
- Google Cloud Developers and Architects
- Engineers migrating from Spark, Flink, or Airflow
- Developers building real-time analytics and ETL workflows
Prerequisites:
- Basic programming knowledge (Python or Java)
- Familiarity with cloud concepts (preferably Google Cloud)
- Some experience with data processing frameworks (e.g., Spark, Hadoop) is helpful
Course Outline:
Module 1: Introduction to Apache Beam
- Evolution of Big Data Frameworks
- Overview and use cases of Apache Beam
- Apache Beam Architecture and SDKs
- Beam’s portable and unified programming model
Module 2: Beam Setup and Basic Concepts
- Key abstractions: PCollection, PTransform, Pipeline
- Installing Beam and setting up dev environment
- Building your first Beam pipeline (local runner)
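A minimal sketch of the kind of first pipeline built in this module, using the Beam Python SDK; with no runner specified it executes on the local DirectRunner, and the element values are purely illustrative.

    import apache_beam as beam

    # With no runner specified, the pipeline runs on the local DirectRunner.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "CreateInput" >> beam.Create(["hello", "apache", "beam"])  # in-memory PCollection
            | "ToUpper" >> beam.Map(str.upper)                           # element-wise PTransform
            | "Print" >> beam.Map(print)                                 # inspect results locally
        )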
Module 3: Working with Beam Transforms
- Structure of a Beam pipeline
- Input transforms: Read, Create
- Output transforms: Write to files, databases
- Core transforms: Map, FlatMap, Filter
- Hands-on: Basic read-transform-write pipeline
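A minimal read-transform-write sketch along the lines of this hands-on exercise; the input and output paths are placeholders for the lab environment.

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText("input.txt")                 # one element per line
            | "SplitWords" >> beam.FlatMap(lambda line: line.split())     # 0..n outputs per element
            | "KeepLongWords" >> beam.Filter(lambda word: len(word) > 3)  # keep matching elements
            | "Write" >> beam.io.WriteToText("output", file_name_suffix=".txt")
        )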
Module 4: Pipeline Logic and Advanced Transforms
- ParDo, DoFn and branching logic
- Composite transforms: abstraction and reuse
- Aggregations: Combine, CombinePerKey
- Joins with CoGroupByKey
- Hands-on: Branching + CoGroupByKey joins
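A minimal sketch of a CoGroupByKey join as practiced in this module; the keyed customer and order data are invented for illustration.

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        names = pipeline | "Names" >> beam.Create([("c1", "Asha"), ("c2", "Ravi")])
        orders = pipeline | "Orders" >> beam.Create([("c1", 250), ("c1", 90), ("c2", 40)])

        # CoGroupByKey joins the two PCollections by key into a dict of iterables per key.
        (
            {"names": names, "orders": orders}
            | "Join" >> beam.CoGroupByKey()
            | "Format" >> beam.Map(
                lambda kv: f"{kv[0]}: {list(kv[1]['names'])} spent {sum(kv[1]['orders'])}")
            | "Print" >> beam.Map(print)
        )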
Module 5: Side Inputs and Outputs
- Working with side inputs for auxiliary data
- Creating multiple outputs from one transform
- Hands-on: Multi-output transformation and filtering
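A minimal sketch combining a side input with tagged multiple outputs, roughly what this hands-on exercise covers; the threshold and amounts are invented values.

    import apache_beam as beam
    from apache_beam import pvalue

    class SplitByThreshold(beam.DoFn):
        def process(self, amount, threshold):
            # Route each element to one of two tagged outputs.
            if amount >= threshold:
                yield pvalue.TaggedOutput("high", amount)
            else:
                yield pvalue.TaggedOutput("low", amount)

    with beam.Pipeline() as pipeline:
        amounts = pipeline | "Amounts" >> beam.Create([10, 75, 120, 5])
        threshold = pipeline | "Threshold" >> beam.Create([50])

        results = amounts | "Split" >> beam.ParDo(
            SplitByThreshold(),
            threshold=pvalue.AsSingleton(threshold),  # auxiliary data passed as a side input
        ).with_outputs("high", "low")

        results.high | "PrintHigh" >> beam.Map(lambda x: print("high:", x))
        results.low | "PrintLow" >> beam.Map(lambda x: print("low:", x))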
Module 6: Case Study – Identifying Bank Defaulters
- Understanding credit card and loan data
- Creating modular pipelines for different defaulter types
- Hands-on: Implement pipeline to flag defaulters
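A highly simplified sketch of one branch of the case-study pipeline, assuming a hypothetical customer_id,missed_payments CSV layout; the real datasets and defaulter rules are supplied in class.

    import apache_beam as beam

    def parse(line):
        # Hypothetical layout: customer_id,missed_payments
        customer_id, missed = line.split(",")
        return customer_id, int(missed)

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText("loan_payments.csv", skip_header_lines=1)
            | "Parse" >> beam.Map(parse)
            | "FlagDefaulters" >> beam.Filter(lambda kv: kv[1] >= 3)  # e.g. three or more missed payments
            | "Write" >> beam.io.WriteToText("loan_defaulters", file_name_suffix=".txt")
        )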
Module 7: Type Hints and Coders in Beam
- What is type safety in data pipelines?
- Using Coder class for serialization
- Type hints in Beam and how Beam ensures type safety
- Hands-on: Using type hints and coders in a custom pipeline
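A sketch of declaring type hints and registering a custom coder, using an invented Customer type; Beam's built-in coders already handle most common types.

    import apache_beam as beam

    class Customer:
        def __init__(self, customer_id, name):
            self.customer_id = customer_id
            self.name = name

    class CustomerCoder(beam.coders.Coder):
        def encode(self, customer):
            return f"{customer.customer_id},{customer.name}".encode("utf-8")

        def decode(self, encoded):
            customer_id, name = encoded.decode("utf-8").split(",", 1)
            return Customer(customer_id, name)

        def is_deterministic(self):
            return True

    # Tell Beam which coder to use for PCollections of Customer objects.
    beam.coders.registry.register_coder(Customer, CustomerCoder)

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | beam.Create([("c1", "Asha"), ("c2", "Ravi")])
            # Type hints let Beam check element types and choose the right coder.
            | "MakeCustomers" >> beam.Map(lambda kv: Customer(*kv)).with_output_types(Customer)
            | "Names" >> beam.Map(lambda c: c.name).with_input_types(Customer).with_output_types(str)
            | "Print" >> beam.Map(print)
        )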
Module 8: Introduction to Streaming in Beam
- Event-driven processing vs. batch workflows
- Pub/Sub architecture and flow
- Windowing and watermarking (intro only, optional deep dive)
- Hands-on: Pub/Sub demo with sample topic/stream
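A streaming sketch in the spirit of this demo, assuming a placeholder Pub/Sub topic and a GCP project with credentials configured; the 60-second fixed window is only an illustration of the windowing intro.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
    from apache_beam.transforms.window import FixedWindows

    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True  # unbounded (streaming) mode

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            # Topic path is a placeholder for the course project.
            | "ReadMessages" >> beam.io.ReadFromPubSub(topic="projects/YOUR_PROJECT/topics/demo-topic")
            | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
            | "Window" >> beam.WindowInto(FixedWindows(60))        # 60-second fixed windows
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))
            | "CountPerWindow" >> beam.CombinePerKey(sum)
            | "Print" >> beam.Map(print)
        )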
Module 9: Apache Beam with Google Cloud Dataflow
- Connecting Beam pipelines to GCP
- Creating and configuring Pub/Sub topics
- Running batch and streaming jobs on Dataflow
- Hands-on: Run a streaming pipeline from Pub/Sub to GCS or BigQuery
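A sketch of submitting a streaming Pub/Sub-to-BigQuery job to Dataflow; every project, region, bucket, subscription, and table name here is a placeholder, and the single-column schema is purely illustrative.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # All resource names below are placeholders.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="YOUR_PROJECT",
        region="us-central1",
        temp_location="gs://YOUR_BUCKET/temp",
        streaming=True,
    )

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/YOUR_PROJECT/subscriptions/demo-sub")
            | "ToRow" >> beam.Map(lambda msg: {"raw": msg.decode("utf-8")})
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "YOUR_PROJECT:demo_dataset.demo_table",
                schema="raw:STRING",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )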
Training Material Provided:
- Course slides and reference guides



