Description:
This comprehensive 4-day training equips participants with the skills needed to develop, deploy, and manage real-time data streaming applications using Apache Kafka. Starting with an overview of Big Data architectures and the role of Kafka in modern data pipelines, participants will explore Kafka’s publish-subscribe model, distributed architecture, and core components such as producers, consumers, topics, partitions, and offsets.
The training dives deep into advanced development topics, including the Kafka Streams API for real-time data processing, and Kafka Connect for integrating Kafka with relational databases, cloud storage systems, and other external platforms. Learners will gain hands-on experience building Java-based Kafka applications, managing offsets and delivery guarantees, and constructing end-to-end streaming pipelines.
Beyond application development, the course also addresses Kafka’s operational aspects, such as cluster management, topic configuration, monitoring, and performance tuning. It concludes with an in-depth look at Kafka security—covering SSL, SASL, ACLs, and encryption strategies—along with ZooKeeper administration and troubleshooting techniques for production environments. Real-world examples, best practices, and guided labs ensure learners are ready to build and support scalable, secure, and reliable Kafka-based systems in enterprise environments.
Duration: 4 Days
Course Code: BDT 507
Learning Objectives:
By the end of this course, participants will be able to:
- Understand Big Data architecture and Kafka’s role in real-time processing.
- Deploy Kafka clusters and work with Producers, Consumers, and Streams.
- Integrate Kafka with databases and storage using Kafka Connect.
- Secure Kafka clusters with SSL/SASL and implement access control.
- Monitor, troubleshoot, and scale Kafka deployments for production workloads.
This course is ideal for:
- Data Engineers and Software Developers
- DevOps and Platform Engineers managing Kafka clusters
- Streaming and Real-Time Analytics Professionals
- Architects designing event-driven systems
Prerequisites:
- Familiarity with Java (or another programming language)
- Basic understanding of distributed systems
- Exposure to Linux command line
Course Outline:
Module 1: Big Data and Kafka Introduction
- The 3 Vs of Big Data (volume, velocity, variety) and their evolution
- Real-time vs batch processing
- Industry use cases
Module 2: Overview of Hadoop & Streaming Frameworks
- Apache Hadoop: HDFS and MapReduce overview
- Apache Spark and Apache Storm for real-time data analysis
Module 3: Introduction to ZooKeeper
- ZooKeeper basics and recipes (leader election, locking)
- Installing and configuring ZooKeeper
- Hands-on: Ubuntu setup, SSH, and connecting with PuTTY
Module 4: Introduction to Apache Kafka
- Kafka design principles
- Kafka as a distributed commit log
- Partitions, topics, replication, and message queues
- Publish-subscribe pattern explained
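How a keyed message lands on a partition can be sketched in plain Java. Kafka's default partitioner hashes the serialized key bytes with murmur2; the `partitionFor` helper below is a simplified, hypothetical stand-in that uses `String.hashCode` purely to illustrate the principle that the same key always maps to the same partition.

```java
// Illustrative sketch only: Kafka's real default partitioner applies a
// murmur2 hash to the serialized key bytes; String.hashCode stands in here.
public class PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the modulo result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition, which is what
        // gives Kafka its per-key ordering guarantee within a partition.
        System.out.println("user-42 -> partition " + partitionFor("user-42", 6));
    }
}
```

Because ordering is only guaranteed within a partition, this deterministic key-to-partition mapping is what lets all events for one entity be consumed in order.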
Module 5: Kafka Installation and Configuration
- Installing Kafka and configuring brokers
- Cluster setup using ZooKeeper
- Hands-on: Installing and configuring Kafka + ZooKeeper
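A minimal broker configuration of the kind produced in this lab might look like the `server.properties` fragment below. The property names are standard Kafka broker settings; the host names, paths, and values are placeholders, not values from the course.

```properties
# Unique ID for this broker within the cluster
broker.id=0
# Listener the broker binds to for client connections
listeners=PLAINTEXT://0.0.0.0:9092
# Where the commit-log segments are stored on disk
log.dirs=/var/lib/kafka/logs
# ZooKeeper ensemble used for cluster metadata
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
# Default partition count for auto-created topics
num.partitions=3
```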
Module 6: Kafka Producers and Consumers
- Producer API and serializers (Avro, JSON, Protobuf)
- Consumer API, groups, offsets, and delivery semantics
- At-most-once, at-least-once, exactly-once guarantees
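The three delivery guarantees above map onto concrete client settings. The following is a hedged sketch using property names from the standard Java clients; the `transactional.id` value is a hypothetical example.

```properties
# At-least-once producer: wait for all in-sync replicas, retry on failure
acks=all
retries=2147483647

# Exactly-once building blocks: idempotent producer plus transactions
enable.idempotence=true
transactional.id=orders-app-1

# Consumer side: commit offsets manually after processing (at-least-once),
# and read only messages from committed transactions
enable.auto.commit=false
isolation.level=read_committed
```

At-most-once, by contrast, typically comes from committing offsets before processing, trading possible data loss for no duplicates.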
Module 7: Kafka Data Model and Java Integration
- Topics, partitions, and offsets in practice
- Compiling and running Kafka Java clients
- Hands-on: Writing Kafka Producer and Consumer apps
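A minimal producer of the kind written in this lab could be sketched as follows, assuming the `kafka-clients` dependency on the classpath and a broker at `localhost:9092` (both assumptions, not course-provided values):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // try-with-resources ensures buffered records are flushed on close
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic name "demo-topic" and key "user-42" are illustrative
            producer.send(new ProducerRecord<>("demo-topic", "user-42", "hello"));
        }
    }
}
```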
Module 8: Kafka Streams API
- Difference between batch and stream processing
- Kafka Streams API: KStreams, KTables, stateful operations
- Event-time processing and windowing
- Hands-on: Real-time word count and stream joins
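The core topology of the word-count lab can be sketched with the Streams DSL. Topic names here are hypothetical, and a `kafka-streams` dependency plus a running cluster are assumed; serde configuration is omitted for brevity.

```java
import java.util.Arrays;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class WordCountTopology {
    static void build(StreamsBuilder builder) {
        KStream<String, String> lines = builder.stream("text-input");
        KTable<String, Long> counts = lines
                // split each line into lowercase words
                .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                // re-key each record by the word itself
                .groupBy((key, word) -> word)
                // stateful count per word, backed by a local state store
                .count();
        counts.toStream().to("word-counts");
    }
}
```

The `KTable` here illustrates the stream/table duality: the changelog of per-word counts is itself a stream that can be written back to a topic.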
Module 9: Kafka Connect and Data Integration
- Kafka Connect architecture
- Source/sink connectors: JDBC, S3, HDFS, Elasticsearch
- Building ETL-style pipelines using Kafka Connect
- Hands-on: Configuring Kafka Connect with databases
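A source connector of the kind configured in this lab is defined as JSON posted to the Connect REST API. The example below assumes Confluent's JDBC source connector and a hypothetical MySQL database; all names, credentials, and hosts are placeholders.

```json
{
  "name": "jdbc-orders-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://db-host:3306/shop",
    "connection.user": "connect",
    "connection.password": "secret",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "mysql-"
  }
}
```

With this configuration, new rows in `orders` would appear as records on a `mysql-orders` topic, turning the database into a streaming source.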
Module 10: Real-World Kafka Development Lab
- Build a full pipeline from producer → Kafka → stream processor → sink
- Hands-on: End-to-end pipeline using Kafka Streams + Connect
Module 11: Kafka Cluster Operations
- Broker lifecycle: adding/removing brokers
- Topic configuration, replication, and leader election
- Monitoring clusters using metrics and logs
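The day-to-day topic operations above map onto the CLI tools shipped with Kafka. A sketch, with broker address and topic name as placeholders (the `--bootstrap-server` flag assumes Kafka 2.2 or later):

```shell
# Create a topic with 6 partitions, each replicated to 3 brokers
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic orders --partitions 6 --replication-factor 3

# Inspect partition leaders, replica placement, and in-sync replicas (ISR)
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders
```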
Module 12: Kafka Security
- Authentication with SSL and SASL
- Access control with ACLs
- Encryption at rest and in transit
- Hands-on: Securing Kafka brokers and clients
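Broker-side settings for the TLS and SASL setup practiced in this lab might look like the fragment below. Paths, passwords, and ports are placeholders, and the authorizer class name shown is the one used in Kafka 2.4+ (older releases used a different class).

```properties
# Require TLS plus SASL authentication on the client listener
listeners=SASL_SSL://0.0.0.0:9093
security.inter.broker.protocol=SASL_SSL
sasl.enabled.mechanisms=SCRAM-SHA-512
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512

# Broker keystore/truststore for TLS
ssl.keystore.location=/etc/kafka/ssl/broker.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=/etc/kafka/ssl/broker.truststore.jks
ssl.truststore.password=changeit

# Enable ACL-based authorization; deny access unless explicitly allowed
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
allow.everyone.if.no.acl.found=false
```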
Module 13: Kafka Performance Tuning
- Tuning producers and consumers for throughput
- Compression, batching, and partitioning strategies
- Broker-level tuning
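The producer-side throughput levers discussed here correspond to a handful of client settings. The values below are illustrative examples, not tuning recommendations:

```properties
# Batch up to 64 KB of records per partition before sending
batch.size=65536
# Wait up to 20 ms for a batch to fill before sending anyway
linger.ms=20
# Compress whole batches; lz4 trades a little CPU for much less network I/O
compression.type=lz4
# Total memory the producer may use for buffering unsent records
buffer.memory=67108864
```

Larger batches and compression raise throughput at the cost of per-record latency, which is the central trade-off this module explores.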
Module 14: ZooKeeper for Kafka Operations
- ZooKeeper’s role in Kafka cluster metadata
- Best practices and operational tips
- Securing ZooKeeper nodes
Module 15: Monitoring and Troubleshooting
- Tools: Kafka Manager, Grafana, Prometheus
- Common issues: unbalanced partitions, consumer lag, GC pauses, and leader-election problems
- Hands-on: Benchmarking performance, tuning parameters, viewing metrics
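Consumer lag and raw throughput, two of the concerns above, can be inspected with tools bundled with Kafka. Broker address, group, and topic names below are placeholders:

```shell
# Per-partition current offset, log-end offset, and lag for one group
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group orders-processor

# Quick producer throughput benchmark with the bundled perf tool
kafka-producer-perf-test.sh --topic perf-test --num-records 100000 \
  --record-size 512 --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092
```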
Training Material Provided:
Course slides and reference guides




