- Overview
- Prerequisites
- Audience
- Curriculum
Description:
"Join our 10-week Data Engineer Bootcamp to master essential data engineering skills. Learn to build robust Data Pipelines using SQL, Python, Spark SQL, and Pyspark, and efficiently execute them on various clusters. Our Agile Scrum Methodology-driven workshop covers SQL Fundamentals, data engineering principles, Linux, Python, and more. Dive into Big Data Technologies with Apache Spark, explore databases, including MongoDB, and gain cloud expertise with GCP. Acquire hands-on experience in DevOps, Docker, Kubernetes, and CI/CD, preparing you for a rewarding career in data engineering. Unlock a world of data possibilities and become a skilled data engineer ready to tackle complex projects in this comprehensive Data Engineer Bootcamp."
Course Code/Duration:
BDT245 / 10 Weeks
Learning Objectives:
- Understand Agile Scrum methodology for efficient project management.
- Master SQL, covering query writing, table operations, and functions.
- Develop proficiency in Python programming, focusing on data manipulation.
- Gain expertise in data engineering, including data preparation and pipeline building.
- Use PyTorch for creating and training artificial neural networks for classification.
- Set up a Python environment with Jupyter notebooks.
- Explore Python data structures, functions, modules, and OOP.
- Grasp big data concepts, Hadoop, and Apache Spark's role in data processing.
- Apply Apache Spark for data analysis, machine learning, and real-time streaming.
Prerequisites:
- Understanding of how computers work
- One or more years of technical experience
- Programming experience with Python and SQL is a plus.
Audience:
- Candidates with a Computer Science degree or equivalent experience, pursuing their first IT role with a focus on Data Engineering and Data Science.
Course Outline:
Agile Scrum Methodology
- Scrum Introduction
- Scrum Team
- Scrum Artifacts
- Sprint Increment
- Sprint planning
- Backlog
- Retrospective
- Project description and Case Study
- Practice exam and Knowledge check
- Certification (optional)
SQL
- SQL Fundamentals
- Writing SQL Queries
- Working with Tables and Indexes
- Predefined SQL functions
- Connecting Python to SQL
- Certification (optional)
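To ground these topics, here is a minimal sketch of querying from Python, using the standard-library sqlite3 module as a stand-in for whichever database the labs actually use; the table and column names are illustrative assumptions.

```python
# A minimal sketch: creating a table, inserting rows, and querying with
# predefined SQL functions. SQLite stands in for the course's database.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Data", 95000.0), ("Grace", "Data", 105000.0), ("Linus", "Ops", 88000.0)],
)

# Predefined SQL functions: UPPER, COUNT, AVG
cur.execute(
    "SELECT UPPER(department), COUNT(*), AVG(salary) "
    "FROM employees GROUP BY department"
)
for row in cur.fetchall():
    print(row)  # e.g. ('DATA', 2, 100000.0)

conn.close()
```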
Data Engineering Principles
- Data engineering to prepare data for downstream needs
- Build pipelines for batch processing and streaming processing
- Understanding different types of data
- Using PyTorch for Artificial Neural Networks – Classification
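As a taste of the PyTorch topic above, the following hedged sketch trains a tiny feed-forward classifier on random stand-in data; the layer sizes and data are illustrative assumptions, not the course's lab material.

```python
# A minimal PyTorch classification sketch: a small feed-forward network
# trained with cross-entropy loss on random stand-in data.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(100, 4)            # 100 samples, 4 features
y = torch.randint(0, 3, (100,))    # 3 classes

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

accuracy = (model(X).argmax(dim=1) == y).float().mean()
print(f"training accuracy: {accuracy:.2f}")
```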
Python Programming – Fundamentals
- Set Up
  - Set up the development environment – Jupyter notebooks
  - Using the Python shell
  - Executing Python scripts
- Understanding Python strings
- Print statements in Python
- Data Structures in Python
  - Integers
  - Lists
  - Dictionaries
  - Tuples
  - Sets
  - Files
  - Mutable and immutable structures
- Selection and Looping Constructs
  - If/else/elif statements
  - Boolean type
  - “in” membership
  - For loops
  - While loops
  - List and dictionary comprehensions
- Functions
  - Defining functions
  - Variable scope – local and global
  - Arguments
  - Polymorphism
- Modules
  - Creating modules
  - Importing modules
  - Different types of imports
  - Dir and help
  - Examining some built-in modules
- Classes & Exceptions
  - Object-Oriented Programming introduction
  - Classes and objects
  - Polymorphism – function and operator overloading
  - Inheritance
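A short sketch tying several of these topics together (the names and values are illustrative):

```python
# Data structures, a comprehension, a function, and a class hierarchy.
def describe(records):
    """Return a dict mapping each name to its squared score."""
    return {name: score ** 2 for name, score in records}  # dict comprehension

class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):          # inheritance
    def __init__(self, side):
        self.side = side

    def area(self):           # polymorphism: overriding the parent method
        return self.side ** 2

records = [("a", 2), ("b", 3)]   # a list of tuples
print(describe(records))         # {'a': 4, 'b': 9}
print(Square(4).area())          # 16
```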
Big Data Overview
- History and background of Big Data and Hadoop
- 5 V’s of Big Data
- Big Data Distributions in Industry
- Big Data Ecosystem before Apache Spark
- Big Data Ecosystem after Apache Spark
- Comparison of MapReduce Vs Apache Spark
- Spark Clusters
Getting started with Apache Spark
- Understanding Apache Spark Components and Libraries
- Introduction to PySpark
- Explore using PySpark in the Databricks cloud environment
- PySpark code examples
- Working with Jupyter Notebook
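A minimal starting point, assuming a local PySpark installation rather than the Databricks environment used in class:

```python
# Create a SparkSession (the entry point to all Spark libraries)
# and a tiny DataFrame to confirm the installation works.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("getting-started").getOrCreate()

df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])
df.show()

spark.stop()
```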
Working with Spark SQL
- Getting started with Spark SQL
- Spark Context and Spark Session
- Performing basic data transformations with Spark SQL CLI
- Managing Tables with Spark SQL
- Spark SQL functions
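A hedged sketch of these topics: registering a temporary view, querying it with plain SQL, and the equivalent DataFrame-API call; the table and column names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

sales = spark.createDataFrame(
    [("east", 100.0), ("east", 150.0), ("west", 90.0)], ["region", "amount"]
)
sales.createOrReplaceTempView("sales")

# Plain SQL against the temp view...
spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()

# ...or the equivalent expression with Spark SQL functions.
sales.groupBy("region").agg(F.sum("amount").alias("total")).show()

spark.stop()
```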
Apache Spark Data Structures – RDD
- Understanding fundamental data structure in Spark – RDD
- Understanding Lineage and Lazy Evaluation with RDDs
- Performing RDD transformations
- Performing RDD actions
- RDD persistence and caching
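A minimal illustration of these ideas on toy data: transformations are lazy (they only record lineage), and nothing runs until an action is called.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)  # lazy

evens_squared.cache()             # persistence: keep results in memory
print(evens_squared.collect())    # action triggers execution: [0, 4, 16, 36, 64]
print(evens_squared.count())      # reuses the cached partitions: 5

spark.stop()
```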
Apache Spark Data Structures – DataFrames
- Understanding another data structure in Spark – DataFrames
- Reading different file formats into a DataFrame
- Creating and inferring DataFrame schemas
- Basic transformations on DataFrames
- Basic actions on DataFrames
- Applying functions such as filtering, group by, etc. on DataFrames
- Aggregations such as sum and mean on DataFrames
- Preparing data transformation pipelines
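A hedged sketch of a small DataFrame pipeline covering several of the bullets above; the CSV path and column names are placeholder assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

orders = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)      # let Spark infer column types
    .csv("orders.csv")                # placeholder path
)
orders.printSchema()

summary = (
    orders
    .filter(F.col("amount") > 0)                       # basic transformation
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total"), F.mean("amount").alias("mean"))
)
summary.show()                                         # action

spark.stop()
```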
Apache Spark Data Structures – DataFrames (Advanced)
- Handling corrupt records
- Working with DataFrame joins
- Understanding DataFrame do’s and don’ts
- Spark Lifecycle and the Spark UI
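A sketch of two of the advanced topics, tolerant reads and joins; the paths, schemas, and join keys are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataframe-advanced").getOrCreate()

# PERMISSIVE mode keeps malformed rows instead of failing the whole read;
# bad input lands in the _corrupt_record column.
events = (
    spark.read
    .option("mode", "PERMISSIVE")
    .option("columnNameOfCorruptRecord", "_corrupt_record")
    .json("events.json")              # placeholder path
)

customers = spark.createDataFrame([(1, "Ada"), (2, "Grace")], ["id", "name"])
orders = spark.createDataFrame([(1, 250.0), (1, 80.0), (2, 40.0)], ["id", "amount"])

# Inner join on the shared key; an explicit join condition is a "do".
orders.join(customers, on="id", how="inner").show()

spark.stop()
```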
Machine Learning Overview
- History and Background of AI and ML
- Compare AI vs ML vs DL
- Describe Supervised and Unsupervised learning techniques and usages
- Machine Learning patterns
  - Classification
  - Clustering
  - Regression
- Gartner Hype Cycle for Emerging Technologies
- Machine Learning offerings in Industry
- Discuss Machine Learning use cases in different domains
- Understand the Data Science process to apply to ML use cases
- Understand the relation between Data Analysis and Data Science
- Identify the different roles needed for successful ML project
Spark Machine Learning Library
- Prepare machine learning data using pipelines – Data manipulation
- Building Classification models with Spark Machine Learning Library
- Building Regression models with Spark Machine Learning Library
- Clustering with Spark ML library
- Understanding model performance and metrics
- Data pipeline and model persistence
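A hedged sketch of a Spark ML classification pipeline on toy data: assemble features, fit a model, evaluate it, and persist it. The column names and the choice of logistic regression are illustrative, not the course's prescribed lab.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("spark-ml-demo").getOrCreate()

df = spark.createDataFrame(
    [(0.0, 1.0, 0), (1.0, 0.0, 1), (0.5, 0.4, 0), (0.9, 0.1, 1)],
    ["f1", "f2", "label"],
)

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),  # data manipulation
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(df)

predictions = model.transform(df)
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(predictions)
print(f"AUC on the toy data: {auc:.2f}")

model.write().overwrite().save("lr_pipeline_model")  # model persistence

spark.stop()
```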
Spark Streaming Library
- Streaming data and its challenges
- Understanding Spark Structured Streaming
- Working with Spark Streaming output modes
- Aggregations on streaming data
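A minimal Structured Streaming sketch, using the built-in "rate" source so it runs without external infrastructure: an aggregation over the stream written to the console in "complete" output mode.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Aggregation on streaming data: count rows by even/odd value.
counts = stream.groupBy((stream.value % 2).alias("parity")).count()

query = (
    counts.writeStream
    .outputMode("complete")    # one of the output modes covered in this module
    .format("console")
    .start()
)
query.awaitTermination(10)     # run briefly for demonstration
query.stop()
spark.stop()
```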
NoSQL
- Relational vs. Non-Relational Databases
- What are NoSQL databases?
- Types of NoSQL databases
Document Datastore: MongoDB
- MongoDB Introduction
- Understanding Basics and CRUD operations
- Structuring Documents
- Create Operations
- Read Operations on Collections
- Updating Documents
- Deleting Documents
- Working with Indexes
- Working with different data types
- Using MongoDB Compass to explore data visually
- Integrating Apache Spark with MongoDB
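A hedged CRUD sketch with PyMongo, assuming a MongoDB server on the default localhost port; the database, collection, and document fields are illustrative.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
books = client["bootcamp"]["books"]

books.insert_one({"title": "Spark Basics", "year": 2020})              # create
print(books.find_one({"title": "Spark Basics"}))                       # read
books.update_one({"title": "Spark Basics"}, {"$set": {"year": 2021}})  # update
books.create_index("title")                                            # index
books.delete_one({"title": "Spark Basics"})                            # delete

client.close()
```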
DevOps Toolkit
- DevOps Overview
- Containers with Docker
- Orchestrating containers with Kubernetes
- Understanding Continuous Integration
- Understanding Continuous Delivery and Deployment
Cloud Computing Foundations (AWS or GCP)
- Cloud Computing Overview
- Security with Google’s Cloud Infrastructure
- Understanding resource hierarchy
- IAM – Identity and Access Management
- Different IAM Roles
- Connecting to Google Cloud Platform
- Understanding different compute options
- Working with different Relational and NoSQL databases on GCP
- GCP Data Warehouse: BigQuery
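A hedged sketch of querying BigQuery from Python with the google-cloud-bigquery client, assuming credentials are already configured (e.g. via gcloud auth); the public dataset below is only an example.

```python
from google.cloud import bigquery

client = bigquery.Client()  # picks up project and credentials from the environment

sql = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
for row in client.query(sql).result():   # run the query and wait for results
    print(row.name, row.total)
```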
Project & Use Case
- Project Overview
- Complete projects to get experience and practice
- Industry Use Case Studies
Certification
- Certification Overview
- Identify the right certification for you
- Tips to prepare for certification
Training material provided:
Yes (Digital format)
Hands-on Lab: Instructions will be provided to install Anaconda and PyTorch on students’ machines.