ETL Processing on Google Cloud Using Dataflow and BigQuery
- Created By raju2006
- Last Updated December 29th, 2023
- Overview
- Prerequisites
- Audience
- Curriculum
Overview:
Gain practical, hands-on experience building ETL data pipelines on Google Cloud. This course combines lectures, live demonstrations, and interactive labs to guide you through designing data processing systems and building end-to-end data pipelines.
Course Code/Duration:
BDT105 / 1 Day
Learning Objectives:
- Using BigQuery to load and transform the data:
  - Perform a one-time data load into BigQuery for analysis.
  - Prototype datasets to validate and prepare for automation.
  - Understand how to use BigQuery effectively for data loading and transformation.
- Using Dataflow to load, transform, and cleanse the data:
  - Learn to load larger datasets with Dataflow.
  - Integrate data from multiple sources into a unified pipeline.
  - Gain the ability to load data incrementally and automatically.
  - Understand the fundamentals of data cleansing and transformation using Dataflow.
Prerequisites:
To get the most out of this course, participants should have:
- Completed the Google Cloud Fundamentals: Big Data and Machine Learning course, or have equivalent experience.
- Basic proficiency with a common query language such as SQL.
- Experience with data modeling and extract, transform, load (ETL) activities.
- Experience developing applications in a common programming language such as Python.
- Familiarity with basic statistics.
Audience:
This class is intended for data engineers who are responsible for managing big data transformations, including:
- Extracting, loading, transforming, cleaning, and validating data.
- Designing pipelines and architectures for data processing.
Course Outline:
The course includes presentations, demonstrations, and hands-on labs.
Module 1: Introduction to Data Engineering
- Explore the role of a data engineer.
- Analyze data engineering challenges.
- Intro to BigQuery.
- Data Lakes and Data Warehouses.
- Demo: Federated Queries with BigQuery.
Module 2: Building a Data Warehouse
- The modern data warehouse.
- Intro to BigQuery.
- Demo: Query TB+ of data in seconds.
- Getting Started.
- Loading Data.
- Video Demo: Querying Cloud SQL from BigQuery.
- Lab: Loading Data into BigQuery.
Module 3: Introduction to Building Batch Data Pipelines
- EL, ELT, ETL.
- Quality considerations.
- How to carry out operations in BigQuery.
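To make the ETL versus ELT distinction in Module 3 concrete, here is a minimal sketch. It is not course material: it uses Python's built-in sqlite3 module as a local stand-in for a warehouse such as BigQuery, and the table names, sample rows, and the simple "starts with a digit" quality check are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: (day, customer, amount as text).
RAW_ROWS = [
    ("2023-12-01", " Alice ", "12.50"),
    ("2023-12-01", "bob", "not-a-number"),  # fails the quality check
    ("2023-12-02", "Carol", "7.25"),
]


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def etl(conn):
    """ETL: transform in application code, then load only the clean rows."""
    conn.execute("CREATE TABLE sales_etl (day TEXT, customer TEXT, amount REAL)")
    clean = [
        (day, name.strip().lower(), float(amount))
        for day, name, amount in RAW_ROWS
        if is_number(amount)
    ]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", clean)


def elt(conn):
    """ELT: load raw rows unchanged, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE sales_raw (day TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT day,
               lower(trim(customer)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM sales_raw
        WHERE amount GLOB '[0-9]*'  -- crude check: value starts with a digit
        """
    )


def rows(conn, table):
    """Read a table back in a stable order for comparison."""
    query = f"SELECT day, customer, amount FROM {table} ORDER BY day, customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
```

Both paths end with the same two clean rows. The difference is where the transformation runs: before the load (ETL) or inside the warehouse after the load (ELT), which also leaves the raw table available for later reprocessing.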
Module 4: Serverless Data Processing with Cloud Dataflow
- Cloud Dataflow.
- Why customers value Dataflow.
- Dataflow Pipelines.
- Lab: A Simple Dataflow Pipeline (Python/Java).
- Lab: MapReduce in Dataflow (Python/Java).
- Lab: Side Inputs (Python/Java).
- Dataflow Templates.
- Dataflow SQL.
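The MapReduce and side-input labs in Module 4 build on a pattern that can be sketched in plain Python. The example below is an illustrative assumption, not the lab code: it runs locally without Dataflow, and the function names and sample lines are hypothetical. In the Apache Beam Python SDK the three phases correspond roughly to `beam.FlatMap`, `beam.GroupByKey`, and a combining step, with the stop-word set passed to the map step as a side input.

```python
from collections import defaultdict


def map_phase(lines, stop_words):
    """Emit (word, 1) pairs; stop_words plays the role of a side input."""
    for line in lines:
        for word in line.lower().split():
            if word not in stop_words:
                yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as a runner does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the grouped values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical main input and side input.
lines = ["the quick brown fox", "the lazy dog", "the quick dog"]
stop_words = {"the"}

counts = reduce_phase(shuffle_phase(map_phase(lines, stop_words)))
```

The side input is small, read-only data that every map call can consult, while the main input is the large collection being processed. A distributed runner would spread the map and reduce calls across workers; the logic per element stays the same.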
Tutorials:
- Using BigQuery to load and transform the data. Use this approach to perform a one-time load of a small amount of data into BigQuery for analysis. You might also use this approach to prototype your dataset before you automate larger or multiple datasets.
- Using Dataflow to load, transform, and cleanse the data. Use this approach to load a larger amount of data, load data from multiple data sources, or to load data incrementally or automatically.
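The difference between a one-time load and an incremental load can be sketched as follows. This is a simplified illustration under stated assumptions: the source and target are in-memory lists rather than real tables, the `updated_at` field and the `incremental_load` helper are hypothetical, and timestamps are ISO-formatted strings so that string comparison matches chronological order.

```python
def incremental_load(source_rows, target, watermark):
    """Append rows newer than the watermark and return the new watermark."""
    new_rows = [row for row in source_rows if row["updated_at"] > watermark]
    target.extend(new_rows)
    # If nothing new arrived, keep the previous watermark.
    return max((row["updated_at"] for row in new_rows), default=watermark)


# Hypothetical source data.
source = [
    {"id": 1, "updated_at": "2023-12-01"},
    {"id": 2, "updated_at": "2023-12-02"},
]
target = []

# First run: an empty watermark behaves like a one-time full load.
watermark = incremental_load(source, target, "")

# A new row arrives; the second run picks up only that row.
source.append({"id": 3, "updated_at": "2023-12-03"})
watermark = incremental_load(source, target, watermark)
```

Storing the watermark between runs is what lets a scheduled job load only the rows that changed since the previous run, instead of reloading the whole dataset each time.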
Training material provided:
Yes (Digital format)