Getting Started with Apache Spark using Databricks
- Created By Ebrahim Khaja
- Last Updated December 27th, 2023
Description:
Jumpstart your data journey with our 'Getting Started With Apache Spark Using Databricks' training. This course empowers participants to tackle complex data challenges, harnessing the potential of Apache Hadoop and Apache Spark to uncover valuable insights across various domains.
In today's data-driven world, Big Data has become the driving force behind intelligent enterprise software. Companies worldwide are adopting Big Data solutions to manage vast, high-velocity data streams efficiently.
For software architects and engineers, this course offers a practical, hands-on experience with a blend of lectures, demonstrations, and interactive labs, ensuring a comprehensive understanding of Big Data and Apache Spark's advanced applications. Start your data transformation journey today.
Duration: 4 Days
Course Code: BDT97
Learning Objectives:
After this course, you will be able to:
- Have a broad understanding of the Big Data ecosystem.
- Understand the industry's Big Data offerings, in the cloud and on premises, such as Cloudera, Hortonworks, MapR, Amazon EMR, and Microsoft Azure HDInsight.
- Understand the impact and value of Apache Spark in the Big Data ecosystem.
- Understand the Apache Spark architecture and the libraries that support use cases such as SQL, Streaming, Machine Learning, GraphX/GraphFrames, etc.
- Set up an account on Apache Spark Databricks Cloud.
- Perform hands-on activities in the Big Data ecosystem.
Prerequisites:
- Experience with a programming language such as Python is required.
- SQL and data knowledge.
- Familiarity with Big Data is a plus.
Audience:
- This course is designed for Data Analysts, Software Engineers, Data Engineers, Data Professionals, Business Intelligence Developers, Data Architects, and DevOps Engineers.
Course Outline
Day 1: -
Big Data overview
- A brief history of Big Data
- History and background of Big Data and Hadoop
- 5 V’s of Big Data
- Secret Sauce of Big Data Hadoop
- Big Data Distributions in Industry
- End-to-End Big Data Life cycle overview
- Industry Use cases
Big Data Ecosystem before Spark
- Big Data Ecosystem before Apache Spark
- Storage options – HDFS and NoSQL
- Processing options – MapReduce, Hive etc.
- Administrative tools – ZooKeeper, Oozie etc.
- Ingestion tools – Sqoop, Flume
Big Data Ecosystem after Spark
- Big Data Ecosystem after Apache Spark
- Compare MapReduce vs. Apache Spark
- Apache Spark Architecture
- Understand the Apache Spark architecture and libraries such as Streaming, Machine Learning with Spark ML, GraphX/GraphFrames, etc.
- Understanding Spark RDD
- Set up an account on Apache Spark Databricks Cloud
- Introduction to Notebooks concept on Databricks
- Demos and Labs
Day 2: -
Getting Started with Apache Spark
- Introduction to Spark RDD
- Spark RDD Transformation and Actions
- Spark Lifecycle
- Spark Caching
- Lab - Spark RDD Transformation & Actions
- Lab - Spark RDD Advanced Transformation & Actions
- Demos and Labs
Apache Spark SQL, DataFrames, Datasets
- Introduction to Spark SQL
- SQL, DataFrames and Datasets Spark Library
- Compare the various APIs - RDD, DataFrames and Datasets
- Lab - Spark DataFrames Transformation & Actions
- Lab - Spark DataFrames Advanced Transformation & Actions
- Demos and Labs
Day 3: -
Data Science Overview
- Data Science Process Overview
- Structured and Unstructured Data
- Data Acquisition and Transformation
- Data Analysis and Machine Learning
- Machine Learning Concepts
Machine Learning Overview using Apache Spark
- Introduction to Machine Learning and Data Science
- Machine Learning Spark Library
- Spark Machine Learning – Classification, Regression
- Machine Learning Model building with Spark ML Library
- Demos and Labs
Day 4: -
Structured Streaming Overview using Apache Spark
- Need for real-time processing
- Streaming Spark Library
- Streaming Query
- Processing and Aggregating Streams
- Data Lake concept
- Spark Streaming examples
- Demos and Labs
GraphX/GraphFrames Overview using Apache Spark
- Need for GraphX/GraphFrames
- Spark GraphX & GraphFrames Library
- Spark GraphX & GraphFrames examples
- Demos and Labs
Training material provided: Yes (Digital format)