- Overview
- Prerequisite
- Audience
- Curriculum
Description:
The 'Introduction to Big Data' course is your gateway to the world of Big Data and Spark. It covers the history and fundamentals of Big Data and surveys the technologies of the Big Data ecosystem, including HDFS, MapReduce, Sqoop, Flume, Hive, Pig, Mahout for machine learning, R Connector, Ambari, Zookeeper, Oozie, and NoSQL tools like HBase. You will see how the ecosystem looked before Apache Spark and how it changed afterward, learn the core fundamentals and architecture of Spark, and put that knowledge into practice on the Apache Spark Databricks Cloud.
Course Code/Duration:
BDT132 / 1 Day
Learning Objectives:
In this course, participants will:
- Understand the history and background of Big Data and Hadoop
- Describe the Big Data landscape, including examples of real-world Big Data problems
- Explain the 5 V’s of Big Data (volume, velocity, variety, veracity, and value)
- Understand the foundational principles that have made Big Data so successful
- Explain ecosystem components such as HDFS, MapReduce, Sqoop, Flume, Hive, Pig, Mahout (Machine Learning), R Connector, Ambari, Zookeeper, Oozie, and NoSQL tools like HBase
- Understand industry offerings for Big Data in the cloud and on premises, such as Cloudera, Hortonworks, MapR, Amazon EMR, and Microsoft Azure HDInsight
- Understand the impact and value of Apache Spark in the Big Data Ecosystem
- Understand the Apache Spark architecture and its libraries for use cases such as streaming, machine and deep learning, and graph processing (GraphX)
- Set up an account on Apache Spark Databricks Cloud
- Perform hands-on activities on the Big Data ecosystem
Prerequisites:
- Basic programming knowledge; SQL and data knowledge preferred
Audience:
- This course is designed for anyone who wants to build a foundation in Big Data.
Course Outline:
The course includes presentations, demonstrations, and hands-on labs.
- Course Introduction
- History and background of Big Data and Hadoop
- 5 V’s of Big Data
- Secret Sauce of Big Data Hadoop
- Big Data Distributions in Industry
- Big Data Ecosystem before Apache Spark
- Big Data Ecosystem after Apache Spark
- Comparison of MapReduce Vs Apache Spark
- Understand the Apache Spark architecture and libraries for streaming, machine and deep learning, and graph processing (GraphX)
- Hands-on exercise 1 – Set up an account on Apache Spark Databricks Cloud
- Hands-on exercise 2 – First Spark Program
- Hands-on exercise 3 – Spark RDD Transformations & Actions
- Hands-on exercise 4 – Spark RDD Advanced Transformations & Actions
- References and Next steps
Structured Activity/Exercises/Case Studies:
- Exercise 1 – Set up an account on Apache Spark Databricks Cloud
- Exercise 2 – First Spark Program
- Exercise 3 – Spark RDD Transformations & Actions
- Exercise 4 – Spark RDD Advanced Transformations & Actions
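As a preview of the idea behind the RDD exercises, the following is a minimal plain-Python sketch of the difference between transformations (which only describe work) and actions (which trigger it). It is an analogy only: `LazyDataset`, `from_list`, and `square` are hypothetical names invented for this illustration and are not part of the Spark API, and the course labs use Spark itself on Databricks.

```python
from typing import Callable, Iterable, List


class LazyDataset:
    """Hypothetical stand-in for an RDD; NOT part of any Spark API."""

    def __init__(self, source: Callable[[], Iterable]):
        # Store a recipe for producing the data, not the data itself.
        self._source = source

    @classmethod
    def from_list(cls, items: List) -> "LazyDataset":
        return cls(lambda: iter(items))

    # Transformations: return a new dataset and do no work yet.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the recorded pipeline and return a concrete result.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset.from_list([1, 2, 3, 4, 5])
even_squares = numbers.map(square).filter(lambda v: v % 2 == 0)

work_before_action = len(calls)  # still 0: transformations are lazy
result = even_squares.collect()  # the action runs the whole pipeline
print(work_before_action, result)
```

In this sketch every action re-runs the pipeline from the source, so calling `count()` after `collect()` processes the five elements a second time. Spark RDDs behave in a similar way unless a dataset is explicitly cached or persisted.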
Training material provided:
Yes (Digital format)