Instructor

raju2006

Data Science with Hadoop and Spark

Created By raju2006
Last Updated February 16th, 2025

Description:

This course will teach participants how to use Apache Hadoop and Apache Spark to solve sophisticated data science problems, producing valuable insights in a wide range of scenarios.

Day one focuses on data science basics, including data acquisition, scrubbing and manipulation. It also gives a general overview of data science applications and of the analytics and machine learning processes typically employed. A number of practical use cases are examined during class and lab sessions.

Day two focuses on Apache Hadoop and its ecosystem along with the types of data science applications typically handled by the Hadoop platform. The course outlines the statistical methods used to produce actionable business insights with MapReduce, Python, Hive and other tools.
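
As a rough illustration of the MapReduce style of computing a statistic, the sketch below calculates a per-key average in plain Python. It is not taken from the course material: the sample records and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical, and a real Hadoop job would distribute these phases across a cluster rather than run them in one process.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records: (region, order_value). Illustrative data only.
records = [
    ("east", 120.0),
    ("west", 80.0),
    ("east", 60.0),
    ("west", 100.0),
    ("north", 50.0),
]


def map_phase(record):
    """Emit one (key, value) pair for a single input record."""
    region, value = record
    return region, value


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a summary statistic (the mean)."""
    return key, mean(values)


def run_job(data):
    """Run map, shuffle and reduce in sequence on an in-memory dataset."""
    pairs = map(map_phase, data)
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


averages = run_job(records)
print(averages)
```

Keeping the three phases as separate functions mirrors how the framework splits the work: only map_phase and reduce_phase would be written by the analyst, while grouping by key is handled by the platform.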

Day three begins with an overview of the Apache Spark platform and its machine learning library, MLlib.

Participants will learn how to perform entity ranking, implement recommendation engines and perform other common data science tasks using Spark batch, streaming, graph and machine learning capabilities.

Course Code/Duration:

BDT62 / 3 Days

Learning Objectives:

In this course, participants will:

Have a clear understanding of data science, its typical use cases and how data science is performed using a range of tools in the Apache open source ecosystem.

Prerequisites:

Basic Python programming. Each participant must be able to run a 64-bit virtual machine (provided with the course).

Audience:

This course is designed for application developers, analysts and data scientists.

Course Outline:

Day 1

Data Science
- Data Science Process Overview
- Structured and Unstructured Data
- Data Acquisition and Transformation
- Data Analysis and Machine Learning
- Machine Learning Concepts

Day 2

Big Data overview
- A brief history of Big Data
- History and background of Big Data and Hadoop
- 5 V’s of Big Data
- Secret Sauce of Big Data Hadoop
- Big Data Distributions in Industry
- End-to-End Big Data Lifecycle Overview
- Demos and Labs
Big Data Ecosystem before Spark
- Big Data Ecosystem before Apache Spark
- Storage options – HDFS and NoSQL
- Processing options – MapReduce, Hive etc.
- Administrative tools – ZooKeeper, Oozie etc.
- Ingestion tools – Sqoop, Flume
- Demos and Labs

Day 3

Getting Started with Apache Spark
- Introduction to Spark RDD
- Spark RDD Transformation and Actions
- Spark Lifecycle
- Spark Caching
- Set Up an Account on Apache Spark Databricks Cloud
- Databricks Notebooks overview
- Lab – Spark RDD Transformation & Actions
- Lab – Spark RDD Advanced Transformation & Actions
- Demos and Labs
Apache Spark SQL, DataFrames, Datasets
- Introduction to Spark SQL
- SQL, DataFrames and Datasets Spark Library
- Compare the various APIs – RDD, DataFrames and Datasets
- Demos and Labs
Machine Learning using Apache Spark
- Introduction to Machine Learning and Data Science
- Machine Learning Spark Library
- Spark Machine Learning examples
- Demos and Labs
Streaming using Apache Spark
- The need for real-time processing
- Streaming Spark Library
- Spark Streaming examples
- Demos and Labs

Training material provided:

Yes (Digital format)
