Learning Objective:

This course covers concepts from the latest versions of the most sought-after open-source Big Data frameworks for real-time data analytics. Upon successful completion of this Big Data Engineering Boot Camp, attendees will be able to:

Build Real-Time Distributed Data Pipelines with Spark, Kafka and Cassandra at Scale.

Audience: Software Developers/Engineers, Technical Leads, Architects, Software Engineering Managers and Big Data Enthusiasts

Prerequisites:

Comfortable with the Java programming language

Familiarity with the Linux environment (SSH into a Linux box, navigate the directory system, find out which processes are running, move files to the box, edit files on the box, and understand the basics of user groups and permissions)

Hands-on Experience: Lab work makes up 50% of our Immersive Boot Camp. All required Lab Activities, Programming Assignments and the Capstone Project will be done in a Cloud Environment.


Primary Instructor: Sivagami Ramiah
AWS Instructor: Sentha Karuppaiah


Detailed Course Curriculum

Section 1: Hadoop Version 2.x

Introduction to Big Data
What is Hadoop?
The Hadoop Distributed File System (HDFS)
MapReduce (Data Processing Framework)
YARN (Cluster Resource Manager)
Hive (Data Warehouse Framework)

Section 2: Spark Version 2.x

What is Spark?
Spark Architecture
Spark Ecosystem
Just a Little Scala (Version 2.x) for Spark
Spark Architecture Internals
DataFrames, DataSets and Spark SQL
Spark’s MLlib API for Machine Learning
Graph Processing with GraphFrames
Spark Structured Streaming
Advanced Spark Programming
Spark in Production

Section 3: Kafka Version 0.10.x

Introduction to Messaging Systems
Introduction to Kafka
Kafka Architecture
Kafka Producers
Kafka Consumers
Kafka Architecture Internals
Building Data Pipelines
Kafka in Production

Section 4: Cassandra Version 3.x

Beyond Relational Databases
Introduction to Cassandra & NoSQL
Cassandra Architecture
Cassandra Architecture Internals
Introduction to CQL
Cassandra Data Modeling

Section 5: Individual Capstone Project

Building an End-to-End Real-Time Distributed Data Pipeline in a Cloud Environment

Training Location

7100 Stevenson Blvd., Fremont, CA 94538


ByteQuest is a Big Data and Machine Learning training institution that teaches the next generation of Data Engineers and Data Scientists.