The “Apache Spark 2.0 Basics” workshop-style course equips students with the core skills needed for hands-on big data analysis with Apache Spark through live, in-person classes. After successfully completing this course, students will be able to analyze massive data sets across a Hadoop cluster and will be prepared to take the next course in our Big Data Engineering Track.

What will I learn?
1) Learn the concepts behind Spark’s Resilient Distributed Datasets (RDDs)
2) Develop distributed code using the Scala programming language
3) Develop and run Spark jobs using Scala
4) Understand how Hadoop YARN distributes Spark across computing clusters
5) Optimize Spark jobs through partitioning, caching, and other techniques
6) Build, deploy, and run Spark scripts on Amazon’s Elastic MapReduce (EMR) service for big data analysis on larger data sets
7) Learn related Spark technologies such as Spark SQL, DataFrames, Datasets, and MLlib
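To give a taste of what developing a Spark job in Scala looks like, here is a minimal word-count sketch using the Spark 2.0 API. It is illustrative only: the application name and input path are placeholders, it assumes a Spark 2.0 dependency is on the classpath, and it runs against a local master rather than a YARN cluster.

```scala
import org.apache.spark.sql.SparkSession

// Minimal Spark 2.0 word count; "input.txt" is a placeholder path.
object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]")          // all local cores; on a cluster, YARN manages this
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("input.txt")       // load the file as an RDD of lines
      .flatMap(_.split("\\s+"))    // split each line into words
      .map(word => (word, 1))      // pair each word with a count of 1
      .reduceByKey(_ + _)          // sum the counts per word across partitions

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```

The `flatMap`/`map`/`reduceByKey` chain above is exactly the kind of RDD transformation pipeline covered in the course.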

What are the requirements?
1) Some prior programming or scripting experience is required. A crash course in Scala is included in this course, but you will need to know the fundamentals of programming in order to pick it up.
2) You will need a PC or a Mac to work on this course.
3) You will need to create and manage your personal AWS account as part of the course.

Can I keep working while taking this course?
Yes, this course is designed for working professionals.

How is the course structured?
This is a workshop-style course, so you learn by doing hands-on coding. Additionally, take-home assignments are provided every week to reinforce the concepts taught in class and are submitted via a GitHub repository. When the class is not in session, students can use the Q&A forum to get their questions answered and share their knowledge.

What credentials does the instructor have?
Our instructor has over 20 years of software development experience and currently consults in Big Data Analytics and Machine Learning. https://www.linkedin.com/in/sivagami-ramiah-2347bb7/

Course Curriculum:
1) Introduction to Hadoop
2) Scala Basics
3) Intro to Spark
4) Resilient Distributed Datasets (RDDs)
5) Spark Architecture Internals
6) Running Spark on a Hadoop Cluster
7) Spark SQL, DataFrames, and Datasets
8) Machine Learning with MLlib
9) Advanced Spark Examples


ByteQuest is a Big Data and Machine Learning training institution helping teach the next generation of Data Engineers and Data Scientists.