Apache Spark Essentials

Course Instructor
Manisha Sule
Manisha Sule is the Big Data Analytics instructor at Linux Academy. Her prior experience includes working at IBM's Spark Technology Center and IBM Analytics. She has worked with customers, data scientists, and data engineers to educate and enable them on Big Data technologies such as Apache Spark. She has a Master's in Computer Science and more than a decade's worth of experience building highly distributed and highly available software solutions.

Introduction to Spark

About This Course

00:04:39

Introduction to Spark

00:16:53

Use Cases of Spark

00:14:27

Quiz: Introduction to Apache Spark

Architecture and Components of Spark

Architecture and Components of Spark

00:13:19

Resilient Distributed Datasets

00:16:03

RDD Operations

00:13:16

Key Value Pair RDDs

00:10:30

Quiz: Architecture and Components of Spark

Quiz: Resilient Distributed Datasets

Installing and Getting Started with Spark

Outline of the Install Process

00:01:37

Installing Java for Ubuntu

00:01:52

Installing Java for CentOS

00:03:05

Installing Spark

00:04:32

Using the Spark Install

00:04:59

Exercise: Install Java on Ubuntu

00:30:00

Exercise: Install Java on CentOS

00:30:00

Exercise: Install Apache Spark

00:30:00

Exercise: Spark Programming with Python REPL

00:30:00

Exercise: Spark Programming with Scala REPL

00:30:00

Spark Ecosystem

Spark MLlib

00:17:45

Demo: Spark MLlib

00:07:37

Spark SQL

00:08:04

Spark Streaming

00:08:03

Spark GraphX

00:04:47

SparkR

00:02:48

Quiz: Spark MLlib

Exercise: Spark MLlib

00:30:00

Exercise: Spark GraphX

00:30:00

Exercise: Spark Streaming

00:30:00

Exercise: Spark SQL

00:30:00

Details

This course provides a comprehensive introduction to Apache Spark. Starting with an overview of Apache Spark and its use in the Big Data Analytics industry, we discuss some well-known use cases of Apache Spark and its rise in popularity. The course gives an in-depth explanation of the core concepts of Spark, including its runtime architecture and its primary data abstraction, the Resilient Distributed Dataset (RDD). The course also guides you through the installation process and helps you get started with examples in Spark's command-line interface using Python and Scala. It then gives a brief description of Spark's libraries, including Spark SQL, Spark MLlib, Spark Streaming, Spark GraphX, and SparkR. We provide hands-on exercises for practicing basic Spark programming, which allow you to further explore the Spark programming API.

Study Guides

Introduction to Apache Spark

This study guide provides a comprehensive introduction to Apache Spark. Starting with an overview of Apache Spark and its use in the Big Data Analytics industry, we discuss some well-known use cases of Apache Spark and its rise in popularity. The course gives an in-depth explanation of the core concepts of Spark, including its runtime architecture and its primary data abstraction, the Resilient Distributed Dataset (RDD). The course also guides you through the installation process and helps you get started with examples in Spark's command-line interface using Python and Scala. It then gives a brief description of Spark's libraries, including Spark SQL, Spark MLlib, Spark Streaming, Spark GraphX, and SparkR. We provide hands-on exercises for practicing basic Spark programming, which allow you to further explore the Spark programming API.

Instructor Deck
