Big Data Essentials

Course

Myles Young

BigData Training Architect II in Content

I am a father and husband with a passion for tech. I have large-scale enterprise IT experience in network security, agile development, middleware, QA, system reliability engineering, and data infrastructure engineering. I have worked in DevOps for most of my IT career with a focus on using automation and big data technologies for operational analytics and log aggregation to further support CI/CD pipelines. I have a great appreciation for distributed systems and finding non-obvious answers in mountains of data. I am excited to be working at Linux Academy where I get to share what I've learned with our awesome students!

Length

03:32:37

Difficulty

Beginner

Course Details

Big Data Essentials is a comprehensive introduction to the world of Big Data. Starting with the definition of Big Data, we describe its characteristics and its sources. Using real-world examples, we highlight the growing importance of Big Data. We discuss the architectural requirements and principles of Big Data infrastructures and the intersection of cloud computing with Big Data. We also provide an overview of the most popular Big Data technologies, including core Hadoop, the Hadoop ecosystem (Hive, Pig, Sqoop, Flume, Kafka, Storm, Ambari, Oozie, Zookeeper), NoSQL databases, and Apache Spark. We conclude the course with a tour of the different types of analytics that can be performed on Big Data and the techniques and tools used.

Syllabus

About The Course

About The Course

00:03:47

Lesson Description:

This video describes the course outline and contents, along with the intended audience and prerequisites.

Introduction to Big Data

What Is Big Data?

00:22:43

Lesson Description:

This lesson gives you an introduction to the world of Big Data, starting with its definition. We describe the evolution of technologies leading to the Big Data era and the different sources and formats of Big Data, with concrete examples. We conclude this lesson by describing the types of analytics common with Big Data.

What Is IoT?

00:03:55

Lesson Description:

In this lesson, we briefly describe what IoT is and why IoT and Big Data are strongly interconnected.

Big Data Explained with Use Cases

00:16:06

Lesson Description:

This lesson highlights the applications and usage of Big Data across industries, including specific examples of implementations by organizations like UPS, Disney World, John Deere, and the city of Barcelona.

Big Data Trends

00:16:09

Lesson Description:

In this lesson, we discuss some of the industry trends prompted by the adoption of Big Data. We walk through Forrester's report on Big Data growth areas. We conclude the lesson by highlighting some of the dangers of Big Data, which can be as frightening as it is cool.

Introduction to Big Data

Big Data Architectures and Models

Big Data Architectures

00:29:47

Lesson Description:

In this lesson, we discuss the cycle of Big Data management and the essential components of a Big Data infrastructure. We touch on the different layers in the architecture and explain the theoretical principles of each.

Big Data in the Cloud

00:18:47

Lesson Description:

In this lesson, we summarize important concepts of cloud computing and describe how the characteristics of cloud services meet the requirements of a Big Data architecture. We also provide a quick summary of the Big Data-related services offered by cloud leaders Amazon, Google, and Microsoft.

Big Data Architectures and Components

Big Data Tools and Technologies

Overview of Apache Hadoop

00:03:35

Lesson Description:

This lesson introduces the Apache Hadoop framework and its components.

HDFS

00:06:51

Lesson Description:

This lesson describes the concept and architecture of HDFS in the Apache Hadoop framework.

MapReduce

00:11:29

Lesson Description:

This lesson describes the concept of Hadoop MapReduce in the Apache Hadoop framework.

YARN

00:05:11

Lesson Description:

This lesson describes the architectural concepts of YARN in the Apache Hadoop framework.

Overview of the Hadoop Ecosystem

00:06:13

Lesson Description:

In this lesson, we learn about the components and services that come together to create a successful Hadoop solution.

Hive, Pig and MapReduce

00:07:21

Lesson Description:

This lesson explains the concepts and usage scenarios of Hive and Pig and compares them with MapReduce.

Sqoop, Flume, Kafka and Storm

00:14:12

Lesson Description:

This lesson briefly describes Sqoop, Flume, Kafka, and Storm and explains how each is used in a Hadoop data pipeline.

Ambari, Oozie and Zookeeper

00:06:25

Lesson Description:

This lesson describes the tools in the Hadoop ecosystem that are used for data administration. We discuss usage and design concepts in Ambari, Oozie, and Zookeeper.

Overview of NoSQL Databases

00:15:18

Lesson Description:

This lesson explains the challenges of relational databases and the benefits NoSQL offers over them. It describes the CAP theorem, ACID properties, and the concept of "eventual consistency". We briefly explain the four categories of NoSQL databases with examples.

Overview of Apache Spark

00:09:47

Lesson Description:

This lesson introduces the salient features of Apache Spark and its history. It also briefly explains the difference between Hadoop and Spark.

Hadoop and its ecosystem

NoSQL

Analytics

Analyzing Big Data

00:14:53

Lesson Description:

This lesson starts by describing the categories of analytics and then delves deeper into each category, detailing the tools and techniques used.

Analyzing Big Data