Hadoop Quick Start

Course

Myles Young

Big Data Training Architect II in Content

I am a father and husband with a passion for tech. I have large-scale enterprise IT experience in network security, agile development, middleware, QA, system reliability engineering, and data infrastructure engineering. I have worked in DevOps for most of my IT career with a focus on using automation and big data technologies for operational analytics and log aggregation to further support CI/CD pipelines. I have a great appreciation for distributed systems and finding non-obvious answers in mountains of data. I am excited to be working at Linux Academy where I get to share what I've learned with our awesome students!

Length

02:35:25

Difficulty

Beginner

Course Details

Hadoop has become a staple technology in the big data industry by enabling the storage and analysis of datasets so large that working with them in traditional data systems would otherwise be impossible. In this course, we are going to jump right into deploying Hadoop, configuring HDFS, and executing MapReduce jobs. Lastly, you will get to try it all out yourself with a guided hands-on learning activity. So let's get started!

Download the Hadoop Quick Start Interactive Guide here: https://interactive.linuxacademy.com/diagrams/HadoopQuickStart.html

Syllabus

Introduction

Getting Started

Course Overview

00:02:16

Lesson Description:

Welcome to the Hadoop Quick Start course! Let's take a quick look at what you can expect from this course and get a better idea of who this course is best suited for and what prior knowledge you should have.

Install Java

00:03:35

Lesson Description:

Before we can start Hadoop, we need to install Java. Follow along with me using your Linux Academy cloud playground server as we perform the following tasks: Search for a suitable Java package. Install Java 1.8.0. Verify our Java installation.
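
If you would like a reference while following along, a minimal sketch of these steps on a yum-based cloud playground server might look like the following (the exact OpenJDK package name is an assumption; confirm it with the search first):

# Search for a suitable Java package
yum search java-1.8.0

# Install the OpenJDK 1.8.0 package (the -devel variant also pulls in the JDK tools)
sudo yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel

# Verify our Java installation
java -version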

Configure Passwordless SSH

00:02:02

Lesson Description:

In order for the namenode and datanode to communicate with each other, we need to configure passwordless SSH. Follow along with me using your Linux Academy cloud playground server as we perform the following tasks: Generate an RSA key pair for cloud_user. Add the public key to the authorized keys list. Test passwordless SSH on the localhost.
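
As a rough sketch for the cloud_user account, assuming a standard OpenSSH setup on the playground server:

# Generate an RSA key pair for cloud_user (no passphrase, since this is a lab box)
ssh-keygen -t rsa -b 2048 -N "" -f ~/.ssh/id_rsa

# Add the public key to the authorized keys list and lock down its permissions
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Test passwordless SSH on the localhost (this should not prompt for a password)
ssh localhost hostname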

Hadoop

Installation and Configuration

Download and Deploy Hadoop

00:05:14

Lesson Description:

For this course, we are going to install Hadoop as a pseudo-distributed single-node cluster via an archive downloaded from an Apache mirror. Follow along with me using your Linux Academy cloud playground server as we perform the following tasks: Determine the recommended mirror. Download Hadoop to the cloud_user home directory.
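
A minimal sketch, with the release number and the downloads.apache.org URL as assumptions; use whichever mirror and version the Apache download page recommends to you:

# Download Hadoop to the cloud_user home directory
cd ~
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

# Unpack the archive to deploy a pseudo-distributed single-node cluster
tar -xzf hadoop-3.3.6.tar.gz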

Configure Hadoop

00:01:53

Lesson Description:

In order to start Hadoop services, we need to tell Hadoop where our JAVA_HOME is located. Follow along with me using your Linux Academy cloud playground server as we perform the following tasks: Determine where Java is installed. Identify what directory Hadoop expects as the JAVA_HOME. Configure the JAVA_HOME in Hadoop's environment script.
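
A sketch of the same steps, assuming the OpenJDK package installed earlier and a ~/hadoop-3.3.6 directory (adjust both paths to your own installation):

# Determine where Java is installed, resolving symlinks to the real binary
readlink -f $(which java)

# Hadoop expects JAVA_HOME to be that path with the trailing /bin/java stripped off,
# so derive it with two dirname calls and append it to Hadoop's environment script
echo "export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))" >> ~/hadoop-3.3.6/etc/hadoop/hadoop-env.sh

# Confirm the setting was written
grep 'export JAVA_HOME' ~/hadoop-3.3.6/etc/hadoop/hadoop-env.sh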

Configure HDFS

00:03:30

Lesson Description:

Before we can use our HDFS cluster to run MapReduce jobs, we need to configure its default filesystem, set the replication factor to 1, and format the filesystem. Follow along with me using your Linux Academy cloud playground server as we perform the following tasks: Configure the default filesystem for Hadoop. Configure the DFS replication factor. Format the HDFS.
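
A rough sketch, assuming the hadoop-3.3.6 directory from earlier and the conventional NameNode port 9000 (adjust the path and port to your own setup). In etc/hadoop/core-site.xml, set the default filesystem:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

In etc/hadoop/hdfs-site.xml, set the replication factor to 1 for a single-node cluster:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Then format the filesystem:

# Format the HDFS namenode metadata (run once, before the first start)
~/hadoop-3.3.6/bin/hdfs namenode -format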

Execution

Start Services

00:01:12

Lesson Description:

Now that we have our cluster installed and configured, let's start things up and make sure our HDFS cluster is working as intended. Follow along with me using your Linux Academy cloud playground server as we perform the following tasks: Start the namenode and datanode. Verify that HDFS is accessible and accepting commands.
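
A sketch, assuming passwordless SSH is working and the hadoop-3.3.6 directory used in the previous lessons:

# Start the namenode and datanode (the script also starts a secondary namenode)
~/hadoop-3.3.6/sbin/start-dfs.sh

# Verify that HDFS is accessible and accepting commands
~/hadoop-3.3.6/bin/hdfs dfs -ls /

# jps should now list NameNode, DataNode, and SecondaryNameNode Java processes
jps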

Configure MapReduce

00:05:54

Lesson Description:

Let's prepare our HDFS for running MapReduce jobs by creating the user directory and staging a sample file which we can analyze. Follow along with me using your Linux Academy cloud playground server as we perform the following tasks: Create the /user directory. Stage the LICENSE.txt file in HDFS for analysis.
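
A minimal sketch, run from inside the Hadoop installation directory and assuming the cloud_user account (LICENSE.txt ships at the top level of the Hadoop archive):

# Create the /user directory and a home directory for cloud_user in HDFS
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/cloud_user

# Stage the LICENSE.txt file in HDFS for analysis
bin/hdfs dfs -put LICENSE.txt /user/cloud_user/

# Confirm the file landed where we expect
bin/hdfs dfs -ls /user/cloud_user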

Run a MapReduce Job

00:05:34

Lesson Description:

Now we are prepared to execute some MapReduce jobs on our sample data. Follow along with me using your Linux Academy cloud playground server as we perform the following tasks: Execute a wordmean analysis on the LICENSE.txt file with MapReduce. Execute a Pi calculation with MapReduce. View MapReduce job output files. Navigate the MapReduce application usage information.
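
A sketch using the bundled examples jar, run from the Hadoop installation directory; the 3.3.6 in the jar name is an assumption, so match it to whichever release you downloaded:

# Execute a wordmean analysis on the staged LICENSE.txt (the output directory must not already exist)
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordmean LICENSE.txt wordmean-output

# Execute a Pi estimation with 16 map tasks of 1000 samples each
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 16 1000

# View the MapReduce job output files
bin/hdfs dfs -cat wordmean-output/part-r-00000

# Run the jar with no arguments to see the list of example applications and their usage information
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar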

Stop Services

00:01:05

Lesson Description:

With our analysis complete, let's show how to shut down our distributed filesystem by stopping the HDFS services (namenode and datanode).
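
A short sketch, again assuming the hadoop-3.3.6 directory used throughout the course:

# Stop the HDFS services (namenode and datanode)
~/hadoop-3.3.6/sbin/stop-dfs.sh

# Confirm the Java daemons have exited
jps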

Conclusion

Final Steps

What's Next?

00:03:04

Lesson Description:

First off, congratulations on completing this course! Now let's take a look at some logical next steps for training opportunities here at Linux Academy.

Hands-on Labs are real, live environments that put you in a real scenario to practice what you have learned, without any extra charge or additional account to manage.

02:00:00