Apr 28, 2024  
2018-2019 Course Catalog [ARCHIVED CATALOG]

CVF 2071 - Hadoop Administration

Credits: 3
Hours/Week: Lecture 2 Lab 1
Course Description: This course builds on topics in CVF 1071, Introduction to Big Data Analytics and Security. It provides students with a comprehensive introduction to the steps necessary to install, configure, operate, and maintain Hadoop. The course begins with an overview of the Big Data landscape and then moves to a system administrator's working view of running Hadoop. Students will also have the opportunity to install Splunk on top of Hadoop and examine how to process and analyze the data using Splunk’s Search Processing Language (SPL) as an implementation of the MapReduce function. This course employs both “open source technology” (Hadoop) and “commercial technology” (Splunk).
MnTC Goals
None

Prerequisite(s): CVF 1071 and CVF 1205 or CSCI 1060 with grades of C or higher, or instructor consent.
Corequisite(s): None
Recommendation: None

Major Content

1. Introduction to Hadoop

  1. History of Hadoop
  2. Core Components of Hadoop
  3. Fundamental Concepts of Hadoop

2. Planning a Hadoop Cluster

  1. Basic Planning Considerations
  2. Choosing Hardware
  3. Network Considerations
  4. Configuring Nodes
  5. Planning for Cluster Management

3. Hadoop Distributed File System 

  1. HDFS Features
  2. Reading and Writing Files
  3. NameNode Considerations
  4. HDFS Security
  5. NameNode Web User Interface
  6. Hadoop File Shell

4. Getting Data into HDFS

  1. Pulling Data from External Sources with Flume
  2. Using Sqoop to Import Data from Relational Databases
  3. Best Practices
  4. REST Interfaces

5. MapReduce

  1. Architectural Overview
  2. MapReduce Overview
  3. Features of MapReduce
  4. YARN MapReduce Version 2
  5. Failure Recovery
  6. The JobTracker Web User Interface

6. Installation, Initialization, and Configuration of Hadoop

  1. Configuration and Deployment Types
  2. Installing Hadoop
  3. Specifying the Hadoop Configuration
  4. Initial HDFS and MapReduce Configuration
  5. Log Files

7. Installing/Configuring

  1. Hive
  2. Impala
  3. Pig

8. Hadoop Clients

  1. What Is a Hadoop Client?
  2. Installing and Configuring Hadoop Clients
  3. Installation and Configuration of Hue
  4. Authentication and Configuration of Hue

9. Hadoop Advanced Cluster Configuration

  1. Advanced Configuration Parameters
  2. Configuring Hadoop Ports
  3. Configuring HDFS for Rack Awareness & HDFS High Availability
  4. Explicitly Including and Excluding Hosts

10. Hadoop Security

  1. Importance of Hadoop Security
  2. Hadoop’s Security System Concepts
  3. What Kerberos Is and How It Works
  4. Using Kerberos to Secure a Hadoop Cluster

11. Scheduling and Managing Jobs

  1. Scheduling Hadoop Jobs
  2. Managing Running Jobs
  3. Configuring the FairScheduler

12. Cluster Maintenance

  1. Checking HDFS Status
  2. Copying Data Between Clusters
  3. Removing/Adding Cluster Nodes
  4. Rebalancing the Cluster
  5. NameNode Metadata Backup
  6. Cluster Upgrades

13. Monitoring and Troubleshooting the Cluster

  1. General System Monitoring
  2. Cluster Monitoring
  3. Managing Hadoop’s Log Files
  4. Common Troubleshooting Issues

 
Learning Outcomes
At the end of this course, students will be able to:

  1. describe the history of Hadoop.
  2. describe the fundamental concepts of using Big Data.
  3. identify where Hadoop fits into a Big Data strategy.
  4. design a plan to create a Hadoop cluster.
  5. explain HDFS features and the NameNode.
  6. demonstrate how to get data into HDFS.
  7. explain how to work with MapReduce.
  8. implement installation and configuration of Hadoop.
  9. install and configure Hadoop clients.
  10. configure HDFS for Rack Awareness & HDFS High Availability.
  11. administer cluster maintenance.
  12. schedule Hadoop jobs.
  13. describe Hadoop cluster maintenance.
  14. monitor and troubleshoot a Hadoop cluster.
  15. identify common integration points.
  16. explain Hadoop security.

Competency 1 (1-6)
None
Competency 2 (7-10)
None

