CVF 2071 - Hadoop Administration
Hours/Week: Lecture 2 Lab 1
Course Description: This course builds on topics in CVF 1071, Introduction to Big Data Analytics and Security. It provides students with a comprehensive introduction to the steps necessary to install, configure, operate, and maintain Hadoop. The course begins with an overview of the Big Data landscape and then examines Hadoop from a working system administrator's point of view. Students will also have the opportunity to install Splunk on top of Hadoop and examine how to process and analyze the data using Splunk’s Search Processing Language (SPL) as an implementation of the MapReduce function. This course employs both “open source technology” (Hadoop) and “commercial technology” (Splunk).
Prerequisite(s): CVF 1071 and CVF 1205 with grades of C or higher, or instructor consent.
1. Introduction to Hadoop
- History of Hadoop
- Core Components of Hadoop
- Fundamental Concepts of Hadoop
2. Planning a Hadoop Cluster
- Basic Planning Considerations
- Choosing Hardware
- Network Considerations
- Configuring Nodes
- Planning for Cluster Management
3. Hadoop Distributed File System
- HDFS Features
- Reading and Writing Files
- NameNode Considerations
- HDFS Security
- NameNode Web User Interface
- Hadoop File Shell
4. Getting Data into HDFS
- Pulling Data from External Sources with Flume
- Using Sqoop to Import Data from Relational Databases
- Best Practices
- REST Interfaces
- Architectural Overview
- MapReduce Overview
- Features of MapReduce
- YARN MapReduce Version 2
- Failure Recovery
- The JobTracker Web User Interface
6. Installation, Initialization, and Configuration of Hadoop
- Configuration and Deployment Types
- Installing Hadoop
- Specifying the Hadoop Configuration
- Initial HDFS and MapReduce Configuration
- Log Files
8. Hadoop Clients
- What Is a Hadoop Client?
- Installing and Configuring Hadoop Clients
- Installation and Configuration of Hue
- Authentication and Configuration of Hue
9. Hadoop Advanced Cluster Configuration
- Advanced Configuration Parameters
- Configuring Hadoop Ports
- Configuring HDFS for Rack Awareness & HDFS High Availability
- Explicitly Including and Excluding Hosts
10. Hadoop Security
- Importance of Hadoop Security
- Hadoop’s Security System Concepts
- What Kerberos Is and How it Works
- Using Kerberos to Secure a Hadoop Cluster
11. Scheduling and Managing Jobs
- Scheduling Hadoop Jobs
- Managing Running Jobs
- Configuring the FairScheduler
12. Cluster Maintenance
- Checking HDFS Status
- Copying Data Between Clusters
- Removing/Adding Cluster Nodes
- Rebalancing the Cluster
- NameNode Metadata Backup
- Cluster Upgrades
13. Monitoring and Troubleshooting the Cluster
- General System Monitoring
- Cluster Monitoring
- Managing Hadoop’s Log Files
- Common Troubleshooting Issues
At the end of this course, students will be able to:
- describe the history of Hadoop.
- describe the fundamental concepts of using Big Data.
- identify where Hadoop fits into a Big Data strategy.
- design a plan to create a Hadoop cluster.
- explain HDFS features and the NameNode.
- demonstrate how to get data into HDFS.
- explain how to work with MapReduce.
- implement installation and configuration of Hadoop.
- install and configure Hadoop clients.
- configure HDFS for Rack Awareness & HDFS High Availability.
- administer cluster maintenance.
- schedule Hadoop jobs.
- describe Hadoop cluster maintenance.
- monitor and troubleshoot a Hadoop cluster.
- identify common integration points.
- explain Hadoop security.
Competency 1 (1-6)
Competency 2 (7-10)