| CVF 2071 - Hadoop Administration Credits: 3 Hours/Week: Lecture 2 Lab 1
 Course Description: This course builds on topics in CVF 1071, Introduction to Big Data Analytics and Security. It provides students with a comprehensive introduction to the steps necessary to install, configure, operate, and maintain Hadoop. The course begins with an overview of the Big Data landscape and then moves to a system administrator's working view of running Hadoop. Students will also have the opportunity to install Splunk on top of Hadoop and examine how to process and analyze the data using Splunk’s Search Processing Language (SPL) as an implementation of the MapReduce function. This course employs both “open source technology” (Hadoop) and “commercial technology” (Splunk).
 MnTC Goals
 None
 
 Prerequisite(s): CVF 1071 and CVF 1205 or CSCI 1060 with grades of C or higher, or instructor consent.
 Corequisite(s): None
 Recommendation: None
 
 Major Content
 
 
	1. Introduction to Hadoop
		History of Hadoop
		Core Components of Hadoop
		Fundamental Concepts of Hadoop
	2. Planning Hadoop Cluster
		Basic Planning Considerations
		Choosing Hardware
		Network Considerations
		Configuring Nodes
		Planning for Cluster Management
	3. Hadoop Distributed File System
		HDFS Features
		Reading and Writing Files
		NameNode Considerations
		HDFS Security
		NameNode Web User Interface
		Hadoop File Shell
	4. Getting Data into HDFS
		Pulling Data from External Sources with Flume
		Using Sqoop to Import Data from Relational Databases
		Best Practices
		REST Interfaces
	5. MapReduce
		Architectural Overview
		MapReduce Overview
		Features of MapReduce
		YARN MapReduce Version 2
		Failure Recovery
		The JobTracker Web User Interface
	6. Installation, Initialization, and Configuration of Hadoop
		Configuration and Deployment Types
		Installing Hadoop
		Specifying the Hadoop Configuration
		Initial HDFS and MapReduce Configuration
		Log Files
	7. Installing/Configuring
		Hive
		Impala
		Pig
	8. Hadoop Clients
		What Is a Hadoop Client?
		Installing and Configuring Hadoop Clients
		Installation and Configuration of Hue
		Authentication and Configuration of Hue
	9. Hadoop Advanced Cluster Configuration
		Advanced Configuration Parameters
		Configuring Hadoop Ports
		Configuring HDFS for Rack Awareness & HDFS High Availability
		Explicitly Including and Excluding Hosts
	10. Hadoop Security
		Importance of Hadoop Security
		Hadoop’s Security System Concepts
		What Kerberos Is and How It Works
		Using Kerberos to Secure a Hadoop Cluster
	11. Scheduling and Managing Jobs
		Scheduling Hadoop Jobs
		Managing Running Jobs
		Configuring the FairScheduler
	12. Cluster Maintenance
		Checking HDFS Status
		Copying Data Between Clusters
		Removing/Adding Cluster Nodes
		Rebalancing the Cluster
		NameNode Metadata Backup
		Cluster Upgrades
	13. Monitoring and Troubleshooting Cluster
		General System Monitoring
		Cluster Monitoring
		Managing Hadoop’s Log Files
		Common Troubleshooting Issues

 Learning Outcomes
 At the end of this course, students will be able to:
 
	describe the history of Hadoop.
	describe the fundamental concepts of using Big Data.
	identify where Hadoop fits into a Big Data strategy.
	design a plan to create a Hadoop cluster.
	explain HDFS features and the NameNode.
	demonstrate how to get data into HDFS.
	explain how to work with MapReduce.
	implement installation and configuration of Hadoop.
	install and configure Hadoop clients.
	configure HDFS for Rack Awareness & HDFS High Availability.
	administer cluster maintenance.
	schedule Hadoop jobs.
	describe Hadoop cluster maintenance.
	monitor and troubleshoot a Hadoop cluster.
	identify common integration points.
	explain Hadoop security.

 Competency 1 (1-6)
 None
 Competency 2 (7-10)
 None
 |