Dec 30, 2024  
2018-2019 Course Catalog [ARCHIVED CATALOG]

CVF 2206 - Data Science and Big Data Analytics

Credits: 3
Hours/Week: Lecture 2 Lab 2
Course Description: This course covers basic and advanced analytic methods, along with big data analytics technology and tools, including MapReduce and Hadoop. Extensive labs throughout the course give students the opportunity to apply these methods and tools to real-world business challenges. This course takes a technology-neutral approach. In a final lab, students will address a big data analytics challenge by applying the concepts taught in the course to the context of the Data Analytics Lifecycle. Students will prepare for the Proven Professional Data Scientist Associate (EMCDSA) certification exam and establish a baseline of Data Science skills.
MnTC Goals
None

Prerequisite(s): MATH 1025 with a grade of C or higher OR instructor consent. System administration experience on Microsoft Windows or Linux operating systems.
Corequisite(s): None
Recommendation: None

Learning Outcomes
At the end of this course students will be able to:

  1. contribute to a data science team.
  2. reframe a business challenge as an analytics challenge.
  3. deploy a structured lifecycle approach to data analytics problems.
  4. apply appropriate analytic techniques and tools to analyze big data.
  5. develop a compelling story with the data to drive business action.
  6. use open source tools such as R, Hadoop, and Postgres.
  7. prepare for EMC Proven™ Professional Data Scientist certification.

Major Content
  1. Introduction to Big Data Analytics
    • Big Data Overview
    • State of the practice in analytics
    • The Role of the Data Scientist
    • Big Data Analytics in industry verticals
  2. End-to-end data analytics lifecycle
    • Key roles for a successful analytics project
    • Main phases of the lifecycle
    • Developing core deliverables for stakeholders
  3. Using R to execute basic analytics methods
    • Introduction to R
    • Analyzing and exploring data with R
    • Statistics for model building and evaluation
  4. Advanced analytics and statistical modeling for Big Data - Theory and methods
    • K-Means Clustering
    • Association rules
    • Linear and logistic regression
    • Naive Bayesian classifier
    • Decision trees
    • Time series analysis
    • Text analysis
  5. Advanced analytics and statistical modeling for Big Data - Technology and Tools
    • Using MapReduce / Hadoop for analyzing unstructured data
    • Hadoop ecosystem of tools
    • In-database analytics
    • MADlib and advanced SQL Techniques
  6. Endgame, or Putting it all together
    • How to operationalize an analytics project
    • Creating the final deliverables
    • Data visualization techniques
    • Hands-on Application of analytics lifecycle to a Big Data Analytics problem

Competency 1 (1-6)
None
Competency 2 (7-10)
None

