Dec 30, 2024
CVF 2206 - Data Science and Big Data Analytics
Credits: 3
Hours/Week: Lecture 2 Lab 2
Course Description: This course covers basic and advanced analytic methods and big data analytics technology and tools, including MapReduce and Hadoop. Extensive labs throughout the course give students the opportunity to apply these methods and tools to real-world business challenges. The course takes a technology-neutral approach. In a final lab, students address a big data analytics challenge by applying the concepts taught in the course in the context of the Data Analytics Lifecycle. Students prepare for the Proven Professional Data Scientist Associate (EMCDSA) certification exam and establish a baseline of data science skills.
MnTC Goals: None
Prerequisite(s): MATH 1025 with a grade of C or higher OR instructor consent. System administration experience on Microsoft Windows or Linux operating systems.
Corequisite(s): None
Recommendation: None
Learning Outcomes
At the end of this course students will be able to:
- contribute to a data science team.
- reframe a business challenge as an analytics challenge.
- deploy a structured lifecycle approach to data analytics problems.
- apply appropriate analytic techniques and tools to analyze big data.
- develop a compelling story with the data to drive business action.
- use open source tools such as R, Hadoop, and Postgres.
- prepare for EMC Proven™ Professional Data Scientist certification.
Major Content
- Introduction to Big Data Analytics
  - Big data overview
  - State of the practice in analytics
  - The role of the data scientist
  - Big data analytics in industry verticals
- End-to-end data analytics lifecycle
  - Key roles for a successful analytics project
  - Main phases of the lifecycle
  - Developing core deliverables for stakeholders
- Using R to execute basic analytics methods
  - Introduction to R
  - Analyzing and exploring data with R
  - Statistics for model building and evaluation
- Advanced analytics and statistical modeling for Big Data - Theory and Methods
  - K-means clustering
  - Association rules
  - Linear and logistic regression
  - Naive Bayesian classifier
  - Decision trees
  - Time series analysis
  - Text analysis
- Advanced analytics and statistical modeling for Big Data - Technology and Tools
  - Using MapReduce / Hadoop for analyzing unstructured data
  - Hadoop ecosystem of tools
  - In-database analytics
  - MADlib and advanced SQL techniques
- Endgame, or Putting It All Together
  - How to operationalize an analytics project
  - Creating the final deliverables
  - Data visualization techniques
  - Hands-on application of the analytics lifecycle to a big data analytics problem
Competency 1 (1-6): None
Competency 2 (7-10): None