The Data Science for Business Certificate consists of four courses, taken in a sequence that begins with foundational and functional knowledge and moves into synthesis and integration of that knowledge. The following descriptions provide greater insight into the content of each course.
Foundations of Data Science with R (Fall Semester)
This foundational course offers a full-spectrum introduction to data science and data science workflows, emphasizing data as a source of value creation in the enterprise. The R programming environment serves as the implementation vehicle in support of essential data science activities: data exploration and visualization, data wrangling, predictive modeling, model deployment, and communication. R, along with Python, is among the most important tools in the data scientist’s toolbox, and this course features tools and a style of programming inspired by the popular tidyverse ecosystem: ggplot2 for data visualization, and dplyr and tidyr for data wrangling. Students will master elements of the data science workflow through a series of short R programming exercises reinforced by a full-spectrum, integrative final project. Presentation skills are an ever-present theme as students are challenged, through every stage of analysis, to communicate managerial relevance and value to the enterprise.
Practical Applications of Python for Data Science (Fall Semester)
A relatively fast-paced introduction to the Python programming language and its use in data science. Topics include typical Python objects and control structures, with a special focus on data ingestion and manipulation. Special attention will be devoted to Python’s system of packages for data analysis, including pandas, scikit-learn, and NumPy, and to methods for moving data between Python and common statistics software such as R and Stata. Weekly programming exercises complement in-class lessons, and a final project gives students the opportunity to incorporate the skills they have learned into a reproducible report analogous to those used in industry to present insights from a data science effort.
Fundamentals of Data Engineering (Spring Semester)
Data management is core to both applied computer science and data science. This includes storing, managing, and processing datasets of varying sizes and types. This course introduces students to the various ways in which data is stored and processed, including relational databases, file-based databases, cloud-based storage, and data streaming. A key component of the course is learning which architectures fit which types of data science problems, along with the strengths and weaknesses of each. Students will learn to work with both clean, structured data and dirty, unstructured data.
Applied Machine Learning (Spring Semester)
Machine learning is becoming a core component of many modern organizational processes. It is a growing field at the intersection of computer science and statistics focused on finding patterns in data. Prominent applications include personalized recommendations, image processing, and speech recognition. This course focuses on applying existing machine learning libraries to practical problems faced by organizations. Through lectures, cases, and programming projects, students will learn how to use machine learning to solve real-world problems, run evaluations, and interpret their results.