Course Overview

Statistical models are necessary for analyzing the multivariate (often large) datasets commonly encountered in data science and statistical science. This graduate-level course, part of the curriculum for Duke's Master in Interdisciplinary Data Science (MIDS) program, aims to provide students with the statistical data analysis tools needed to succeed as data scientists.

In this course, you will learn the general workflow for building statistical models and using them to answer inferential questions. You will learn several parametric modeling techniques, such as generalized linear models, models for multilevel data, and time series models. You will also learn to handle messy data, including data with missing or erroneous values and data with outliers or non-standard distributions. You will be able to assess model fit, validate model assumptions and, more generally, check whether a proposed statistical model is appropriate for a given dataset. You will also learn causal inference under the potential outcomes framework. Should time permit, we may also briefly cover nonparametric models such as classification and regression trees.

Although this course emphasizes data analysis over rigorous mathematical theory, students who wish to explore the theory in more detail than is covered in class are welcome to approach the instructor outside of class and request further reading materials.

Learning Objectives

By the end of this course, students should be able to

  • Use the statistical methods and models covered in class to analyze real multivariate data from a variety of fields.
  • Assess the adequacy of statistical models for any given dataset and decide what to do when certain models are not appropriate for it.
  • Clean up and analyze messy datasets using approaches covered in class.
  • Hone collaborative and presentation skills through consistent teamwork on, and class presentations of, team projects.

Course Info


  270 Gross Hall, Tue/Thur 10:05 - 11:20am
  270 Gross Hall, every other Fri 10:00 - 11:20am

Teaching Team and Office Hours

Instructor: Dr. Olanrewaju Michael Akande; Mon 3:00 - 4:30pm, Tue 1:45 - 2:45pm, Thur 2:00 - 3:30pm; 256 Gross Hall
TAs:
  • Azucena Morales (until Oct 1); Thur 2:30 - 4:30pm; 257 Gross Hall
  • Chenxi Wu; Wed 3:00 - 5:00pm; 257 Gross Hall
  • Siqi Fu (from Oct 1); Thur 3:00 - 5:00pm; 257 Gross Hall


  • Data Analysis Using Regression and Multilevel/Hierarchical Models, by Gelman, A., and Hill, J. Recommended but not compulsory.
  • An Introduction to Statistical Learning with Applications in R, by James, G., Witten, D., Hastie, T., and Tibshirani, R. Optional (free PDF available online).

Lecture notes and slides, lab exercises and other reading resources will be posted on the course website. We will only loosely follow these textbooks.


You should have access to a laptop and bring it to every class, fully charged.

Important Dates

Mon, September 2 Labor Day; classes in session
Fri, September 6 Drop/add ends
Fri, October 4, 7:30pm Fall break begins
Wed, October 9, 8:30am Fall break ends
Mon, November 4, 11:59pm Final project proposal due
Tue, November 26 Final project presentations I
Tue, November 26, 10:30pm Thanksgiving; graduate classes end
Tue, December 3 Final project presentations II
Thur, December 5 Final project presentations III
Tue, December 10 Final project report due

Green Classroom

This course has achieved Duke’s Green Classroom Certification. The certification indicates that the faculty member teaching this course has taken significant steps to green the delivery of this course. Your faculty member has completed a checklist indicating their common practices in areas of this course that have an environmental impact, such as paper and energy consumption. Some common practices implemented by faculty to reduce the environmental impact of their course include allowing electronic submission of assignments, providing online readings and turning off lights and electronics in the classroom when they are not in use. The eco-friendly aspects of course delivery may vary by faculty, by course and throughout the semester. Learn more at


This web page contains materials such as lecture slides, homework assignments, and datasets developed or adapted by Dr. Jerry Reiter.