Course Overview
Statistical models are necessary for analyzing the type of multivariate (often large) datasets that are usually encountered in data science and statistical science. This is a graduate-level course within the curriculum of Duke's Master in Interdisciplinary Data Science (MIDS) program. It aims to provide students with the statistical data analysis tools needed to succeed as data scientists.
In this course, you will learn the general workflow for building statistical models and using them to answer inferential questions. You will learn several parametric modeling techniques, such as generalized linear models, models for multilevel data, and time series models. You will also learn to handle messy data, including data with missing or erroneous values and data with outliers or non-standard distributions. You will be able to assess model fit, validate model assumptions and, more generally, check whether a proposed statistical model is appropriate for a given dataset. You will also learn causal inference under the potential outcomes framework. Should time permit, we may also briefly cover nonparametric models such as classification and regression trees.
Although this course emphasizes data analysis over rigorous mathematical theory, students who wish to explore the theory in more detail than is covered in class are welcome to talk with the instructor outside of class and request further reading materials.
Learning Objectives
By the end of this course, students should be able to:
- Use the statistical methods and models covered in class to analyze real multivariate data that intersect with various fields.
- Assess the adequacy of statistical models for any given data and decide what to do when certain models are not appropriate for a given dataset.
- Clean up and analyze messy datasets using approaches covered in class.
- Hone collaborative and presentation skills through consistent teamwork on, and class presentations of, team projects.
Course Info
Lectures
270 Gross Hall
Tue/Thur 10:05 - 11:20am
Labs
270 Gross Hall
Every other Fri 10:00 - 11:20am
Teaching Team and Office Hours
Instructor | Dr. Olanrewaju Michael Akande | Mon 3:00 - 4:30pm, Tue 1:45 - 2:45pm, Thur 2:00 - 3:30pm | 256 Gross Hall |
TA | Azucena Morales (until Oct 1) | Thur 2:30 - 4:30pm | 257 Gross Hall |
TA | Chenxi Wu | Wed 3:00 - 5:00pm | 257 Gross Hall |
TA | Siqi Fu (from Oct 1) | Thur 3:00 - 5:00pm | 257 Gross Hall |
Texts
Data Analysis Using Regression and Multilevel/Hierarchical Models | Gelman, A., and Hill, J. | Recommended but not compulsory |
An Introduction to Statistical Learning with Applications in R | James, G., Witten, D., Hastie, T., and Tibshirani, R. | Optional (free PDF available online) |
Lecture notes and slides, lab exercises and other reading resources will be posted on the course website. We will only loosely follow these textbooks.
Materials
You should have access to a laptop and bring it to every class, fully charged.
Important Dates
Mon, September 2 | Labor Day; classes in session |
Fri, September 6 | Drop/add ends |
Fri, October 4, 7:30pm | Fall break begins |
Wed, October 9, 8:30am | Fall break ends |
Mon, November 4, 11:59pm | Final project proposal due |
Tue, November 26 | Final project presentations I |
Tue, November 26, 10:30pm | Thanksgiving recess begins; graduate classes end |
Tue, December 3 | Final project presentations II |
Thur, December 5 | Final project presentations III |
Tue, December 10 | Final project report due |
Green Classroom
This course has achieved Duke’s Green Classroom Certification. The certification indicates that the faculty member teaching this course has taken significant steps to green the delivery of this course. Your faculty member has completed a checklist indicating their common practices in areas of this course that have an environmental impact, such as paper and energy consumption. Some common practices implemented by faculty to reduce the environmental impact of their course include allowing electronic submission of assignments, providing online readings and turning off lights and electronics in the classroom when they are not in use. The eco-friendly aspects of course delivery may vary by faculty, by course and throughout the semester. Learn more at http://sustainability.duke.edu/action/certifications/classroom/index.php.
Acknowledgement
This web page contains materials such as lecture slides, homework assignments, and datasets developed or adapted by Dr. Jerry Reiter.