class: center, middle, inverse, title-slide # Introduction to IDS 702: modeling and representation of data ### Dr. Olanrewaju Michael Akande ### Aug 27, 2019 --- class: center, middle # Welcome! --- ## What is this course about? <i class="fa fa-book fa-2x"></i> Learn the general work flow for building statistical models. -- <i class="fa fa-tasks fa-2x"></i> Use statistical models to answer inferential questions. -- <i class="fa fa-database fa-2x"></i> Work with real and messy datasets. -- <i class="fa fa-group fa-2x"></i> Hone collaborative and presentations skills. -- <i class="fa fa-refresh fa-spin fa-2x"></i> Analyze! Analyze!! Analyze!!! -- --- <i class="fa fa-quote-left fa-2x fa-pull-left fa-border" aria-hidden="true"></i> <i class="fa fa-quote-right fa-2x fa-pull-right fa-border" aria-hidden="true"></i> Essentially, all models are wrong, but some are useful. -- Box, George E. P. --- ## Instructor Dr. Olanrewaju Michael Akande <i class="fa fa-envelope"></i> [olanrewaju.akande@duke.edu](mailto:olanrewaju.akande@duke.edu) <br> <i class="fa fa-home"></i> [akandelanre.github.io.](https://akandelanre.github.io/IDS702_F19/) <br> <i class="fa fa-university"></i> [256 Gross Hall](https://maps.duke.edu/?id=21#!m/6866?s/Gross%20Hall) <br> <i class="fa fa-calendar"></i> Mon/Tue 3:00 - 4:00pm --- ## TAs [Azucena Morales](https://datascience.duke.edu/lidia-azu-azucena-morales-vasquez) <i class="fa fa-envelope"></i> [azucena.morales@duke.edu](mailto:azucena.morales@duke.edu) <br> <i class="fa fa-university"></i> [257 Gross Hall](https://maps.duke.edu/?id=21#!m/6866?s/Gross%20Hall) <br> <i class="fa fa-calendar"></i> Thur 2:30 - 4:30pm <br> Chenxi Wu <i class="fa fa-envelope"></i> [chenxi.wu@duke.edu](mailto:chenxi.wu@duke.edu) <br> <i class="fa fa-university"></i> [257 Gross Hall](https://maps.duke.edu/?id=21#!m/6866?s/Gross%20Hall) <br> <i class="fa fa-calendar"></i> Wed 3:00 - 5:00pm --- ## FAQs All materials and information will be posted on the course webpage: [akandelanre.github.io/IDS702_F19/](https://akandelanre.github.io/IDS702_F19/) -- - Will we cover statistical theory? Yes, but for the most part, we will prioritize data analysis over rigorous theory. -- - Am I prepared to take this course? Yes, if you are familiar with the topics covered in the MIDS summer statistics review. -- - Will we be doing "very heavy" computing? Not exactly. We will mostly work with popular statistical packages. -- - What computing language will we use? R. -- - Why not python? I'll provide answers in class. --- class: center, middle # Your turn! --- ## Introductions - Your name. - Where you are from. - Your educational background. - One interesting fact other students don't already know about you. - One thing you hope to take away from this class. --- class: center, middle # Course structure and policies --- ## Prerequisites - Students are expected to know all topics covered in the MIDS summer course review and boot camp. These include -- + Basic probability (including conditional probability, expectations and common probability distributions). -- + Basic statistical inference (including hypothesis testing, confidence intervals, linear regression with one predictor, and exploratory data analysis). -- + Familiarity with R/Rstudio. -- - Due to space constraints, the course is open only to students in the MIDS program. -- - Students outside the MIDS program should come talk to me after class. --- ## Class meetings - Class attendance is compulsory! We will often engage in discussions outside/beyond (but related to) the contents of the lecture slides. -- - Class meetings are designed to be interactive; try to engage and don't be shy. -- - The methods we will cover are best learned by practical data analysis, so, there will be lots of "learn-by-doing". -- - Always revisit in-class examples and rerun the R codes to be sure you understand each topic. -- - No need to print the course slides; let's keep the class green when possible. -- - Bring your charged laptop to class every day. -- - Free free to ask me as many questions as possible during and outside classes; THERE ARE NO STUPID QUESTIONS! --- ## Grading Component | Percentage ----------------------|---------------- Methods and Data Analysis Assignments | 30% Final Project | 30% Team Project 1 | 15% Team Project 2 | 15% Lab Assignments | 10% -- - Although class attendance does not contribute directly to your final grades, attendance remains a firm expectation! -- - There are no make-ups for assignments or the projects except for medical or familial emergencies or for reasons approved by the instructor before the due date. -- - Exact ranges for letter grades will be curved and cutoffs will be determined after the final exam. --- ## Methods and data analysis assignments - Assignments are posted on the course website. -- - Turn in your assignments at the beginning of class on the due date. -- - You can work with others on the assignments, but each person must write up and turn in her or his own answers. -- - Assignments include questions on the computational and the mathematical aspects of the methods that we will learn during the semester, and questions that ask students to apply the modeling skills discussed during the semester. -- - The assignments must be typed up using R Markdown, LaTeX or another word processor, and submitted on [Gradescope](https://www.gradescope.com/courses/57701/assignments) under Assignments. --- ## Lab assignments - Lab assignments are meant to give you more hands-on experience with data analysis using R. -- - Labs times also gives you an additional platform to ask for help for your team and individual projects. -- - Attendance is not mandatory, however, each lab assignment should be submitted in timely fashion on the due date. -- - You are REQUIRED to use R Markdown to type up your lab reports. --- ## Team projects - Students work in teams to analyze data selected by the instructor. -- - You will work within your preassigned MIDS groups for the team projects. -- - Each group must write ONE team report with their data analysis findings. -- - Each group will also have the opportunity to present their results to the rest of the class. -- - Detailed instructions will be made available later. --- ## Final project - Each student will analyze a data-based research question of their choosing, subject to approval. -- - The data should comprise several variables amenable to statistical analyses via modeling. -- - You can bring in their own research data sets, or you can ask me for assistance with identifying appropriate data. -- - All students present their results in a class at the end of the semester. -- - Detailed instructions will be made available later. --- ## Academic integrity: - Remember the Duke Community Standard that you have agreed to abide by: .large[ > To uphold the Duke Community Standard: > - I will not lie, cheat, or steal in my academic endeavors; > - I will conduct myself honorably in all my endeavors; and > - I will act if the Standard is compromised. ] -- - Cheating on exams or plagiarism on homework assignments, lying about an illness or absence and other forms of academic dishonesty are a breach of trust with classmates and faculty, violate the <a href="http://www.studentaffairs.duke.edu/conduct/resources/dcs">Duke Community Standard</a>, and will not be tolerated. -- - Please review the Academic Dishonesty policies <a href="https://studentaffairs.duke.edu/conduct">here</a>. --- ## Diversity & inclusiveness: - This course is designed so that students from all backgrounds and perspectives all feel welcome both in and out of class. -- - Please feel free to talk to me (in person or via email) if you do not feel well-served by any aspect of this class, or if some aspect of class is not welcoming or accessible to you. -- - My goal is for you to succeed in this course, therefore, please let me know immediately if you feel you are struggling with any part of the course more than you know how to manage. -- - Doing so will not affect your grades, but it will allow me to provide the resources to help you succeed in the course. --- ## Disability statement - Students with disabilities who believe that they may need accommodations in the class are encouraged to contact the <a href="https://access.duke.edu/students/staff.php">Student Disabilities Access Office</a> at 919-668-1267 or <a href="mailto:disabilities@aas.duke.edu">disabilities@aas.duke.edu</a> as soon as possible to better ensure that such accommodations are implemented in a timely fashion. --- ## Final Notes - Make use of the teaching team's office hours, we're here to help! -- - Do not hesitate to come to my office during office hours or by appointment to discuss a homework problem or any aspect of the course. -- - When the teaching team has announcements for you we will send an email to your Duke email address. Please make sure to check your email daily. -- - Please refrain from texting or using your computer for anything other than coursework during class. -- - You must be in class on a day when you're scheduled to present, there are no make ups for presentations. --- ## Final notes - Split into groups of fours: just find the three closest students to you. - Open the assessment key. On the website, just go back to the class schedule, refresh the page, and click the same HW link. - Discuss the responses for in 10 minutes. Which questions did you get wrong and why?