Program Content
The program will provide a hands-on introduction to data analytics topics ranging from basic SQL and Python coding to building and interpreting machine learning models to tackle policy problems. During the program, we will use a mix of confidential datasets in the course materials.
The training will take place at the main Hyde Park campus of the University of Chicago.
Table of contents:
Training modules
Presentations - Friday, September 28, 11am - 2pm Central (remote over WebEx)
Communications
We primarily use three media to distribute information and communicate amongst the group: the website, e-mail, and Slack. In general, the instructor team will respond quickly to either e-mail or Slack messages; however, we prefer Slack for technical issues and sharing snippets of information.
E-mail addresses
Group email accounts
dataanalytics@umd.edu - the group email for instructors; general program or logistics questions should be directed here
support@applieddataanalyticsprogram.org - ADRF support account from which system communications will be sent (ADRF is our computing platform; see the computing environment page for more info)
Instructor email addresses:
Rayid Ghani - rayid@uchicago.edu
Clayton Hunter - clayton.hunter@nyu.edu or clayton@coleridgeinitiative.org
Brian Kim - kimbrian@umd.edu
Frauke Kreuter - fkreuter@umd.edu
Avishek Kumar - avishekkumar@uchicago.edu
Julia Lane - julia.lane@nyu.edu
Slack
We use Slack extensively, and it is often the best way to get in touch with us. The team is ada-course.slack.com, and by now you should all have received invitations to join (if you have not, please let us know). If you are unfamiliar with Slack, it has different "channels" to help organize conversations. Previous classes have used Slack in various ways (e.g., a channel for Python-specific questions), but the two primary channels we expect to use are (i) the "class-5-uchicago" channel for general class discussion and sharing documents and (ii) the "adrf-tech-support" channel for technical support in accessing the ADRF.
Pre-course material
Our collaborators at the University of Maryland and the University of Mannheim have prepared some introductory material for you to work through prior to the program. The material is presented in a four-week short-course format: two weeks for SQL and two for Python. We expect each week's material to take at most 4 hours to work through.
Data documentation
Illinois Department of Employment Security (IDES)
Illinois Department of Corrections (IDOC) admissions & exits: link to pdf
More details on EDUCLVL and HCLASS fields:
Illinois Department of Human Services (to be uploaded)
Census LEHD Origin-Destination Employment Statistics (LODES)
Project team list: PDF
WEEK 1: July 18-20 & 23-24 (all times Central)
July 18 - Program introduction, Kent Laboratory 107 1020 E 58th Street, Chicago IL, 60637
9:00-10:30 Welcome & Introductions (slides)
10:30-11:00 Break
11:00-12:00 NYU's Administrative Data Research Facility (ADRF) - Security training, data agreements, and brief demo
12:00-1:00 Lunch
1:00-1:30 Hands-on introduction to ADRF
1:30-4:00 Projects and scoping
July 19 - Data exploration & Visualization, 9am-4pm Central at Kent Laboratory 107 1020 E 58th Street, Chicago IL, 60637
9:00-10:30 Introduction to Databases (slides)
10:30-10:45 Break
10:45-12:00 Hands-on exploration of datasets
12:00-1:00 Lunch
1:00-2:30 Data visualization (lecture)
2:30-4:00 Data visualization (exercises)
July 20 - Record Linkage, 9am-4pm Central at Kent Laboratory 107 1020 E 58th Street, Chicago IL, 60637
9:00-9:30 Measurement and Description (slides)
9:30-10:45 Record Linkage (lecture)
10:45-11:00 Break
11:00-12:00 Record Linkage (exercises)
12:00-1:00 Lunch
1:00-4:00 Hands-on data exploration in Python
July 23 - Introduction to Machine Learning, 10am-5pm Central at Harris School of Public Policy 289A 1155 E 60th St, Chicago, IL 60637
10:00-11:30 Introduction to Machine Learning (slides | recording)
11:30-12:00 Break
12:00-1:00 Machine Learning (cont.)
1:00-2:00 Lunch
2:00-4:00 Machine Learning model evaluation
4:00-5:00 Project discussion and data exploration
July 24 - Text Analysis and Network Analysis, 10am-5pm Central at Harris School of Public Policy 289A 1155 E 60th St, Chicago, IL 60637
WEEK 2: Sept 5-7 & 10-11 (10am-5pm Central)
Location for week 2: Polsky North - 2nd Floor, 1452 E 53rd St, Chicago, IL 60615 (Washington Park Room)
Sept 5 - Machine Learning Deep Dive (WebEx recording)
10:00 - 10:30: Welcome back, Recap of Week 1, and Goals for this week
10:30 - 11:00: Machine Learning Recap (what we covered in week 1)
What is Machine Learning and what can it be used for
Types of Machine Learning Methods
How to evaluate Machine Learning Methods (methodology and metrics)
11:00 - 1:00: Machine Learning Lecture (Deeper dive into methods)
Unsupervised Learning Methods
Supervised Learning Methods
1:00 - 2:00: Lunch
2:00 - 3:00: Machine Learning notebook
3:00 - 5:00: Project work
Sept 6 - Inference (WebEx recording)
10:00 - 11:30 Inference lecture (slides)
11:30 - 12:00 Break
12:00 - 1:00 Inference notebook
1:00 - 2:00 Lunch
2:00 - 5:00 Project work
Sept 7 - Machine Learning in Practice
10:00 - 11:00 Group discussion of team projects
11:00 - 11:15 Break
11:15 - 1:00 Machine Learning in Practice
FAQs
Common issues/mistakes
Life after building models
1:00 - 2:00 Lunch
2:00 - 5:00 Project work
Sept 10 - Privacy and Confidentiality (WebEx recording)
10:00 - 11:30 Privacy and confidentiality lecture (slides)
11:30 - 11:45 Break
11:45 - 1:00 ADRF Disclosure review & Export requests
1:00 - 2:00 Lunch
2:00 - 5:00 Project work
Happy hour
Sept 11 - Ethics & Interim presentations
10:00 - 11:30 Ethics, Bias, and Fairness in Machine Learning Systems (slides)
11:30 - 11:45 Break
11:45 - 1:00 Project work
1:00 - 2:00 Lunch
2:00 - 3:00 Interim presentations and instructor comments/feedback (~10 minutes total per team)
3:00 - 5:00 Project work
Presentations
Final project presentations will be held remotely over WebEx on Friday, September 28, between 11am and 2pm Central time. Each team will have 20 minutes to present followed by 10 minutes of Q&A.
Presentation schedule (WebEx recording)
11:00 - 11:30 Team 1: WIA Youth Employment Outcomes (presentation, report)
11:30 - 12:00 Team 2: Firm Survivability in Illinois (presentation, report)
12:00 - 12:30 Team 6: Employer survivability (report)
12:30 - 1:00 Team 4: 1-year survivability of small businesses in Illinois (presentation, report)
1:00 - 1:30 Team 3: Predicting which firms will have high-turnover of low-wage workers (presentation)
1:30 - 2:00 Team 5: Firm survivability across IL Economic Development Regions (presentation, report)
WebEx information
All program participants should have received an invitation from “messenger@webex.com” to be a Panelist during the WebEx Event on Friday, Sept 28. Please feel free to invite your colleagues to watch your presentation via the Attendee link.