Program Content
The program will provide a hands-on introduction to data analytics topics ranging from basic SQL and Python coding to building and interpreting machine learning models to tackle policy problems. During the program, we will use a mix of confidential datasets in the course materials.
The training will take place at the main College Park, MD, campus of the University of Maryland.
COMMUNICATIONS
We primarily use three media to distribute information and communicate amongst the group: this website, e-mail and Slack. In general, the instructor team will respond quickly to either e-mail or Slack messages; however, we tend to prefer Slack for technical issues and sharing snippets of information.
E-mail addresses
Group email accounts
dataanalytics@coleridgeinitiative.org - the group email for program instructors and coordinators; general program, logistics, and ADRF system questions should be directed here (ADRF is our computing platform; see the computing environment page for more info)
support@applieddataanalyticsprogram.org - ADRF support account from which system communications will be sent
Instructor email addresses:
Rayid Ghani - rayid@uchicago.edu
Clayton Hunter - clayton.hunter@nyu.edu or clayton@coleridgeinitiative.org
Brian Kim - kimbrian@umd.edu
Frauke Kreuter - fkreuter@umd.edu
Avishek Kumar - avishekkumar@uchicago.edu
Julia Lane - julia.lane@nyu.edu
Slack
We use Slack extensively, and it is often the best way to get in touch with us. The team is coleridge-initiative.slack.com, and by now you should all have received invitations to join (if you have not, please let us know). If you are unfamiliar with Slack, it has different "channels" to help organize conversations. Previous classes have used Slack in various ways (e.g., a channel for Python-specific questions), but the two primary channels we expect to use are (i) the "class-6-uchicago" channel for general class discussion and sharing documents and (ii) the "adrf-tech-support" channel for technical support in accessing the ADRF.
PRE-COURSE MATERIAL
Our collaborators at the University of Maryland and the University of Mannheim have prepared some introductory material for you to work through prior to the program. The material is presented in a four-week, short course format; two weeks for SQL and two for Python. We expect the material for each week to take a maximum of 4 hours total to work through.
DATA DOCUMENTATION
Illinois Department of Employment Security (IDES)
Illinois Department of Corrections (IDOC) admissions & exits: link to pdf
More details on EDUCLVL and HCLASS fields:
Illinois Department of Human Services (IDHS): documentation link
Census LEHD Origin-Destination Employment Statistics (LODES)
TANF Readings and resources
Colorado Works Exit Survey Project Summary of Fiscal Year 2016/2017
Family Employment Program (FEP) Redesign Study of Utah 2014: Final Report
WorkFirst Wage Progression and Returns Report: through second quarter 2012 (WA)
WorkFirst Wage Progression and Returns Report: through first-quarter 2016 (WA)
HHS Luminaries series of videos: link
Training program
Site Logistics
Training location: see agenda below
Lefrak Hall address: 6903 Preinkert Dr, College Park, MD 20740
Links to additional information
WEEK 1: Oct 31 - Nov 2 & 5-6 (9am - 4pm eastern)
Wednesday-Friday: 1208 Lefrak Hall, University of Maryland
Oct 31 - Program introduction and overview
09:00 - 09:30 Welcome and Introductions (morning slides)
09:30 - 09:45 Goals for the program
09:45 - 10:45 Project scoping
10:45 - 11:00 Break
11:00 - 12:00 Introduction to ADRF and Security Training
12:00 - 01:00 Lunch
01:00 - 01:15 Review of the morning (afternoon slides)
01:15 - 03:30 Buffet of Analytics topics
Topics we will cover
Topics we will NOT cover
03:30 - 04:00 Logging in to the ADRF
Nov 1 - Data exploration & Visualization
09:00 - 09:15 Welcome and overview of the day
09:15 - 10:30 Databases (lecture slides, normalization slides)
10:30 - 10:45 Break
10:45 - 12:00 Data Visualization (lecture)
12:00 - 01:00 Lunch
01:25 - 04:00 Hands-on data exploration with SQL & Python
Nov 2 - Record Linkage
09:00 - 10:15(?) Data Visualization (notebook)
Break
10:30(?) - 12:00 Record Linkage (lecture, McDonald “An Introduction to Probabilistic Linkage”)
12:00 - 01:00 Lunch
01:00 - 01:45 Guest Lecture, Rick Hendra: Research Evidence on TANF and Employment (slides)
01:45 - 02:45 Data preparation: creating "labels"
02:45 - 03:00 Break
03:00 - 04:00 Data preparation: creating "features"
Monday-Tuesday: 2208 Lefrak Hall, University of Maryland
Nov 5 - Introduction to Machine Learning
09:00 - 12:00 All (almost) of Machine Learning (slides)
12:00 - 01:00 Lunch
01:00 - 02:45 Walkthrough of an example ML prediction model
02:45 - 03:00 Break
03:00 - 04:00 Project discussion: label (outcome) definition and features
Nov 6 - Text Analysis
WEEK 2: Dec 5-7 & 10-11 (9am - 4pm eastern)
Wednesday-Friday: 1208 Lefrak Hall, University of Maryland
Dec 5 - Machine Learning - methods
09:00 - 12:00 Machine Learning lecture - recap (slides) and methods (slides)
12:00 - 01:00 Lunch
01:00 - 02:00 Project discussion - model set-up
02:00 - 04:00 Project work
Dec 6 - Machine Learning in Practice
09:00 - 12:00 Machine Learning lecture - model selection, evaluation (slides), and bias (slides)
12:00 - 01:00 Lunch
01:00 - 04:00 Project work
Dec 7 - Privacy and Confidentiality
09:00 - 10:30 Privacy & Confidentiality lecture (slides)
10:30 - 10:45 Break
10:45 - 12:00 Disclosure review & export requests
12:00 - 01:00 Lunch
01:00 - 04:00 Project work
Monday-Tuesday: 2208 Lefrak Hall, University of Maryland
Dec 10 - Inference
09:00 - 10:30 Inference lecture (slides)
10:30 - 10:45 Break
10:45 - 12:00 Project work
12:00 - 01:00 Lunch
01:00 - 04:00 Project work
Dec 11 - Final Presentations
09:00 - 09:10 Program recap
09:10 - 11:00 Finalize project presentations
12:00 - 01:00 Lunch
01:00 - 04:00 Final Presentations
Projects
Final project presentations will be held in person on December 11. Each team will have 20 minutes to present, followed by 10 minutes of Q&A.
Export requests should be submitted by end of day Monday, December 17.
Final project reports are due Friday, December 21.