Program Content

The program will provide a hands-on introduction to data analytics topics ranging from basic SQL and Python coding to building and interpreting machine learning models to tackle policy problems. During the program, we will use a mix of confidential datasets in the course materials.

The training will take place at the main College Park, MD, campus of the University of Maryland. 

COMMUNICATIONS

We primarily use three media to distribute information and communicate amongst the group: this website, e-mail and Slack. In general, the instructor team will respond quickly to either e-mail or Slack messages; however, we tend to prefer Slack for technical issues and sharing snippets of information.

E-mail addresses

Slack

We use Slack extensively, and it is often the best way to get in touch with us. The team is coleridge-initiative.slack.com, and by now you should all have received invitations to join (if you have not, please let us know). If you are unfamiliar with Slack, it has different "channels" to help organize conversations. Previous classes have used Slack in various ways (e.g., a channel for Python-specific questions), but the two primary channels we expect to use are (i) the "class-6-uchicago" channel for general class discussion and sharing documents and (ii) the "adrf-tech-support" channel for technical support in accessing the ADRF.

PRE-COURSE MATERIAL

Our collaborators at the University of Maryland and the University of Mannheim have prepared some introductory material for you to work through prior to the program. The material is presented in a four-week short-course format: two weeks for SQL and two for Python. We expect the material for each week to take at most 4 hours to work through.

DATA DOCUMENTATION

Team list

WEEK 1: Oct 31 - Nov 2 & 5-6 (9am - 4pm eastern)

Wednesday-Friday: 1208 Lefrak Hall, University of Maryland

Oct 31 - Program introduction and overview

  • 09:00 - 09:30 Welcome and Introductions (morning slides)

  • 09:30 - 09:45 Goals for the program

  • 09:45 - 10:45 Project scoping

  • 10:45 - 11:00 Break

  • 11:00 - 12:00 Introduction to ADRF and Security Training

  • 12:00 - 01:00 Lunch

  • 01:00 - 01:15 Review of the morning (afternoon slides)

  • 01:15 - 03:30 Buffet of Analytics topics

    • Topics we will cover

    • Topics we will NOT cover

  • 03:30 - 04:00 Logging in to the ADRF

Nov 1 - Data exploration & Visualization

  • 09:00 - 09:15 Welcome and overview of the day

  • 09:15 - 10:30 Databases (lecture slides, normalization slides)

  • 10:30 - 10:45 Break

  • 10:45 - 12:00 Data Visualization (lecture)

  • 12:00 - 01:00 Lunch

  • 01:25 - 04:00 Hands-on data exploration with SQL & Python

Nov 2 - Record Linkage

  • 09:00 - 10:15(?) Data Visualization (notebook)

  • Break

  • 10:30(?) - 12:00 Record Linkage (lecture, McDonald “An Introduction to Probabilistic Linkage”)

  • 12:00 - 01:00 Lunch

  • 01:00 - 01:45 Guest Lecture, Rick Hendra: Research Evidence on TANF and Employment (slides)

  • 01:45 - 02:45 Data preparation: creating "labels"

  • 02:45 - 03:00 Break

  • 03:00 - 04:00 Data preparation: creating "features"

Monday-Tuesday: 2208 Lefrak Hall, University of Maryland

Nov 5 - Introduction to Machine Learning

  • 09:00 - 12:00 All (almost) of Machine Learning (slides)

  • 12:00 - 01:00 Lunch

  • 01:00 - 02:45 Walk-through of an example ML prediction model

  • 02:45 - 03:00 Break

  • 03:00 - 04:00 Project discussion: label (outcome) definition and features

Nov 6 - Text Analysis

  • 09:00 - 10:30 Text Analysis (lecture)

  • 10:30 - 10:45 Break

  • 10:45 - 12:00 Text Analysis (notebook)

  • 12:00 - 01:00 Lunch

  • 01:00 - 01:15 Project goals and timeline (slides)

  • 01:15 - 04:00 Project work (exploration, discussion, planning)

WEEK 2: Dec 5-7 & 10-11 (9am - 4pm eastern)

Wednesday-Friday: 1208 Lefrak Hall, University of Maryland

Dec 5 - Machine Learning - methods

  • 09:00 - 12:00 Machine Learning lecture - recap (slides) and methods (slides)

  • 12:00 - 01:00 Lunch

  • 01:00 - 02:00 Project discussion - model set-up

  • 02:00 - 04:00 Project work

Dec 6 - Machine Learning in Practice

  • 09:00 - 12:00 Machine Learning lecture - model selection, evaluation (slides), and bias (slides)

  • 12:00 - 01:00 Lunch

  • 01:00 - 04:00 Project work

Dec 7 - Privacy and Confidentiality

  • 09:00 - 10:30 Privacy & Confidentiality lecture (slides)

  • 10:30 - 10:45 Break

  • 10:45 - 12:00 Disclosure review & export requests

  • 12:00 - 01:00 Lunch

  • 01:00 - 04:00 Project work

Monday-Tuesday: 2208 Lefrak Hall, University of Maryland

Dec 10 - Inference

  • 09:00 - 10:30 Inference lecture (slides)

  • 10:30 - 10:45 Break

  • 10:45 - 12:00 Project work

  • 12:00 - 01:00 Lunch

  • 01:00 - 04:00 Project work

Dec 11 - Final Presentations

  • 09:00 - 09:10 Program recap

  • 09:10 - 11:00 Finalize project presentations

  • 12:00 - 01:00 Lunch

  • 01:00 - 04:00 Final Presentations

Projects

Final project presentations will be held in person on December 11. Each team will have 20 minutes to present, followed by 10 minutes of Q&A.

Export requests should be submitted by end of day Monday, December 17.

Final project reports are due Friday, December 21.