Applied Data Analytics Training Program - TDC, College Park, MD 2019



Program Content

In the TANF Data Collaborative applied data analytics training program, participants will work in teams to define and complete a project related to TANF and employment. The program will provide an up-to-date perspective on the use of TANF recipient data and wage records to inform policy analysis and program operations. It will also provide hands-on instruction in using microdata with SQL and Python for the following tasks: data management, record linkage, data visualization, and machine learning. This program is supported by the Administration for Children and Families. Consult tanfdata.org for more information.

The first training module will take place September 18-20 and the second on October 16-18, both in College Park, Maryland.

Program participants and team assignments


COMMUNICATIONS

We distribute information and communicate within the group via this website, e-mail, and Slack. In general, the instructor team will respond quickly to either e-mail or Slack messages; however, we tend to prefer Slack for technical issues and sharing snippets of information.

E-mail addresses

Slack

We use Slack extensively, and it is often the best way to get in touch with us. Our Slack team is coleridge-initiative.slack.com; you will receive an invitation to join when the Online Introduction material begins (if you have not received one, please let us know). If you are unfamiliar with Slack, it uses "channels" to organize conversations. Previous classes have used Slack in various ways (e.g., a channel for Python-specific questions), but the two primary channels we expect to use are (i) the class channel, for general class discussion and sharing documents, and (ii) the "adrf-tech-support" channel, for technical support in accessing the ADRF.

PRE-COURSE ONLINE INTRODUCTION MATERIAL

Our collaborators at the University of Maryland and the University of Mannheim have prepared introductory material in Python and SQL for you to work through prior to the program. The material is presented in a four-week short-course format: two weeks for SQL and two for Python. We expect each week's material to take at most 4 hours to work through.

DATA DOCUMENTATION

The data providers, Coleridge team, and collaborators have created the documentation below for the datasets to be used in this program.

Additionally, the (Beta!) ADRF Explorer has dataset documentation: https://ds.adrf.cloud (you will need to log in with your ADRF credentials).

Lit review sources

  • F. Andersson, H. J. Holzer, J. I. Lane, Moving Up Or Moving On: Who Gets Ahead in the Low-Wage Labor Market? (Russell Sage Foundation, 2005).

  • F. Andersson, H. J. Holzer, J. Lane, in Studies of labor market intermediation (University of Chicago Press, 2009), pp. 373–398.

  • D. H. Autor, S. N. Houseman, Do temporary-help jobs improve labor market outcomes for low-skilled workers? Evidence from "Work First". Am. Econ. J. Appl. Econ. 2, 96–128 (2010).

  • “The Promise of Evidence-Based Policymaking: Report of the Commission on Evidence-Based Policy” (Washington, D.C.).

  • B. D. Meyer, J. X. Sullivan, “Measuring the well-being of the poor using income and consumption” (National Bureau of Economic Research, 2003).

  • B. D. Meyer, W. K. C. Mok, J. X. Sullivan, “The under-reporting of transfers in household surveys: its nature and consequences” (National Bureau of Economic Research, 2009).

Additional report sources

Site Logistics

Training location: 2208 LeFrak Hall, 7251 Preinkert Dr, College Park, MD 20742

WEEK 1: September 18-20 (9am - 4pm eastern)

September 18 - Introduction to the program (textbook chapter 1)

  • 8:30 Arrive and get settled

    • Hand in hard copies of signed data agreements

  • 9:00 Welcome & introductions (slides)

  • 9:45 Overview & Motivation

  • 10:30 Break

  • 10:45 Intro to projects (slides)

  • 11:45 Connect to the Administrative Data Research Facility (ADRF)

  • 12:00 Lunch

  • 1:00 Guided exploration of the ADRF

  • 1:30 Data exploration

  • 3:00 Team project discussions

  • 3:45 Preview for day 2

  • 4:00 Break for the day

September 19 - Dataset Exploration, Data Visualization (chapter 9), and Record Linkage (chapter 3)

  • 9:00 Overview of day 2 (slides)

  • 9:15 Datasets, Entities, and Databases

  • 9:45 TANF data orientation (Emily Wiegand from Chapin Hall - slides)

  • 10:30 Break

  • 10:45 Project & self-directed data exploration

  • 12:00 Lunch

  • 1:00 Data Exploration (notebook)

  • 1:30 Data Visualization - lecture (slides) & notebook (companion video)

  • 2:45 Break

  • 3:00 Project & self-directed data exploration

  • 3:45 Recap & preview of tomorrow

  • 4:00 Break for the day

September 20 - Analysis & Intro to Machine Learning (chapter 4, chapter 6)

  • 9:00 Intro to Day 3 (day 2 feedback)

  • 9:05 Principles of ML (slides)

  • 10:30 Break

  • 10:45 ML continued

  • 12:00 Lunch

  • 1:00 ML pipeline: set up data and run model (notebooks) - ML project cheatsheet

  • 2:45 Some methods and recap (lecture)

  • 3:15 Break

  • 3:30 Project planning time

  • 4:00 Break for the day

WEEK 2: October 16-18 (9am - 4pm eastern)

October 16 - Inference (chapter 10)

  • 9:00 Welcome back and overview of week 2

  • 9:05 Project work

  • 10:30 Project status & summaries 

  • 11:00 Inference (slides)

  • 12:00 Lunch

  • 1:00 Record Linkage (slides, companion video)

  • 2:00 Project work

  • 4:00 Break for the day

October 17 - Project work

October 18 - Privacy and Confidentiality (chapter 12)

  • 8:30 Disclosure review & Export request

  • 9:15 Privacy & Confidentiality (slides)

  • 10:30 Project work

  • 12:00 Lunch

  • 1:00 Team project status update

  • 2:00 Project work

  • 3:15 Program closing

Projects - Presentations November 13

Project completion: Submit at least an initial export request by the end of the day on Monday, October 28. All export requests are due by the end of the day on Friday, November 8.

Final presentations will be held November 13th via a Zoom Webinar video conference (details will be added below). Each team will have 20 minutes to present followed by 10 minutes of Q&A. Please send your presentations (either as PowerPoint or PDF) to us at dataanalytics@coleridgeinitiative.org to be posted to this page by November 12.

Final project reports are due November 15th.

Zoom Webinar details

Zoom Webinars allow for “Panelists” and “Attendees”. Panelists can share their screens to show presentations and be heard, while Attendees can only view and listen. The Coleridge team will add all participants to the Zoom Webinar as “Panelists”. The Attendee link to view the webinar is https://nyu.zoom.us/j/661925825 - this is the link you should share with anyone you want to invite to view the presentations.

Presentation schedule (Eastern time) - Nov 13