Applied Data Analytics Training Program - TDC, College Park, MD 2019
Program Content
In the TANF Data Collaborative applied data analytics training program, participants will work in teams to define and complete a project related to TANF and employment. The program will provide an up-to-date perspective on the use of TANF recipient data and wage records to inform policy analysis and program operations. It will also provide hands-on instruction in using microdata in SQL and Python for the following tasks: data management, record linkage, data visualization, and machine learning. This program is supported by the Administration for Children and Families. Consult tanfdata.org for more information.
The first training module will take place September 18-20 and the second on October 16-18, both in College Park, Maryland.
Program participants and team assignments
Table of contents:
Training program
Project presentations - Final presentations will be held via video conference.
COMMUNICATIONS
We distribute information and communicate within the group via this website, e-mail, and Slack. In general, the instructor team will respond quickly to either e-mail or Slack messages; however, we tend to prefer Slack for technical issues and sharing snippets of information.
E-mail addresses
Group email account
dataanalytics@coleridgeinitiative.org - the group email for program instructors and coordinators; general program, logistics, and ADRF system questions should be directed here (ADRF is our computing platform; see the computing environment page for more info)
For ADRF technical support, please email support@adrf.zendesk.com
Report all security incidents or suspected incidents (e.g., lost passwords, improper or suspicious acts) related to ADRF to security-adrf@nyu.edu
Instructor email addresses:
Julia Lane - julia.lane@nyu.edu
Rayid Ghani - rayidghani@gmail.com
Brian Kim - kimbrian@umd.edu
Clayton Hunter - clayton.hunter@nyu.edu or clayton@coleridgeinitiative.org
Ben Feder - baf7@nyu.edu
Slack
We use Slack extensively, and it is often the best way to get in touch with us. The team is coleridge-initiative.slack.com, and you will receive an invitation to join when the Online Introduction material begins (if you have not, please let us know). If you are unfamiliar with Slack, it organizes conversations into "channels". Previous classes have used Slack in various ways (e.g., a channel for Python-specific questions), but the two primary channels we expect to use are (i) the class channel, for general class discussion and sharing documents, and (ii) the "adrf-tech-support" channel, for any technical support in accessing the ADRF.
PRE-COURSE ONLINE INTRODUCTION MATERIAL
Our collaborators at the University of Maryland and the University of Mannheim have prepared introductory material in Python and SQL for you to work through prior to the program. The material is presented in a four-week, short-course format: two weeks for SQL and two for Python. We expect the material for each week to take at most 4 hours to work through.
DATA DOCUMENTATION
The data providers, the Coleridge team, and our collaborators have created the documentation below for the datasets to be used in this program.
DB schema overview: schema cheatsheet
Illinois Department of Human Services (IDHS): documentation link (“il_dhs” schema)
Illinois Department of Employment Security: documentation link (“il_des_kcmo” schema)
Indiana Department of Workforce Development: data dictionary (“in_dwd” schema)
Indiana Family and Social Services Administration (“in_fssa” schema)
Chapin Hall data model, created from the “tanf_adult”, “tanf_unioned_child”, and “tanf_family” raw tables
Chapin Hall’s TDC schema.table connection diagram
Additionally, the (Beta!) ADRF Explorer has dataset documentation: https://ds.adrf.cloud (you will need to log in with your ADRF credentials).
Lit review sources
F. Andersson, H. J. Holzer, J. I. Lane, Moving Up Or Moving On: Who Gets Ahead in the Low-Wage Labor Market? (Russell Sage Foundation, 2005).
F. Andersson, H. J. Holzer, J. Lane, in Studies of labor market intermediation (University of Chicago Press, 2009), pp. 373–398.
D. H. Autor, S. N. Houseman, Do temporary-help jobs improve labor market outcomes for low-skilled workers? Evidence from "Work First". Am. Econ. J. Appl. Econ. 2, 96–128 (2010).
“The Promise of Evidence-Based Policymaking: Report of the Commission on Evidence-Based Policymaking” (Washington, D.C.).
B. D. Meyer, J. X. Sullivan, “Measuring the well-being of the poor using income and consumption” (National Bureau of Economic Research, 2003).
B. D. Meyer, W. K. C. Mok, J. X. Sullivan, “The under-reporting of transfers in household surveys: its nature and consequences” (National Bureau of Economic Research, 2009).
Additional report sources
Making Data Work for Families: Expanding impactful use of data with the Family Self-Sufficiency Data Center and TANF Data Innovations
https://www.chapinhall.org/project/making-data-work-for-families/
https://www.chapinhall.org/project/administrative-data-for-the-public-good/
https://www.chapinhall.org/project/studies-explore-use-of-administrative-data/
Site Logistics
Training location: 2208 LeFrak Hall, 7251 Preinkert Dr, College Park, MD 20742
WEEK 1: September 18-20 (9am - 4pm Eastern)
September 18 - Introduction to the program (textbook chapter 1)
8:30 Arrive and get settled
Hand in hard copies of signed data agreements
9:00 Welcome & introductions (slides)
9:45 Overview & Motivation
10:30 Break
10:45 Intro to projects (slides)
11:45 Connect to the Administrative Data Research Facility (ADRF)
12:00 Lunch
1:00 Guided exploration of the ADRF
1:30 Data exploration
3:00 Team project discussions
3:45 Preview for day 2
4:00 Break for the day
September 19 - Dataset Exploration, Data Visualization (chapter 9), and Record Linkage (chapter 3)
9:00 Overview of day 2 (slides)
9:15 Datasets, Entities, and Databases
9:45 TANF data orientation (Emily Wiegand from Chapin Hall - slides)
10:30 Break
10:45 Project & self-directed data exploration
12:00 Lunch
1:00 Data Exploration (notebook)
1:30 Data Visualization - lecture (slides) & notebook (companion video)
2:45 Break
3:00 Project & self-directed data exploration
3:45 Recap & preview of tomorrow
4:00 Break for the day
September 20 - Analysis & Intro to Machine Learning (chapter 4, chapter 6)
9:00 Intro to Day 3 (day 2 feedback)
9:5 Principles of ML (slides)
10:30 Break
10:45 ML continued
12:00 Lunch
1:00 ML pipeline: set up data and run model (notebooks) - ML project cheatsheet
2:45 Some methods and recap (lecture)
3:15 Break
3:30 Project planning time
4:00 Break for the day
WEEK 2: October 16-18 (9am - 4pm Eastern)
October 16 - Inference (chapter 10)
9:00 welcome back and overview of week 2
9:05 Project work
10:30 Project status & summaries
11:00 Inference (slides)
12:00 Lunch
1:00 Record Linkage (slides, companion video)
2:00 Project work
4:00 Break for the day
October 17 - Project work
9:45 Project work
11:30 Send in project status write-ups (we suggest using the ML project cheatsheet)
12:00 Lunch
1:00 Project discussions with Chapin Hall (recording)
2:00 Project work
4:00 Break for the day
Happy hour at College Park Marriott
October 18 - Privacy and Confidentiality (chapter 12)
8:30 Disclosure review & Export request
9:15 Privacy & Confidentiality (slides)
10:30 Project work
12:00 Lunch
1:00 Team project status update
2:00 Project work
3:15 Program closing
Projects - Presentations November 13
Project completion: Submit at least an initial export request by the end of the day on Monday, October 28. All export requests are due by the end of the day on Friday, November 8.
Final presentations will be held November 13th via a Zoom Webinar video conference (details below). Each team will have 20 minutes to present, followed by 10 minutes of Q&A. Please send your presentations (either as PowerPoint or PDF) to us at dataanalytics@coleridgeinitiative.org by November 12 so they can be posted to this page.
Final project reports are due November 15th.
Zoom Webinar details
Zoom Webinars allow for “Panelists” and “Attendees”. Panelists can share their screens to show presentations and be heard, while Attendees can only view and listen. The Coleridge team will add all participants to the Zoom Webinar as “Panelists”. The Attendee link to view the webinar is https://nyu.zoom.us/j/661925825; this is the link you should share with anyone you want to invite to view the presentations.
Presentation schedule (Eastern time) - Nov 13
12:00 - 12:30pm Team 1 - What Factors Predict Full-time, Minimum Wage Employment for TANF leavers?
12:30 - 01:00pm Team 2 - Which Participant Characteristics Lead to a Successful TANF leaver?
01:00 - 01:30pm Team 3 - Stable Employment Predictors
01:30 - 01:45pm Break
01:45 - 02:15pm Team 4 - Which Demographic or Policy Factors Increase the Risk of Returning to TANF?
02:15 - 02:45pm Team 5 - What Characteristics Increase an Individual's Risk of Returning to TANF in Illinois Within One Year?
02:45 - Closing remarks
Nov 15, 2:00pm Team 6 - Characteristics of TANF Leavers at Risk of Not Finding Stable Employment