Short-course: Introduction to Big Data for Social Science
FRAUKE KREUTER, Professor, University of Maryland and University of Mannheim, co-founder of the Coleridge Initiative
CHRISTOPH KERN, Post-Doctoral Researcher, University of Mannheim
JULIA LANE, Professor, New York University, co-founder of the Coleridge Initiative
BENJAMIN FEDER, Research Scientist, the Coleridge Initiative
Dates: August 10-21
COURSE ABSTRACT
Social scientists and survey researchers are increasingly faced with integrating data from multiple sources and are expanding their activities beyond experiments and surveys. This course discusses these developments and provides practical guidance on combining methods and tools from computer science, statistics, and social science. Topics include web scraping, machine learning applications, record linkage, privacy, and informed consent. The course presents the key big data tools in a non-intimidating way to social and data scientists without neglecting research questions and purposes. It aims to illustrate social science and data science principles through real-world applications, and it links computer science concepts to real social science and survey research. Portions of the course are based on parts of a recent textbook with the same title. Data and code for practice are available here.
The video lectures are self-paced. The exercises should be completed at the end of each day, and links to the corresponding chapters in the Big Data for Social Science textbook are attached to each day. Attendance at the 2-hour live online discussion session is mandatory; participants should expect to answer questions about the exercises.
The exercises will refer to the videos and/or to one of the following three papers:
Here is a flyer for an upcoming federal interagency virtual conference that may be of interest.
Program schedule
Week 1:
Monday: Welcome and Overview
Introduction to Big Data (textbook chapters 1 and 10) and Big Data Toolbox (optional textbook chapters 4 and 5)
Tuesday: Web-scraping and APIs (textbook chapter 2)
Wednesday: Data Visualization (textbook chapter 6)
Example from Frauke
Thursday: Foundations of Machine Learning (textbook chapter 7)
Friday: 2-hour live online discussion session, 10:30am to 12:30pm Eastern: review and Q&A
Zoom information:
URL link: https://coleridgeinitiative-org.zoom.us/j/98972936878?pwd=MXdpbTNsaWEzZmhOSU55UFd2VzBodz09
Meeting ID: 989 7293 6878
Meeting password: 360414
Week 2:
Monday: Record Linkage (textbook chapter 3)
Tuesday: Machine Learning II (textbook chapter 7)
Wednesday: Text Analysis (textbook chapter 8)
Thursday: Privacy and Confidentiality (textbook chapter 12)
Friday: 2-hour live online discussion session, 10:30am to 12:30pm Eastern: review and Q&A (Zoom chat history)
Zoom information:
URL link: https://coleridgeinitiative-org.zoom.us/j/98951277732?pwd=eUkvaUNjY0dONFFEUXVZMXVKVXNnUT09
Meeting ID: 989 5127 7732
Meeting password: 098294
FRAUKE KREUTER is Professor in the Joint Program in Survey Methodology at the University of Maryland, Professor of Methods and Statistics at the University of Mannheim, and head of the statistical methods group at the German Institute for Employment Research in Nuremberg. Previously, she held positions in the Department of Statistics at the University of California, Los Angeles, and the Department of Statistics at the Ludwig Maximilian University of Munich. Prof. Kreuter serves on several advisory boards for National Statistical Institutes around the world and within the Federal Statistical System in the United States. She received the Gertrude Cox Award, which recognizes statisticians in early to mid-career who have made significant breakthroughs in statistical practice, and the inaugural Links Lecture Award, and she is an elected fellow of the American Statistical Association. Additionally, she is co-founder of the Coleridge Initiative, founder of the International Program in Survey and Data Science, and co-host of the digitalization podcast www.digdeep.de.
CHRISTOPH KERN is a Post-Doctoral Researcher at the Professorship for Statistics and Methodology at the University of Mannheim, and Visiting Assistant Professor in the Joint Program in Survey Methodology at the University of Maryland. Christoph Kern graduated in 2011 (Dipl.-Soz.-Wiss.) from the University of Duisburg-Essen (UDE), where he studied sociology with a focus on methods for empirical social science research. In 2016 he completed his PhD (Dr. rer. pol.) at UDE. His current research focuses on the usage of machine learning methods in survey research.
JULIA LANE is a Professor at the NYU Wagner Graduate School of Public Service and an NYU Provostial Fellow for Innovation Analytics. She co-founded the Coleridge Initiative, whose goal is to transform the way governments access and use data for the social good through training programs, research projects, and a secure data facility. The secure facility was initially built at the behest of the Census Bureau to inform the decision-making of the Commission on Evidence-Based Policymaking. In these positions, Julia has led many initiatives, including co-founding the Institute for Research and Innovation in Science (IRIS) at the University of Michigan and STAR METRICS programs at the National Science Foundation, and establishing the PatentsView project at the US Patent and Trademark Office. She also initiated and led the creation and permanent establishment of the Longitudinal Employer-Household Dynamics Program at the U.S. Census Bureau. This program began as a small two-year ASA Census Bureau fellowship and evolved into the first large-scale linked employer-employee dataset in the United States. It is now a permanent Census Bureau program with appropriated funds of $11 million per year. Julia is an elected fellow of the American Association for the Advancement of Science and the International Statistical Institute, and a fellow of the American Statistical Association. She received the 2014 Julius Shiskin Award, the 2014 Roger Herriot Award, and the 2017 Warren E. Miller Award. She holds a PhD in Economics and an MA in Statistics.
BENJAMIN FEDER is an Assistant Research Scientist at the Coleridge Initiative. Since his start in August 2019, Mr. Feder has specialized in developing training materials for the Applied Data Analytics program and is also a member of the Data Export team. Mr. Feder received his B.S. in Statistical Science from Duke University in May 2019.