Applied Data Analytics Program Online Introduction to Python & SQL
Instructor
Brian Kim, kimbrian@umd.edu
Introduction
This pre-course introduces students to the basics of Python and SQL for data analysis. Students will explore real, publicly available datasets, using Python's data analysis tools to create summaries and generate visualizations. Students will also learn the basics of database management and organization, how to write SQL, and how to work with SQL databases. By the end of the class, students should understand how to read in data from CSV files or from the internet and be comfortable using either SQL or Python to aggregate, summarize, describe, and visualize these datasets.
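To give a rough sense of what "aggregate with either Python or SQL" can look like, here is a minimal sketch that computes the same per-state total both ways. It is only an illustration, not the course's own workbook code: the sample data, the column names (state, county, jobs), and the helper function names are all hypothetical, and it uses Python's built-in csv and sqlite3 modules as stand-ins so that it runs without installing anything.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """state,county,jobs
MD,Howard,120
MD,Montgomery,300
VA,Fairfax,250
VA,Arlington,150
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, converting the jobs column to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"state": r["state"], "county": r["county"], "jobs": int(r["jobs"])}
        for r in reader
    ]


def total_jobs_python(rows):
    """Sum jobs per state using plain Python."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["state"]] += row["jobs"]
    return dict(totals)


def total_jobs_sql(rows):
    """Sum jobs per state using SQL on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE jobs_by_county (state TEXT, county TEXT, jobs INTEGER)"
        )
        conn.executemany(
            "INSERT INTO jobs_by_county VALUES (:state, :county, :jobs)", rows
        )
        cursor = conn.execute(
            "SELECT state, SUM(jobs) FROM jobs_by_county "
            "GROUP BY state ORDER BY state"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


rows = read_rows(SAMPLE_CSV)
print(total_jobs_python(rows))
print(total_jobs_sql(rows))
```

Both functions return the same totals for the sample rows. The GROUP BY query and the Python loop express the same idea, which is the kind of comparison the Python and SQL units build toward.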
Accessing Course Materials
In this pre-course, we have two main sources of course materials. For the main workbooks to practice and learn how to code, we will use Binder, an in-browser environment where you can start writing code immediately. You don't need to install anything, but Binder does not save your work, so you must save it separately. This means copying and pasting all of your work and answers, or downloading them from Binder.
We will be doing all of the coding instruction and exercises using Jupyter notebooks. You can find instructions on how to use Jupyter notebooks, as well as information about the structure of the pre-course, in the Introduction Videos page. Please make sure to watch them before starting the work in Unit 1.
Course Content
The Schedule and Links page contains all of the links to the videos and Binder environment for each unit. We will go over one unit each week, spending two weeks on Python and two weeks on SQL. Please note the dates and times of the online meetings, which are when you can ask questions and get help on the workbooks.
Data Documentation
In the pre-course, we will be using real data in the form of LEHD Origin-Destination Employment Statistics (LODES) data. The documentation for the data that we will use is provided below and includes descriptions of the variables as well as information about how to find data for different states.
Additional Resources
There are many resources available on the internet to help you learn Python and SQL. Listed below are some of the most useful ones for this course.
Cheatsheets:
Pandas: https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf
Python: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf
PostgreSQL: http://www.postgresqltutorial.com/wp-content/uploads/2018/03/PostgreSQL-Cheat-Sheet.pdf
Documentation and Tutorials:
Pandas: http://pandas.pydata.org/pandas-docs/stable/
Matplotlib: https://matplotlib.org/tutorials/index.html