Syllabus

Overview and Goals

This course is intended to provide students experience with data science within a framework of data ethics in service of equity-oriented public policy. Our primary goals are:

  • Make progress on projects that advance social justice and policy understanding in collaboration with community partners and create work you can point to as part of your portfolio.
  • Practice working with real data (that is, messy data resulting from policy administration) to answer pressing questions with attention to the moral and ethical implications of our work. This includes finding, cleaning, and understanding data; exploring, analyzing, modeling data; visualizing, contextualizing, and communicating data; with care and humility and respect for the affected partners and communities throughout.
  • Develop experience in data workflows that support ethical data science, including processes for working collaboratively, openly, inclusively, and reproducibly.

Spring 2023 Projects

This spring, we’ll complete two projects.

Project 1: Stepping Stones Report

In the first half of the semester, while we’re learning more about R, reproducible data workflows, and ethical data practices, we’ll contribute to work for the City of Charlottesville to update and enrich a community “Stepping Stones” report, a collection of 39 metrics to gauge well-being in Charlottesville and Albemarle. Working in teams, we’ll divide up the metrics and build scripts to download updated measures, process them and combine them with the past data series, visualize and better contextualize changes over time. If we do our work well, it will make future iterations of this report more transparent and easier to generate, as well as provide additional contextualization and explanation for the metrics.

Project 2: Multiple Equity Center Projects

In the second half of the semester, we’ll divide up into new teams each working on a different project hosted by The Equity Center. These include projects centered on

  • Court fees in eviction cases, based on Virginia Civil Court data
  • Police stops, based on the Virginia Community Policy Act data collection
  • Local rental markets, based on local tax assessment data
  • Humanizing the Orange Dot report, using using American Community Survey public-use microdata to build family profiles
  • Stepping Stones supplement, looking at racially disaggregated measures
  • And one more we’re still pinning down

How We’ll Work

In this class, we’ll be learning how to use data to answer public-minded questions with an emphasis on justice and equity. You don’t need to be an expert in statistics or R or visualization or coding coming in; the projects and assignments will provide opportunities to develop these skills further. Good data-oriented work comes from the collaboration of people with diverse talents, perspectives, and expertise. We need individuals interested in data wrangling and analysis, working in R, and visualizing data effectively; but we also need contributors with an understanding or interest in the various project topics, who can ask probing questions and think creatively; and members who can help teams work better and keep projects moving, who can dig into a new substantive area to synthesize information, and who can communicate about our work effectively and help us convey people-centered stories and explanations. That doesn’t mean everyone won’t be responsible for learning about and contributing to each step of the project, but you may find that at some points you are more of a learner and at other points you are more of a leader.

R, RStudio

We will be using the programming language R for our data work. R is a free, open-source language (😄) most commonly used with RStudio, a free user interface (❤️). R users are an amazing bunch and have generated a variety of free learning resources on which we’ll rely (🎉).

Instructions for installing R, RStudio, and all the tidyverse packages.

Materials

As part of our ethic, we’ll center free resources and materials licensed by the library for our use. Consequently, all reading materials will be available at no cost. Data practice resources and references are online; readings on data ethics and power will be made available on Perusall (and many of these are also available online).

Getting Help and the Class Slack

Computers are stupid which can make programming hard. Any little error – a small typo, a missed closing parentheses – can cause inordinate pain (especially if you’re tired). Some of that is unavoidable when learning a new language. But don’t beat your head against the wall when you get stuck. Seek help!

First, we have a class chatroom at Slack where anyone in the class can ask questions and anyone can answer. You’re encouraged to do both and you should check the slack regularly. It will be our primary means of communication. If you have questions, someone else probably does too, so slack will help us share knowledge widely. You’ll likely be able to answer others’ questions, and you should! That’s part of our collaborative learning.

There are many online resources. Two of the most useful are StackOverflow (a Q&A site with hundreds of thousands of answers to all sorts of programming questions) and RStudio Community (a forum specifically designed for people using RStudio).

The UVA Library also an amazing team of consultants in the StatLab. They’ve got online how-tos you might find useful (including this one on how to ask reproducible questions on StackOverflow or the RStudio Community) and you can email them at statlab@virginia.edu to set up a consultation (you shouldn’t use them for your problem sets, but they’re a great resource if you’re looking for help on the troubleshooting your project analysis or after our class is done).

How We’ll Evaluate Our Work

Grades will depend on your learning and contribution to our collective effort, which will be assessed by a combination of individual and team-based work.

Reading Annotations, 20%

To facilitate understanding and discussion we’ll all contribute to collaborative annotation and commenting on the key reading (the data ethics content) using Perusall. I will review these ahead of class to help organize our discussion; which means these are due by 2pm on Thursdays.

We have reading assigned on nine out of 13 weeks (minus the first week). Contributing annotations to readings for at least 7 weeks is necessary for full credit; more specifically, you should plan to contribute annotations to at least two weeks in each of these three week periods: 1/26, 2/2 and 2/9; 2/16, 2/23 and 3/2; 3/16, 3/23 and 3/30.

For each week you contribute annotations, add at least four notes for a given day’s reading. A full contribution goes beyond agreeing with a point or asking a question. It will engage a point, connect it to other ideas we might learn from, amend it to suggest a more limited or wider scope, note conditions under which it is more or less relevant, suggest how it might inform your project or other work of which you are aware, ask a question and provide an initial answer. If others have already commented on a passage and you want to address and elaborate on their comments (beyond basic agreement), you should. Think about the kinds of things you might raise in a discussion and bring them up in the annotations.

Individual Data/R Assignments, 20%

In the first five weeks, individuals will complete brief assignments to become more comfortable using R (and to gain some exposure to elements of project 2 data sources). If you get stuck, feel free to ask for help on slack so that the learning can be shared. If you’re able to answer others’ questions, jump in; it’s a great chance to deepen your own understanding by explaining ideas.

Collaborative Learning, 10%

Learning new things is tough. Supportive colleagues can make it easier. For this reason, you will be evaluated based on the extent to which you help each other learn through our in-class discussions, our team practice and opportunities for feedback, and our class slack.

As team projects for the second half of the semester develop, each group will provide formal feedback in response to another group’s progress reports – sharing ideas, making suggestions, identifying elements that need more clarification or thought, and proposing additional steps.

Project 1: Stepping Stones Report, 20%

In the first half of the class, we’ll work in teams, dividing up the Stepping Stone report so that each team implements the work to update and deepen a section of the report.

Work will include

  • Weekly progress, 5%: Each team will take on responsibility for a different set of metrics within the report, building a workflow to download recent measures, do any necessary pre-processing of the measures, and append them to the historical data. In addition, each team will explore key changes in the relevant local or policy environment that help contextualize the data. Finally, teams will create an R markdown file to read in the updated measures, visualize them appropriately and incorporating important context, and provide brief narration of each metric (source, importance, changes, etc.). Weekly progress targets are outlined further in the project pages.
  • Final code submission, 12.5%: Each team will submit a readable, replicable, and extendable script for data acquisition and preparation and an R markdown file that visualizes the measures you’re responsible for, incorporating contextualization into the visuals where appropriate, documenting the data sources, and providing additional narrative explanation.
  • Contribution memo, 2.5%: As part of the final submission, each member of a team will complete a contribution memo describing their own and their teammates contributions to the final project.

Project 2: Multiple Equity Center Projects, 30%

In the last half of the class, we’ll begin working in new groups to complete data wrangling, visualization, and analysis to build data stories centered on selected questions. Once teams have been formed, I expect groups to meet each week between class to work together (via zoom, through chat in slack or a preferred platform).

Work will include

  • Weekly progress, 7.5%: Each team will take on responsibility for a different research question, exploring the relevant parts of the data set, setting up the data to address those questions, creating visuals and comparisons, identifying and annotating literature (articles, reports, etc.) that informs your analysis, interpretation and contextualization of the data. Groups will submit updates reporting on their progress and showing their work each week. Based on that progress, we’ll collectively determine the next steps. Though I have some weekly targets in mind (outlined in the projects page), applied research and data analysis doesn’t always conform to planned timelines. In part, that’s what makes it exciting, but it will require some flexibility on all our parts.
  • Analysis in an R markdown file, well-documented data wrangling code, and a final presentation, 20%: Each team will complete a report of their work as an R markdown and knitted html files suitable for inclusion the class webpage. This should, at a minimum, outline the key question they’ve been addressing and how the literature intersects with the questions, communicate their process and highlight concerns and caveats, and explain their findings in an a way accessible to a broader community. In addition, teams will submit a well-documented script that implements any pre-processing or data wrangling that precede the final analysis.
  • Contribution memo, 2.5%: As part of the final submission, each member of a team will complete a contribution memo describing their own and their teammates contributions to the final project.

Pre-Class Tasks

Once you have read the syllabus and have R, RStudio, and the tidyverse set up on your computer (instructions for installing R, RStudio, and all the tidyverse packages) OR you have created an Posit/RStudio Cloud Free account and installed the tidyverse packages, email me at mclaibourn@virginia.edu with

  1. Whichever answers below describe your prior relationship with R:
    • It’s one of my top 5 favorite letters
    • I’ve admired it from afar
    • I’ve run a few lines of code in R (e.g., provided by others and tweaked)
    • I’ve written R code from scratch (e.g., without following a template)
    • I’m comfortable doing basic data manipulation and analysis in R
    • I’m comfortable doing advanced data manipulation and analysis in R
    • I’m an R masteR and should probably be TA’ing this course
  2. An image or gif of animals at play. Really, anything that makes you smile (😃).

In response, I will invite you to the class slack!