Data Science as a Piece of the Puzzle

Sam Edelstein
7 min read · Jun 6, 2019


In my first Economics class at Syracuse University, my professor, Jerry Evensky, talked about something that has always stuck with me: economists tend to make black-and-white recommendations, but economics is only a piece of the larger social sciences puzzle. So while an economist may argue that it is most efficient to perform a task one way, or following a certain process may create the most utility, without understanding the history, sociology, political factors, and others, you aren’t seeing the larger picture.

I ended up majoring in economics largely because of that class, and through the degree I learned a lot about making policy and decisions from an economist’s point of view. But, remembering Professor Evensky’s insight, I also took a lot of policy, history, and sociology classes to find other pieces of the puzzle.

That advice has continued to stay with me. In my day job as a Chief Data Officer, I am responsible for helping the City use data to make better decisions. Because people understand the value data can bring, they may anticipate that after an analysis of a dataset, a clear picture will emerge about the “right” answer. In reality, things are rarely this simple. There are always assumptions made while doing an analysis, there is bias in the data we are using, and oftentimes the dataset is incomplete or there is something wrong with the way it was entered.

There are a few things we do to ensure our analyses are just one piece of a larger puzzle.

Ride alongs

Whether it is throwing trash with staff in the sanitation department, filling potholes with the street repair crews, observing a code inspector cite properties for violations, watching water department employees fix a water main break, or watching the fire department respond to an incident, we always try to see how the work is done. This is for a few reasons:

  • Gives context to the recommendations we will ultimately make — could we be making their job harder?
  • Helps us understand if the recommendations are doable — do they have time to think about our recommendations while doing the work?
  • Exposes other potential opportunities — we might have been observing for a specific reason but another issue comes to light.
  • Builds trust — if we’re willing to get out from behind our desks to watch or participate, it builds a bond and at least lets the staff see who is doing the analysis.

Specifically, on those ride alongs we learned that unless we figured out an automated way to track how many potholes were being filled, collecting data would be very challenging: the job is physically taxing and can be dangerous, and data entry is not a core job function, so any time spent on reporting is time not spent filling potholes.

Interviews

Sometimes when presented with a problem and a dataset, the instinct is to dive in, start writing code, and come up with an answer. In reality, this rarely works. Most of the time the data doesn’t make sense on its own: it might not be documented well, staff may have found workarounds within frustrating software systems to log their work (and those workarounds make things look strange in the dataset), the data may have limitations that only staff know about, and each department has its own processes that are important to understand first. Interviews help bring that information to light. In my experience, many front-line staff have never had anyone ask for their advice, so taking the time to learn can help the project and help someone new feel connected to the process.

It is also critical, especially when working in public service, to get feedback from the public. We work in a bubble. Issues that seem important on a day-to-day basis to city government employees might get a couple minutes of thought from a typical resident if we are lucky. Other issues that might not seem important could be the most important thing to someone in the community. We could spend a lot of time analyzing data about issues that no one in the public cares about, which might not be a great use of time.

Even more importantly, without engaging with the public, we may not understand where bias in our data exists. When we look at service requests from the community (things like potholes, street light outages, etc), without talking to anyone we may think that as long as we are responding to all complaints, we are taking care of the problems. When we talk to people, though, we might learn that some have no idea how to make a service request, or they don’t have time to make the request, or something else entirely. If we analyze data based only on existing complaints and service requests, we might miss entire parts of the city that never got in touch with us. Through interviews, instead of focusing on the data we have, we might need to first focus on the data we don’t have. See this article for more: https://theconcourse.deadspin.com/you-re-not-mapping-rats-you-re-mapping-gentrification-1835005060

Understand best practices

A recommendation based on a data analysis will only work if it is understood in the context of best practices, too. In many of the analyses we do, we first look to see how the work is done elsewhere, but also if there are examples from others about how they performed their own similar analysis and the process they took. Craig Campbell of New York City’s Mayor’s Office of Data Analytics wrote about the potential and challenges of replicating urban analytics use cases here: https://ash.harvard.edu/files/ash/files/290042_hvd_ash_policy_campbell_v2.pdf.

A specific example of looking to others’ use cases: my team and I are working right now, in partnership with the Center for Government Excellence at Johns Hopkins, on the way we assess properties. The Cook County Assessor’s Office took on building a computer-assisted mass appraisal model. We are interested in doing something similar, so understanding the inputs and considerations they used in their model has been very helpful for us. Our model will likely turn out differently, but understanding how others do their work is important. More on their model here: https://datacatalog.cookcountyil.gov/stories/s/p2kt-hk36.

Show and tell

Sometimes we do just dive into an analysis. When the team becomes aware of a new dataset, it’s pretty fun to start seeing what we can find. My kids ask about my day and say, “Daddy, did you do your data work?” On days with a new dataset, I say, “Yes, kids, I did my data work.”

More often, though, we don’t know enough to ask the right questions until we have looked at the dataset at a very high level. How much happened? Is there more happening today than last month, or than at this time last year? How much of each category is present in this dataset? Are there values missing that we would expect to be there? When we produce this kind of information, we can present it to a department for feedback. Jesse Cases on my team did a great job with this, showing a high-level analysis of assessment information to the Assessment Department. They gave context and feedback about why some of the information looked strange to us, they confirmed some of the findings, and they were surprised by others.

Exploratory data analysis turns out to be a great conversation starter. With that information, we can then have deeper interviews and understand best practices better.
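As a sketch of what that first high-level pass can look like, here is a minimal exploratory summary in pandas. The dataset, column names, and values are hypothetical, invented for illustration; a real pass would run against the department’s actual data.

```python
import pandas as pd

# Hypothetical service-request records; columns are illustrative,
# not from an actual city dataset.
requests = pd.DataFrame({
    "created": pd.to_datetime([
        "2019-04-03", "2019-04-18", "2019-05-02",
        "2019-05-20", "2019-05-28", "2019-06-01",
    ]),
    "category": ["pothole", "streetlight", "pothole",
                 "pothole", "streetlight", None],
})

# How much happened?
total = len(requests)

# Is there more happening this month than last?
by_month = requests["created"].dt.to_period("M").value_counts().sort_index()

# How much of each category is present?
by_category = requests["category"].value_counts(dropna=False)

# Are values missing that we would expect to be there?
missing = requests.isna().sum()

print(total)
print(by_month)
print(by_category)
print(missing)
```

None of this is sophisticated modeling; the point is that a one-page summary like this is enough to start the conversation with a department about what looks strange, what is confirmed, and what is surprising.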

Making data open

Thinking back to the point about us as city employees being in a bubble, we should enable others to do the analyses, too. Making data open and available to students, researchers, and the public allows for more perspectives on the information. We might naturally look city-wide at an issue. A resident might look more closely at a specific neighborhood where they know the issues better.

We did a Snow Safety Summit event where we both collected data from the public and showed them data so they could react. This gave us a great perspective on what they thought about when it snowed, and how they wanted the City to respond. There was a lot of concern about illegally parked cars that prevented plows from clearing roads. We might not have realized how big an issue that was on our own, and we used the information to build a model to predict which streets were most likely to be blocked by illegally parked cars.

When we release data, we also do our best to put it in context, both with good documentation and with some basic visualizations, so that someone who has no idea how to look at a large dataset but can understand a map or a chart can still give feedback and offer insights, too.
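As an illustration of the kind of basic visualization that can accompany a release, here is a minimal matplotlib sketch. The categories and counts are made up for the example; a real release would compute them from the published dataset.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render to a file without a display
import matplotlib.pyplot as plt

# Illustrative counts only, not real city figures.
counts = pd.Series(
    {"pothole": 412, "streetlight": 175, "water main": 38},
    name="service requests",
)

fig, ax = plt.subplots(figsize=(6, 3))
counts.plot.bar(ax=ax, rot=0)
ax.set_ylabel("requests this quarter")
ax.set_title("Service requests by category")
fig.tight_layout()
fig.savefig("requests_by_category.png")
```

A chart like this, published alongside the raw file, gives residents who will never open a CSV a way to react to the data and point out what it misses.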

Creating a data-driven government is something I believe is important. But, as my friend Brian Elms would point out (and once wrote on my whiteboard), we should more accurately be creating a data-informed government: one where data is a critical input to decision making, but only one piece of the puzzle.

Someone majoring in economics should learn history, sociology, and more to get a more complete picture. A data analyst should take a similar approach and get context from first-hand observation, interviews, research, and best practices.
