Data Products and Treating Data as a Product

Sam Edelstein
11 min read · Mar 24, 2023

--

Summary according to ChatGPT

  • There are two broad categories when it comes to data products: treating data as a product and actually building data products.
  • Treating data as a product involves shifting focus away from service-oriented work, interacting and understanding users deeply, planning and reviewing work in the style of a product manager, and being purposeful about building tools that bring value to the broader organization.
  • When building data products, it is important to obsess over the industry, play with and investigate data, interview and build relationships with stakeholders, let the data team talk to stakeholders, and prototype a lot.

There’s been a lot written about data products over the last couple of years. As I’ve read these perspectives and thought about my own current and prior work, two broad categories often get addressed: treating data as a product and actually building data products. In some ways, these are one and the same, but the posts rarely go into detail about what the endpoint actually looks like, and at least for me, it is hard to grasp how to make things actionable.

Treating Data as a Product

The concept of treating data as a product is exciting and has been a core focus for many blog posts and guides. An early post on this topic was “running your data team like a product team” by Emilie Schario and Taylor Murphy. These kinds of posts typically focus on: shifting focus away from service-oriented (IT help desk style) work, interacting with and understanding users deeply, planning and reviewing work in the style of a product manager, and being purposeful about building tools that bring value to the broader organization.

As I think about my prior and current work, a lot of these ideas resonate and are exciting. Though I might not have ever hired (or held the job title of) data product manager, there are a lot of aspects of the work where I or someone on my team has served that role. Here are some things I’ve found useful and necessary when learning, understanding, and ultimately scoping out what to build:

  • Obsess over the industry in which you work: When I worked in higher ed, I learned as much as I could about how universities operate, how their alumni engage, and how Syracuse University was the same or different. In municipal government, I learned as much as I could about street paving, water mains, government organizations, procurement, and more. Now, I am deep into understanding how investments happen in startups, what considerations should be made, what data is useful, and more. Some of this might have to do with data, but a lot of it has to do with building connection points and empathy with the folks you’ll talk to and work with. Yes, we love dbt, dashboards, and data warehouses, but we also have to care deeply about where we work and what we are doing in order to build the best understanding of what people need.
  • Play with and investigate the data on your own: depending on the situation, this step might come before or after an interview, but I find it important to look at the data your stakeholders use or produce. Data, and the analysis that goes with it, helps uncover process issues related to handoffs or bottlenecks (why is this specific thing taking longer than every other part of the process?). Even if you have no answer, and even if your initial analysis is wrong because you lack full context, I’ve found that doing this kind of analysis and interacting with the data itself can prompt better questions. You can say in your interviews, “I noticed this thing in my analysis, what do you think is going on?” Either the person you are interviewing will know, or they won’t. In both cases, the interviewee will hopefully get a sense that you care about their work and are willing to do some pre-work to learn about their process first.
  • Interview and build a relationship: getting to know your stakeholders/users/clients and the way they do their work is critical. You’ve hopefully done your research and looked at the data, but your interviews are important for two key reasons. First, they are going to help you answer questions that you can’t figure out otherwise, and their perspective is ultimately the one that will matter most. Second, if you are going to convince someone to use the outputs/recommendations of your data product, they need to trust you. If they don’t, how will they trust your recommendations?
    With the City, I’d been on a ride-along with sanitation workers. They told me a story about a previous administration that brought in a consultant to analyze and recommend new sanitation routes, only the consultants didn’t ask about times when sanitation workers like to avoid certain areas (and might purposely take inefficient routes), like staying away from schools when they open for the morning, because driving a multi-ton truck near a bunch of small children can be challenging and dangerous.
    When we worked with the street repair department, we learned how they approach street repaving: they prioritize streets in need of paving that are closer together because it is so cumbersome to move the paving equipment around the city. This helped us recommend street paving largely on a worst-to-best cycle, but with a balance for what we’d heard in interviews.
  • Let the data team talk to stakeholders: as I’ve observed some product development happen, there’s often an instinct to say that we should just let the technical people do the technical things and not force them to talk to the stakeholders. While that may be true for software developers (I have no idea, I’m not one), generally folks on the data team need to talk to folks within the rest of the organization. Why? Because data is a pain in the neck to understand. There is so much nuance in how data is created, how people understand it, how they want to see it. It is critical to hear this first hand. Having someone collect and translate requirements will almost always fail.
    Data often reflects process. Observing the process first hand is critical to any data project. It’s fair if not all data folks want to talk to stakeholders, and on a multi-person project it may make sense to split up the roles, but you should default to interaction rather than away from it.
  • Prototype a lot — Building usable products and tools with data is extremely difficult. You will absolutely not get the answer right the first time, and if you try to, you will likely have taken too much time. I believe that as long as you feel like what you are showing is defensible (you’ve done your research, interviews, have relative confidence in the quality and look of your data), you should be ready to show your work, with the qualifier that the work is not done yet. You are not building a thing for someone, you are building it with them, and as such, they need time to reflect and give feedback frequently. For a dashboard, you might literally start with a drawing on a piece of paper, then you might create some graphs in Excel, then you might build an interactive dashboard in Tableau or Looker, and eventually you might code up your own web app with visualizations. But you shouldn’t start with the web app, you’ll get it wrong.
  • Dig in when ready: once you’ve done the research, understood the data, prototyped, and really started to get the idea, then you can build something meant to be a product. In the next part of this post, where I’ll talk about some different forms of data products, it’ll be clear that the prep work might be very quick, or it could take months or longer. Ultimately, though, you will settle on approaches that will truly help your partners, and you should shift from the research phase to the build phase.
    When my colleague Adria and I worked on housing quality issues in Syracuse, we spent nearly a year understanding the ins and outs of Code enforcement, doing analysis, and observing challenges. Then, one day we were ready to build, and within 24 hours we’d laid out a plan and were ready to go, in full partnership with the Mayor, code enforcement officers, and others. The output included maps and dashboards, new training programs, and more, and the program and product have changed over time. But there was a moment where we were ready to go, and that shifted our focus.
  • Plan for support and incremental improvements and updates — once you are really building and delivering a product, people will expect the thing to work and will also likely expect it to improve. By this point, ideally you have bugs in your data pipelines largely resolved and are monitoring for when things break. You should implement continued training, and processes to understand where you should improve features of the product. For data, that might mean thinking about new data sources, new systems to collect data, new ways to analyze or output findings of an analysis, or new ways of interacting with the data. In any of these cases, the work can’t be an afterthought. Things will break, people will want more. You need staff and time to make this all work.

Data As A Product

So what does the data product actually look like? I have to say this has been one of my biggest challenges to think through because 1. a data product could be a number of things depending on the audience and 2. a data product may not look like a “product” as traditionally defined.

The data — the data itself may be the product. The ingestion, transformation, and delivery of data to a table in a database is a huge undertaking, and just that process requires focus, care, prototyping and iteration, and consistent delivery.
Documentation, data quality tests, thoughts on performance, etc., are all elements of a data product in this case. The client here is likely not the salesperson or CEO. Instead, it is probably other members of the data team or software developers who may make data available through more downstream products (like a dashboard or custom-built tool). If there’s a data product manager here, they are prioritizing discussions with analysts and putting improvements to documentation, security, and pipeline efficiency on the roadmap, and they are probably thinking about finding faster ways to provision access to the data to enable self-service.
While this product is critical to any business function, it will likely not be seen or understood by the rest of the organization. Thinking about how to sell this product’s value is really important, because it is the infrastructure that supports everything else. It is like the foundation of a house: fixing it is essential, but doesn’t necessarily add value to the house; it just enables the house to continue to stand. That value isn’t always recognized without good explanation.
Members of this team likely include:

  • Data Engineers
  • Analytics Engineers
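
As a rough illustration of the data quality tests mentioned above, here is a minimal sketch in plain Python. The column names (company_id, sector), the sample rows, and the helper functions are all hypothetical and invented for this example; a real team would more likely express checks like these in whatever testing tool its pipeline already uses.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_not_null(rows, column):
    """Fail if any row is missing a value in `column`."""
    missing = [i for i, row in enumerate(rows) if row.get(column) in (None, "")]
    return CheckResult(f"not_null:{column}", not missing, f"{len(missing)} missing value(s)")


def check_unique(rows, column):
    """Fail if any value in `column` appears more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row.get(column)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return CheckResult(f"unique:{column}", not dupes, f"{len(dupes)} duplicated value(s)")


def run_checks(rows):
    """Run every check for the (hypothetical) companies table."""
    return [
        check_not_null(rows, "company_id"),
        check_unique(rows, "company_id"),
        check_not_null(rows, "sector"),
    ]


# Hypothetical sample data with one duplicate id and one missing sector.
sample_rows = [
    {"company_id": 1, "sector": "fintech"},
    {"company_id": 2, "sector": None},
    {"company_id": 2, "sector": "health"},
]

results = run_checks(sample_rows)
for result in results:
    status = "PASS" if result.passed else "FAIL"
    print(f"{status} {result.name}: {result.detail}")
```

Running checks like these on every load, and publishing the results next to the documentation, is one way to make the reliability of the dataset visible to the analysts and developers who depend on it.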

The access layer: Data sitting in tables in a database is good for analysts, but doesn’t serve much purpose for others throughout the organization. Providing consistent access to data, even in a relatively raw format, can be valuable.
Connecting the database to an Excel spreadsheet or outputting a report to a PDF may be all that the stakeholder needs, especially at first. The research and discovery efforts can be focused around what data someone would want to see, where they would want to see it, and how often they need the data updated. This access layer could be more complex with the introduction of an API to interact with the data.
The data product manager will prioritize a roadmap that includes planning for keeping the architecture for an API up and running, or providing a plan to increase the amount of data available in a spreadsheet or report. The Product Manager might also start to plan for a future state that incorporates more analysis.
Members of this team likely include:

  • Data Engineers
  • Analytics Engineers
  • Software Engineers
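
To make the simplest version of an access layer concrete, here is a small sketch that queries a database table and writes the result as a CSV report that could be opened in a spreadsheet. It uses only Python’s standard library; the in-memory SQLite database, the street_conditions table, and its columns are hypothetical stand-ins for whatever the data team actually maintains.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE street_conditions (street TEXT, rating INTEGER, last_paved TEXT)"
    )
    conn.executemany(
        "INSERT INTO street_conditions VALUES (?, ?, ?)",
        [
            ("Main St", 3, "2015-06-01"),
            ("Oak Ave", 8, "2021-09-15"),
            ("Elm St", 5, "2018-04-20"),
        ],
    )
    return conn


def export_report(conn, out):
    """Write the table to `out` as CSV, lowest-rated streets first."""
    cursor = conn.execute(
        "SELECT street, rating, last_paved FROM street_conditions ORDER BY rating ASC"
    )
    writer = csv.writer(out)
    # Use the query's column names as the header row.
    writer.writerow([col[0] for col in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


conn = build_demo_db()
buffer = io.StringIO()  # swap for open("report.csv", "w", newline="") to write a file
row_count = export_report(conn, buffer)
report_text = buffer.getvalue()
print(report_text)
```

The questions from discovery map directly onto this sketch: which columns the query selects, where the output lands, and how often the export is scheduled to run.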

The analysis — Most data products will incorporate some kind of analysis into the work. In some cases, the analysis product is the output the business stakeholder will use directly (think of a recommendation to buy or sell based on a number of factors). In other cases, the analysis may be incorporated into one of the products listed above (think of an analysis that creates an industry tag and then the result lands as a column in the dataset product). The distinction between these options will be important as you gauge who the right stakeholder is and how you should be engaging and planning future work.
The Data Product Manager may plan for how that analysis should be presented to the rest of the organization, how it can be improved to incorporate more inputs, how it can be extended, and how it can be more reliable.
Members of this team might include:

  • Data Engineer
  • Analytics Engineer
  • Data Analyst
  • Data Scientist
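
The second case described above, where an analysis produces an industry tag that lands as a column in the dataset product, might be sketched as follows. The keyword rules, company names, and descriptions are invented for illustration; a real tagging analysis could be a trained model or a far richer rule set, but the shape of the output (the original rows plus one new column) would be similar.

```python
# Hypothetical keyword rules; each industry maps to words that suggest it.
INDUSTRY_KEYWORDS = {
    "fintech": ("payments", "lending", "banking"),
    "health": ("clinic", "patient", "medical"),
}


def tag_industry(description):
    """Return the first industry whose keywords appear in the description."""
    text = description.lower()
    for industry, keywords in INDUSTRY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return industry
    return "unknown"


def add_industry_column(rows):
    """Return copies of the rows with an added `industry` column."""
    return [{**row, "industry": tag_industry(row["description"])} for row in rows]


companies = [
    {"name": "Acme Pay", "description": "Payments infrastructure for small businesses"},
    {"name": "CareLoop", "description": "Scheduling software for medical clinics"},
    {"name": "Orbit", "description": "Satellite imagery analytics"},
]

tagged = add_industry_column(companies)
for row in tagged:
    print(row["name"], "->", row["industry"])
```

Because the tag is written back as an ordinary column, the stakeholders for this analysis are the consumers of the dataset rather than the people reading a standalone recommendation.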

The dashboard product: Much data work ends up in a dashboard. Sometimes these dashboards can be ill-conceived: an initial version can be simple to build, so many requests will come in to build one. Building a good dashboard product, though, requires all of the “data as a product” steps listed above, and the dashboard will incorporate a number of other “data products” listed here.
Data teams should be clear about the difference between a one-off dashboard that may just show some interesting data, but has no plans of being maintained, versus a fully built out and managed dashboard that serves an ongoing purpose with a team or organization.
The dashboard product could be built in a tool like Tableau or Looker, but it also might be built in Shiny, Streamlit, or completely from scratch. The Data Product Manager should be planning for dashboard enhancements, speed improvements, and more.
Members of this team likely include:

  • Data Analyst
  • Analytics Engineer

The full-stack product: Organizations that are selling a technical product or that are building internal tools will likely leverage data to present in their product. In this case, they might leverage any of the data products listed above. A product like this might be a vacation app where the data incorporated is a dataset product of available properties and a recommendation product that recommends which property will be most attractive to the user. It could also be an internal app that measures the performance of staff, leverages an analysis product, and feeds into an HR system.
Depending on the focus of the app, data team members should push to be involved in conversations and discovery sessions where the data product is being presented to end users and stakeholders. Though this product will include more than just data, the data team will still need a direct connection to stakeholders and an understanding of opportunities to improve and extend their work. The Data Product Manager in this case will strategize about better ways to extend and present data and information, and will likely incorporate other data products into this larger product. They will work alongside a Product Manager, whose role should be envisioning the strategy for the product overall, including look and feel, and prioritizing the roadmap as it relates to building out other new features. In some cases, there may only be a Data Product Manager as a part of the team, because they are bringing the data products that have been built elsewhere into this product. In other cases, this product will incorporate a large team of data folks.
Members of this team likely include:

  • Software Engineers
  • Data Engineers
  • Data Analysts
  • Data Scientists
  • UI/UX
  • Designers
  • Product Managers

It is an exciting time to think about how data becomes more formally incorporated and recognized as a product in itself. There is a lot that goes into this process, and an important thing to call out is that of all the products listed above, none is necessarily more important than the others. Many times, a full-stack product may be seen as the end goal because it is the most complex or the most customizable and will likely bring the largest headcount. That said, its complexity brings a large amount of risk that may not be necessary. Part of treating data as a product means understanding the right ways to use data to help drive decision-making, not building complex things just because.

What has been your experience in treating data as a product?
