The definition of a data product or “Plumbers don’t sit around debating what is plumbing”
💡 Data teams must prove their value
Back in January, Lynn Bender hosted Data Day Texas with an all-star roster of data industry thought leaders speaking on Saturday. A great event.
This year, Lynn added a new follow-up event on Sunday centered on discussion rather than presentations. Selected moderators submitted topics and then led round-table discussions on them. It was a great format, and I understand that tickets for next year’s event are going fast (I’m signed up!).
“Plumbers don’t sit around debating what is plumbing.”
Ryan Dolley
Jean-Georges Perrin (JGP) led the session on Data Contracts. JGP started the discussion in what seemed like a very sensible place, an attempt to agree on what the definition of a data product is. This led to 30-40 minutes of debate, during which we went around the houses to discuss the definition.
The essence of the problem is that every software product is a data product. Arguably, all of the work data teams do is to create data products. A data product is also not defined by its technology or architecture. Some businesses, e.g. Zoominfo, sell data as a product: they really only need to ensure that the data is accurate, and people will pay for it (not to trivialize assuring data quality and regulatory compliance in that world).
Because of this, "data product" will always end up being a somewhat arbitrary designation. As Ryan Dolley so pointedly put it in the quote above, plumbers don’t sit around debating what plumbing is. They build new plumbing systems and fix problems with existing plumbing systems. While people might complain a bit about how much plumbers charge, they are ultimately happy to pay them because the work is clearly important. BTW, I don’t think JGP would agree, and he wrote a great writeup of his conclusions after the session.
The debate about data products comes down to fear about the value that data teams deliver for the business. Part of this is because of the excesses of ZIRP-era headcount and tool spending. Companies were spinning up data teams not because they had carefully analyzed how a data team would help them run their business better, increase revenue, or cut costs, but because that was what you did when you decided to migrate to the cloud, if you were an established business, or when you got your Series A, if you were a growing startup.
Data teams got caught up in this as well, deploying too many toys when a simpler solution may have sufficed. I don’t think it’s fair to criticize them too much, because everyone, from employees to leaders to investors, was caught up in the same fever of irrational exuberance and free money.
The past couple of years have brought back reality for data teams. They need to add value.
I come from a somewhat biased perspective because of the nature of Streamkap’s business, which is providing streaming data pipelines. Our customers tend to be data teams working in collaboration with software engineering teams to deliver customer-facing or operational real-time apps that customers either pay for directly or expect as a critical part of the overall product they are paying for.
This would fall under the definition of a data product for many, and while I’m happy for folks to characterize it that way, I’m not sure that’s the important part. In those cases, data teams are helping solve problems for the end customer, and that’s valuable.
Other data teams do more traditional BI, creating dashboards. This may not be as sexy and is less likely to be considered a data product, but does that matter if it adds value to the business?
Ultimately, I’m not against framing this work as data products, but instead of getting bogged down in semantics, data teams should focus on delivering tangible value to their organizations.
This means actively collaborating with other functions, identifying key business challenges, and using data to drive measurable improvements, whether that's increasing customer satisfaction, optimizing operations, or uncovering new revenue opportunities. By focusing on real-world impact, data teams can solidify their position as a core source of value for the business rather than a cost center constantly looking over its shoulder.
Warmly,
Paul Dudley
P.S. If you're interested in discussing data in a more practical setting (or just grabbing a drink), and if you're in San Francisco, here are some upcoming events you might want to check out:
March 18th – Low-Key Data Meetup 🍻
Tony Fancher and I are hosting our next low-key data happy hour on March 18th. There will be no sponsors, no talks, just a casual meetup for data professionals to connect and discuss whatever’s on their minds.
I'll also be attending the following events and invite you to join me for a chance to connect and chat:
March 19th – ClickHouse Meetup at Cloudflare 🚀
This edition highlights startups building on ClickHouse. It features user presentations, interactive discussions, and networking with experts and users.
March 20th – Tobiko User Conference 🏢
Join Tobiko, the team behind SQLMesh and SQLGlot, for their first-ever summit on next-gen data engineering. This event will feature an exclusive look at the roadmap for both open-source and enterprise solutions, along with insights from partners, investors, and customers on the future of data in 2025.