In June 2014, Aswath Damodaran wrote a piece for FiveThirtyEight questioning Uber’s valuation: a then-astronomical $17 billion on a $1.2 billion funding round. In his piece, Damodaran estimated Uber’s terminal value at no more than $6 billion, based on the total addressable market (TAM) for taxis globally being roughly $100 billion.
Cut to February 2025: Uber is profitable on a $48B annual revenue run rate (Q4 2024) and has a $167B market cap.
So what happened?
Uber didn’t merely take a small share of the taxi market; they created a huge new market that didn’t previously exist. My own usage is the perfect example: I lived in the Outer Sunset in San Francisco, a notoriously hard place to call a taxi to, so I didn’t use taxis. With Uber, I could confidently order a ride instead of driving or riding Muni. My Uber spend ended up being many multiples of my taxi spend.
What does this have to do with streaming data? For the past couple of decades, data teams have learned to push back on business requests for streaming and real-time data. Why?
It was much more expensive and much more complex. You really had to justify the effort and spend with clear business outcomes, and if you couldn’t, you would move back to batch.
What happens when that is no longer the case?
Let’s take the Snowflake ecosystem over the past few years as an example of how things are changing.
Back in 2017, Snowflake launched its first support for streaming ingestion with Snowpipe. It let users stream data into an object store like S3/GCS/ADLS and provided a mechanism to pick those files up and ingest them into Snowflake with a background process that did not require a warehouse, which kept the cost of streaming ingestion relatively low. It wasn’t perfect, though: it required the intermediate step of creating a file in object storage, didn’t handle large files very well, and was ultimately only near-real-time, with around a minute of latency to load data.
In 2022, a new capability entered private preview: Snowpipe Streaming (it could have used a more differentiated name, but oh well). Snowpipe Streaming is built around a Java SDK that allows direct streaming into Snowflake with no intermediate step in object storage.
This means a latency of a few seconds for data to arrive in Snowflake. Snowpipe Streaming is also serverless and is cheaper still than Snowpipe (up to 10x, depending on throughput requirements).
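To make that concrete, here is a minimal sketch of what using the Snowpipe Streaming Java SDK looks like: open a client, open a channel against a target table, and insert rows with offset tokens so the source can resume where it left off. The connection properties, database/table names, and row shape below are placeholders for illustration, and real code would load credentials securely and handle retries and errors more carefully.

```java
import net.snowflake.ingest.streaming.InsertValidationResponse;
import net.snowflake.ingest.streaming.OpenChannelRequest;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class SnowpipeStreamingSketch {
  public static void main(String[] args) throws Exception {
    // Connection properties: account URL, user, role, and key-pair auth.
    // All values here are placeholders.
    Properties props = new Properties();
    props.put("url", "https://<account>.snowflakecomputing.com:443");
    props.put("user", "STREAMING_USER");
    props.put("role", "INGEST_ROLE");
    props.put("private_key", "<PKCS8 private key>");

    // One client per process is typical; channels are lightweight and opened per table.
    try (SnowflakeStreamingIngestClient client =
        SnowflakeStreamingIngestClientFactory.builder("MY_CLIENT").setProperties(props).build()) {

      OpenChannelRequest request =
          OpenChannelRequest.builder("MY_CHANNEL")
              .setDBName("MY_DB")
              .setSchemaName("PUBLIC")
              .setTableName("RIDE_EVENTS")
              .setOnErrorOption(OpenChannelRequest.OnErrorOption.CONTINUE)
              .build();

      SnowflakeStreamingIngestChannel channel = client.openChannel(request);

      // Rows are plain maps of column name -> value; the offset token lets you
      // resume from your source position (e.g. a change-log offset) after a restart.
      for (int i = 0; i < 100; i++) {
        Map<String, Object> row = new HashMap<>();
        row.put("EVENT_ID", i);
        row.put("PAYLOAD", "{\"city\":\"SF\"}");
        InsertValidationResponse resp = channel.insertRow(row, String.valueOf(i));
        if (resp.hasErrors()) {
          throw resp.getInsertErrors().get(0).getException();
        }
      }

      // Flush and wait for the server-side commit before shutting down.
      channel.close().get();
    }
  }
}
```

No files, no stages, no warehouse: rows go straight from the application into the target table within seconds.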
This means that if you’re on Snowflake, streaming can now be both faster and cheaper than running batch or micro-batch processes. Why wouldn’t you use streaming?
Databricks and BigQuery also offer low-cost streaming ingestion for their customers.
Where does Streamkap fit in? We bridge the middle of the streaming pipeline, making streaming as easy to adopt as batch ETL tools like Fivetran: we pick up data from streaming sources such as database change logs and event streams and stream it into these destinations.
All of this means data teams are now adopting streaming because of its lower cost, and then, once they have real-time data readily available, they can decide how frequently they want or need to consume it.
This is the first step in the broader move to the streaming-first Kappa architecture we described in a previous post.
We’ve seen companies like SpotOn, Niche, and Fleetio adopt streaming to power better and faster experiences for their customers while also reducing their costs.
Where do we go from here? More data processing in the stream. Over the past 10 years, the data analytics market has shifted from being dominated by ETL (extract, transform, load) to ELT (extract, load, transform) architectures, driven by the pain of traditional batch ETL (slow, brittle, opaque, and expensive).
ELT, as an approach and a set of tools, brought all of the raw data into the data warehouse, where it could then be modelled and shaped into the desired format for downstream consumption. Some of ELT’s advantages, e.g., transparency and versioning, were not inherent to the approach itself; they were simply improvements in the tools and processes around it.
We see an opportunity for streaming transformations (watch this space for news from Streamkap on this front) to combine the best of ELT and ETL for lower latency, lower cost, and better control over data pipelines, particularly in the era of shift-left architectures and growing Iceberg adoption.
Another key driver is the rapidly growing demand for data to power AI. AI agents need real-time data to keep their view of the world current.
So, just as Uber showed us that the taxi market was not indicative of the potential of ride sharing, today’s batch workloads are not indicative of the potential of streaming. As streaming becomes as easy and as cheap as batch processing, it will become the dominant architecture.
Warmly,
Paul Dudley
P.S. I hope the reader appreciates that I didn’t bring Jevons’ Paradox into the discussion. I think we’ve all had enough of that in the aftermath of DeepSeek.