Last month at Snowflake Summit, Sam Altman anchored the opening keynote with a fireside chat. AGI timelines came up, and, as you would expect, he emphasized just how quickly things are moving (as, it seems, is our bar for what AGI actually is).
One thing that stood out to me was how important context is becoming. Even with today's models, longer context windows, memory, and improved retrieval mechanisms will greatly expand the impact they can have.
Shopify CEO Tobi Lutke was also singing the song of context later in the month, highlighting that “context engineering” is a better framing than prompt engineering. We're moving beyond merely coaxing the correct answer out of the underlying model and toward ensuring we provide it with the information it needs to arrive at that answer.
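To make the distinction concrete, here's a toy sketch in Python. The `llm()` and `retrieve()` helpers are hypothetical stand-ins, not any particular library's API; the point is only the shape of the two approaches.

```python
# Toy contrast between the two framings; llm() and retrieve() are
# hypothetical stand-ins for your model call and your retrieval step.

def prompt_engineering(question: str) -> str:
    # Tweak the wording and hope the right answer is in the model's weights.
    return llm(f"You are a world-class analyst. Answer precisely: {question}")

def context_engineering(question: str) -> str:
    # First fetch the information the model needs, then ask it to answer
    # from that material rather than from memory alone.
    docs = retrieve(question, top_k=5)  # e.g., search over your own data
    context = "\n\n".join(doc.text for doc in docs)
    return llm(
        "Using only the context below, answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```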
This tracks with some of the acquisitions we've seen this year. Companies that hold core enterprise data, e.g. Salesforce and ServiceNow, have bought companies that help manage metadata: Informatica and data.world, respectively. The metadata provides context on the underlying data; it gives LLMs a guide to where to find things, which data to trust, and so on.
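As a rough illustration (the catalog entries and field names below are invented, not any vendor's actual schema), metadata like this can be rendered straight into a model's context so it knows where to look:

```python
# A sketch of the kind of metadata a catalog might expose to an LLM.
CATALOG = {
    "crm.opportunities": {
        "description": "Open and closed sales opportunities, one row per deal.",
        "freshness": "updated hourly",
        "trusted": True,
    },
    "legacy.sales_2019": {
        "description": "Deprecated snapshot of 2019 sales data.",
        "freshness": "stale",
        "trusted": False,
    },
}

def tables_for_prompt(catalog: dict) -> str:
    """Render only trusted tables as context for the model."""
    lines = [
        f"- {name}: {meta['description']} ({meta['freshness']})"
        for name, meta in catalog.items()
        if meta["trusted"]
    ]
    return "Available data sources:\n" + "\n".join(lines)
```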
Our previous discussion of semantic layers, inspired in large part by David Jayatillake’s writing on this, is related.
Salesforce and ServiceNow want to give AI a map.
Interestingly, we're also starting to see another trend: the hoarding of data. Does it benefit Salesforce to make its CRM or Slack chat data easily accessible to 3rd-party services? It does not. Salesforce recently announced a change to Slack's API terms to block bulk data access for LLMs.
Salesforce wants Agentforce to be your go-to for understanding the data they manage for you.
I understand the motivation, but it may hurt them in the long run. Enterprises do not want their data locked away in walled gardens; we've seen this recently with the trend toward open table formats.
Version one of this challenge was the analytics stack, where companies go to great lengths to integrate disparate data sources into a single central data model in their data warehouse or data lake to power analytics.
Most agree that AI will unlock far more value from that data than internal analytics ever did. The most powerful AI will be the one with access to ALL of an enterprise's data, not just the data held by a single service provider. I can't help but wonder whether, by trying to keep all their data proprietary, these platforms will lose out to new contenders built around open data platforms.
On a related note, one of the questions we're getting from customers is how they can start to integrate real-time context into the AI services they offer their own customers.
If you're building an AI agent, do you want it to reference static docs and FAQs, or do you want it to take into account a shopper's recent behavior on your website, or the error that just hit someone in your software?
We believe this real-time context will be critical to agentic workflows, and we have already seen it deliver significant impact for customers.
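Here's a minimal sketch of what that can look like in practice. The event shape and the `events_for()` reader are assumptions for illustration; in a real system they would be backed by your event stream or store.

```python
from datetime import datetime, timedelta, timezone

# events_for() is a hypothetical reader over your event stream/store;
# the event fields below (ts, type, detail) are illustrative.

def recent_events(user_id: str, window_minutes: int = 15) -> list[dict]:
    """Pull only what happened to this user in the last few minutes."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
    return [e for e in events_for(user_id) if e["ts"] >= cutoff]

def build_agent_context(user_id: str, static_docs: str) -> str:
    """Combine static knowledge with what just happened to this user."""
    events = recent_events(user_id)
    live = "\n".join(
        f"{e['ts'].isoformat()} {e['type']}: {e['detail']}" for e in events
    ) or "(no recent activity)"
    return f"{static_docs}\n\nRecent user activity:\n{live}"
```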
One thing we can be sure of: data and metadata are going to be a critical battleground in AI, because context is king.
Warmly,
Paul Dudley
-----
P.S. We previously talked about the shift toward forward-deployed engineers; services are now hot in Silicon Valley. This also ties back to context: services allow AI to be tailored to the right underlying data sources, ensuring quality output.