
DataHub is introducing a new context intelligence layer that mines years of SQL query logs to help AI agents stop making basic mistakes when working with enterprise data.
The company says the approach is designed to solve a problem many teams are already running into: large language model (LLM) agents hitting data warehouses directly and returning wrong answers because they lack the context humans take for granted.
One example comes from Miro’s data team. When they pointed AI agents straight at their Snowflake environment, the agents produced incorrect results more than 65% of the time. The underlying issue wasn’t the model itself, but the fact that the agents were dropped into an environment with more than 10,000 tables and no semantic layer to guide which tables or joins to use for a given business question.
In that kind of sprawl, agents have no reliable way to map a natural-language request to the right data assets. They often guess, hallucinating joins and table combinations that look plausible in SQL but don’t reflect how analysts actually work with the data.
DataHub’s new capability, called Context Intelligence, aims to fix that by treating past analyst behaviour as the ground truth for how data should be used. Instead of relying only on raw schemas and table names, it analyses existing SQL query history to build a semantic index of which tables, joins and patterns have successfully answered real business questions.
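DataHub has not published the internals of Context Intelligence, so what follows is only a hypothetical sketch of the general idea. It scans a query log, resolves table aliases and counts which join conditions analysts actually use. It relies on simplified regular expressions rather than a real SQL parser, so it handles only plain FROM/JOIN ... ON a.x = b.y queries. The function name mine_join_patterns and the three sample queries are invented for illustration.

```python
import re
from collections import Counter

# Words that may follow a table name and must not be mistaken for an alias.
_KEYWORDS = r"(?:JOIN|ON|WHERE|GROUP|ORDER|LEFT|RIGHT|INNER|OUTER|FULL|CROSS|LIMIT|USING)\b"

# Simplified, illustrative patterns; a real system would use a proper SQL parser.
TABLE_REF = re.compile(
    rf"\b(?:FROM|JOIN)\s+([\w.]+)(?:\s+(?:AS\s+)?(?!{_KEYWORDS})(\w+))?",
    re.IGNORECASE,
)
JOIN_ON = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", re.IGNORECASE)


def mine_join_patterns(query_log):
    """Count how often each table.column = table.column join appears in a query log."""
    counts = Counter()
    for sql in query_log:
        # Map every alias (or bare table name) used in this query to its table.
        aliases = {}
        for m in TABLE_REF.finditer(sql):
            table = m.group(1).lower()
            alias = (m.group(2) or table.split(".")[-1]).lower()
            aliases[alias] = table
        for a1, c1, a2, c2 in JOIN_ON.findall(sql):
            left, right = aliases.get(a1.lower()), aliases.get(a2.lower())
            if left and right:
                # Sort both sides so "a = b" and "b = a" count as the same join.
                key = tuple(sorted([(left, c1.lower()), (right, c2.lower())]))
                counts[key] += 1
    return counts


log = [
    "SELECT c.region, SUM(o.amount) FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY c.region",
    "SELECT * FROM customers AS c JOIN orders AS o ON c.id = o.customer_id",
    "SELECT * FROM orders JOIN products ON orders.product_id = products.id",
]

for (left, right), n in mine_join_patterns(log).most_common():
    print(f"{left[0]}.{left[1]} = {right[0]}.{right[1]} (seen {n}x)")
```

Frequency counts like these are one plausible signal for steering an agent towards joins that humans have already validated. A real system would also have to cope with subqueries, CTEs and dialect differences.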
That index is then exposed directly to AI agents through the Model Context Protocol (MCP) and popular agent frameworks including LangChain, Google’s Agent Development Kit and CrewAI. In practice, this means an agent can look up how human analysts have historically queried a given metric or domain, and reuse those patterns rather than inventing joins from scratch.
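The MCP, LangChain, ADK and CrewAI integrations themselves are not shown here. As a rough, hypothetical illustration of the lookup step only, the sketch below stores past queries alongside the business question each one answered and returns the closest match for an agent's request. The QueryRecord type, the lookup_patterns function, the word-overlap scoring and the sample queries and run counts are all assumptions; a production index would more likely use embeddings or a search engine.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryRecord:
    """A past analyst query and the business question it answered (hypothetical schema)."""
    question: str
    sql: str
    runs: int  # how many times analysts have executed this query


def tokens(text):
    """Lower-case word set, used for a crude keyword match."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lookup_patterns(index, request, top_k=1):
    """Return past queries whose questions share the most words with the request.

    Ties are broken by how often analysts ran the query.
    """
    wanted = tokens(request)
    scored = []
    for rec in index:
        overlap = len(wanted & tokens(rec.question))
        if overlap:
            scored.append((overlap, rec.runs, rec))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [rec for _, _, rec in scored[:top_k]]


index = [
    QueryRecord("monthly revenue by region",
                "SELECT c.region, SUM(o.amount) FROM orders o "
                "JOIN customers c ON o.customer_id = c.id GROUP BY c.region", runs=412),
    QueryRecord("count of active users",
                "SELECT COUNT(DISTINCT user_id) FROM events", runs=958),
    QueryRecord("revenue by product category",
                "SELECT p.category, SUM(o.amount) FROM orders o "
                "JOIN products p ON o.product_id = p.id GROUP BY p.category", runs=87),
]

best = lookup_patterns(index, "What was revenue per region last month?")
print(best[0].sql)  # the agent reuses this vetted join instead of inventing one
```

Breaking ties on run count is one simple way to prefer the patterns analysts have relied on most often.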
According to co-founder and CTO Shirshanka Das, the goal is to let enterprises turn “years of analyst query history into a living, retrievable knowledge base where agents stop hallucinating joins because they have access to the joins that have worked before, validated by the people who ran them.”
Context Intelligence is built on the same query-log infrastructure DataHub has already used for lineage tracking in production deployments worldwide. That lineage work focuses on understanding how data flows from operational systems, through streaming infrastructure, into warehouses and on to downstream business tools. The new layer effectively repurposes that foundation to serve LLM-based agents.
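For readers unfamiliar with how lineage is derived from query logs, the hypothetical sketch below shows the basic table-level version. Each INSERT INTO or CREATE TABLE statement yields edges from the tables it reads to the table it writes, and those edges can then be walked upstream. This is not DataHub's implementation: the regular expressions only handle simple statements, and the function names and sample log are invented.

```python
import re
from collections import defaultdict

# Simplified, illustrative patterns; real lineage extraction needs a full SQL parser.
TARGET = re.compile(
    r"\b(?:INSERT\s+INTO|CREATE\s+(?:OR\s+REPLACE\s+)?TABLE)\s+([\w.]+)", re.IGNORECASE
)
SOURCE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(query_log):
    """Map each table written by a query to the set of tables that query read from."""
    upstream = defaultdict(set)
    for sql in query_log:
        target = TARGET.search(sql)
        if not target:
            continue  # read-only query: no lineage edge
        dest = target.group(1).lower()
        for src in SOURCE.findall(sql):
            if src.lower() != dest:
                upstream[dest].add(src.lower())
    return dict(upstream)


def all_upstream(lineage, table):
    """Follow lineage edges transitively to find every table that feeds into `table`."""
    seen, stack = set(), [table]
    while stack:
        for parent in lineage.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


log = [
    "INSERT INTO staging.orders_clean SELECT * FROM raw.orders WHERE amount > 0",
    "CREATE TABLE mart.revenue AS SELECT c.region, SUM(o.amount) AS revenue "
    "FROM staging.orders_clean o JOIN raw.customers c ON o.customer_id = c.id "
    "GROUP BY c.region",
    "SELECT * FROM mart.revenue",  # read-only: contributes no edge
]

lineage = table_lineage(log)
print(sorted(all_upstream(lineage, "mart.revenue")))
```

The same parsed statements that produce these edges also expose the joins and filters analysts used, which is why a lineage pipeline is a natural starting point for mining query patterns.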
DataHub itself began life inside LinkedIn as a metadata management project. It was created to tackle two simultaneous challenges: making data across the organization easier to discover and use, while ensuring that the same data was only used appropriately and for the right purposes.
Das, who led data infrastructure at LinkedIn for nearly 11 years, helped drive that effort before open-sourcing DataHub in early 2020, after nearly six years of internal development. Since then, the open-source project has grown significantly, with more than 15,000 contributors and 3,000 production deployments around the world.
Over the years, lineage has been a primary use case for DataHub users: tracking how data moves and transforms across complex stacks, and supporting needs like regulatory compliance audits and operational triage. By layering Context Intelligence on top of this lineage-aware foundation, DataHub is now positioning its platform as a bridge between traditional data cataloguing and the new generation of AI agents that need reliable, enterprise-specific context to operate safely.
The company’s bet is that query history provides a far richer, more practical signal for agent routing than schemas alone. Where schemas describe what data exists, query logs capture how experts actually use that data, and that knowledge can be turned into a guidebook for AI systems navigating large warehouse environments.
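To make the contrast between schemas and query logs concrete, the hypothetical sketch below assumes a schema search has already returned several similarly named tables and uses a simple usage signal from query history to choose between them. The function name, the (user, table) log format, the ranking rule (distinct users first, then total queries) and the sample data are all invented for illustration and are not DataHub's scoring.

```python
from collections import defaultdict


def rank_tables(candidates, usage_log):
    """Order schema-matched candidate tables by how analysts actually use them.

    usage_log holds (user, table) pairs, one per query that read the table
    (a hypothetical format assumed to be extracted from query history).
    Distinct users rank first, so one person's heavy use of a scratch table
    does not outrank a table the whole team relies on.
    """
    users = defaultdict(set)
    queries = defaultdict(int)
    for user, table in usage_log:
        users[table].add(user)
        queries[table] += 1
    return sorted(candidates, key=lambda t: (len(users[t]), queries[t]), reverse=True)


# A schema search for "orders" returns three plausible-looking tables.
candidates = ["orders", "orders_v2", "orders_backup_2021"]
usage_log = [("ana", "orders")] * 4 + [(u, "orders_v2") for u in ("ana", "ben", "cho")]

print(rank_tables(candidates, usage_log))
```

Table names and column types alone cannot separate these three candidates; usage history can, and the same idea extends to columns, joins and filters.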
For organizations experimenting with LLM agents on top of platforms like Snowflake, the message is clear: without a way to encode and surface hard-won institutional knowledge about tables, joins and trusted patterns, even sophisticated models can fail badly. DataHub’s Context Intelligence is an attempt to close that gap by elevating SQL query history into a first-class source of truth for AI-driven analytics.