Just Speak to Write SQL! Codex + Lifelong Memory: OpenAI Reduces Query Difficulty to Zero

Synced Report

Editor: peter dong

[Synced Digest] In early 2026, while most enterprises still rely on data analysts to write SQL queries by hand, OpenAI has disclosed an internal data analysis agent, capable of autonomous reasoning and even self-evolution, that shrinks data query times from "days" to "minutes."

Why do data teams always "fall into the same trap"?

The answer often isn't a lack of computing power, but rather too many tables, too many definitions, and too much scattered experience:

Even when called "active users," different tables may have completely different definitions; even if the right table is chosen, it might take over a hundred lines of SQL to get results, and a single wrong join condition ruins everything.

Internally, OpenAI has taken a more radical step: letting a Codex-driven data agent take over the entire chain of "finding tables—understanding tables—writing SQL—validating results." By using a six-layer context architecture, it fills in data semantics, integrates organizational knowledge, and solidifies experiential memory, allowing engineers to replace manual labor with simple questions.

Data Queries No Longer Require Manual Table Lookups

"We have a large number of structurally similar tables, and I spend a huge amount of time trying to figure out the differences between them and which one to use." This daily complaint from an OpenAI engineer echoes the common plight of data workers everywhere.

OpenAI's internal data platform holds 600PB of data, distributed across 70,000 datasets. Imagine this: when an OpenAI engineer needs to analyze ChatGPT user growth, they face dozens of similar user tables, each claiming to record "user activity," yet each with a different definition.

Choosing the wrong table means days of effort go down the drain; worse, it could lead to critical decisions based on erroneous data.

Worse still, even when the correct table is selected, producing accurate results remains difficult.

The image below shows a SQL statement with over 180 lines, resembling an insurmountable mountain—complex table joins and aggregation operations mean any tiny error can invalidate the entire analysis.

Now, with an agent powered by Codex and equipped with autonomous learning capabilities, engineers no longer need to write hundreds of lines of SQL queries. They can simply ask questions to find the information they need in the ocean of data, such as the query shown below comparing active user counts at two different points in time.

The Six-Layer "Data Brain" Architecture

There are many tools that convert natural language into SQL statements, but the core innovation of the data agent used internally at OpenAI lies in its multi-layer context architecture.

The bottom layer, basic metadata, includes fundamental information like table structures and column types, providing the skeleton of the data graph for the agent.

The next layer is manual annotation, consisting of table and column descriptions carefully written by domain experts. These capture intent, semantics, business meaning, and known caveats that cannot be easily inferred from schemas or historical queries. In effect, this layer briefs the agent on what each table is for.

The subsequent Codex enhancement layer derives code-level definitions of tables, enabling the agent to understand the actual content of the data more deeply. This layer provides critical information such as value uniqueness, table update frequency, and data ranges. Its introduction allows the agent to comprehend differences in how various tables are constructed and updated.

Moving up to the organizational knowledge layer, the agent can access Slack, Google Docs, and Notion to retrieve key company background information, such as product launches, reliability incidents, internal codenames and tools, as well as standard definitions and calculation logic for key metrics.

Armed with background information drawn from these internal documents, the agent avoids mistakes that anyone familiar with the company's context would catch.

For example, when a user asks, "Why did connector usage drop significantly in December?" the agent doesn't simply report the decline in numbers. Instead, through organizational knowledge, it discovers that this was primarily a measurement/logging issue rather than a real usage crash, relating to data collection changes caused by the release of ChatGPT 5.1.

The fifth and most critical layer, learning and evolution, gives the agent persistent memory. When it receives corrections from users or discovers nuances in data issues, it can save these experiences for future use. Memories can also be manually created and edited by users. They can be globally applicable or unique to a specific user.
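The memory layer described above can be pictured as a small store of notes, each scoped either globally or to one user. The following is a minimal sketch under that assumption only; the class names, fields, and example notes are hypothetical and do not reflect OpenAI's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Memory:
    """One saved correction or caveat. All field names are illustrative."""
    text: str
    table: str
    owner: Optional[str] = None  # None means the memory applies to every user


@dataclass
class MemoryStore:
    memories: list[Memory] = field(default_factory=list)

    def save(self, text: str, table: str, owner: Optional[str] = None) -> None:
        """Record a correction, either global (no owner) or private to one user."""
        self.memories.append(Memory(text, table, owner))

    def recall(self, table: str, user: str) -> list[str]:
        """Return global memories for the table plus those private to this user."""
        return [
            m.text
            for m in self.memories
            if m.table == table and m.owner in (None, user)
        ]


store = MemoryStore()
store.save("Exclude internal test accounts when counting users.", "daily_active_users")
store.save("I prefer weekly rollups.", "daily_active_users", owner="alice")

alice_notes = store.recall("daily_active_users", user="alice")  # global + private
bob_notes = store.recall("daily_active_users", user="bob")      # global only
```

Keeping the scope on each memory, rather than in separate stores, lets a single lookup merge global and per-user notes.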

The top layer, runtime context, lets the agent inspect tables by querying the data warehouse in real time when existing context is missing or outdated. It can also communicate with other data platform systems (metadata services, Airflow, Spark) to obtain broader data context.
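To make the six layers concrete, here is a rough sketch of how the context for a single table might be gathered into one record and rendered as prompt text. The field names, layer titles, and example values are assumptions made for illustration; the article does not describe the agent's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class TableContext:
    """Context for one table, with one field per layer (names are hypothetical)."""
    name: str
    schema: dict[str, str]                                   # 1. basic metadata
    annotations: list[str] = field(default_factory=list)     # 2. expert descriptions
    code_facts: list[str] = field(default_factory=list)      # 3. derived from pipeline code
    org_knowledge: list[str] = field(default_factory=list)   # 4. docs, chat, wikis
    memories: list[str] = field(default_factory=list)        # 5. saved corrections
    runtime_notes: list[str] = field(default_factory=list)   # 6. live warehouse checks

    def to_prompt(self) -> str:
        """Render the non-empty layers as plain text for a model prompt."""
        columns = ", ".join(f"{col} {typ}" for col, typ in self.schema.items())
        lines = [f"Table {self.name}", f"Columns: {columns}"]
        layers = [
            ("Expert annotations", self.annotations),
            ("Code-derived facts", self.code_facts),
            ("Organizational knowledge", self.org_knowledge),
            ("Saved memories", self.memories),
            ("Runtime checks", self.runtime_notes),
        ]
        for title, notes in layers:
            if notes:  # skip layers that have nothing to say about this table
                lines.append(f"{title}:")
                lines.extend(f"  - {note}" for note in notes)
        return "\n".join(lines)


ctx = TableContext(
    name="daily_active_users",
    schema={"user_id": "BIGINT", "day": "DATE"},
    annotations=["One row per user per active day."],
    code_facts=["Rebuilt nightly; (user_id, day) is unique."],
    memories=["Exclude internal test accounts."],
)
prompt = ctx.to_prompt()
```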

Dynamic Switching Between Offline Retrieval and Online Querying

So, how do these six layers work together?

Specifically, the process can be divided into two phases: offline and online.

Every day at dawn, the agent systematically scans the actual usage and call traces of thousands of data tables from the previous day, absorbing annotations and insights left by data experts, and invoking Codex to interpret deep logic within the code, deriving richer business semantics behind the tables. All these scattered "knowledge fragments" are fused into a unified, standardized "knowledge graph".

Subsequently, through OpenAI's embedding models, this data is transformed and compressed into groups of vector embeddings and stored in a high-speed retrieval database. Thus, an immediately usable "palace of data memory" for the AI agent is forged.

When a user's question arrives, the agent no longer needs to dive headfirst into the ocean of metadata for time-consuming manual retrieval like a human analyst. Instead, through Retrieval-Augmented Generation (RAG) technology, it precisely locates and extracts the data tables most relevant to the current question. This process is fast, scalable, and has extremely low latency.
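The retrieval step can be sketched in a few lines. The example below is a toy under stated assumptions: a bag-of-words vector stands in for a real embedding model, an in-memory dictionary stands in for the retrieval database, and the table names and descriptions are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Offline phase: embed each table's description once and keep the vectors.
TABLE_DOCS = {
    "daily_active_users": "one row per user per day with at least one chat message",
    "billing_invoices": "monthly invoices and payment status for paid plans",
    "connector_events": "usage events emitted by connectors such as file search",
}
INDEX = {name: embed(f"{name} {doc}") for name, doc in TABLE_DOCS.items()}


def retrieve(question: str, k: int = 2) -> list[str]:
    """Online phase: rank tables by similarity to the question, return the top k."""
    query = embed(question)
    ranked = sorted(INDEX, key=lambda name: cosine(query, INDEX[name]), reverse=True)
    return ranked[:k]


top = retrieve("How many users sent a chat message each day last week?", k=1)
```

Only the few highest-ranked tables are passed on to SQL generation, which is what keeps the online step fast regardless of how many tables the index holds.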

For requests requiring the latest data, the agent simultaneously activates a real-time query channel, sending query requests directly to the data warehouse. This achieves both the immediacy of runtime context and deep integration with offline knowledge. Thus, a complex business question is transformed into clear insights available in seconds, facilitated by the synergy of "lightning retrieval" from offline memory and "precision guidance" from real-time data.
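One way to picture the switch between offline memory and live querying is a simple freshness check. This is a sketch of the idea, not the agent's actual routing logic: the function name, the 24-hour threshold (chosen to match a daily offline scan), and the `needs_latest` flag are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def choose_source(
    indexed_at: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # assumed: matches a daily offline scan
    needs_latest: bool = False,
) -> str:
    """Return "offline" to answer from the index or "live" to query the warehouse."""
    if indexed_at is None:          # no offline context exists for this table
        return "live"
    if needs_latest:                # the question explicitly asks for current data
        return "live"
    if now - indexed_at > max_age:  # offline context is stale
        return "live"
    return "offline"


now = datetime(2026, 1, 31, 12, 0)
fresh = choose_source(datetime(2026, 1, 31, 5, 0), now)
stale = choose_source(datetime(2026, 1, 29, 5, 0), now)
urgent = choose_source(datetime(2026, 1, 31, 5, 0), now, needs_latest=True)
```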

A Paradigm Shift from Static Tools to Dynamic Teammates

The most astonishing aspect of this agent is not its technical complexity, but how it integrates into daily workflows to become a true "teammate." Unlike traditional "question-and-answer" tools, the data analysis agent used internally at OpenAI is designed as a "teammate you can reason with". It is conversational and always online, capable of handling both quick answers and iterative exploration.

Imagine a scenario: when a product manager's question is unclear or incomplete, the agent proactively asks clarifying questions. If there is no response, it applies reasonable default values to move forward. For instance, if a user asks about business growth without specifying a date range, it might assume the last seven or thirty days. This allows the agent to keep responding while collaborating with the user to achieve more accurate results.
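The fallback to a default date range could look roughly like the sketch below. The function and field names are hypothetical, and the 30-day default is just one of the two windows the article mentions; the point is that the result records whether the range was assumed, so the agent can say so in its answer.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DateRange:
    start: date
    end: date
    assumed: bool  # True when the agent filled in the range itself


def resolve_date_range(
    start: Optional[date],
    end: Optional[date],
    today: date,
    default_days: int = 30,  # assumed default; the article mentions 7 or 30 days
) -> DateRange:
    """Use the user's range if fully given, otherwise fall back to a recent window."""
    if start is not None and end is not None:
        return DateRange(start, end, assumed=False)
    return DateRange(today - timedelta(days=default_days), today, assumed=True)


today = date(2026, 1, 31)
explicit = resolve_date_range(date(2026, 1, 1), date(2026, 1, 15), today)
fallback = resolve_date_range(None, None, today, default_days=7)
```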

To prevent the evolving agent from going off track during its learning process, the OpenAI team equipped it with a strict supervisor using the Evals API.

Each important question in the evaluation set is paired with a manually written query that serves as the "gold standard." The agent's performance against these pairs is continuously monitored and scored.

These evaluations check not only for SQL syntax correctness but also compare the accuracy of the result data. When the agent "learns badly," the system immediately issues an alert, ensuring problems are detected and fixed before affecting users.
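The idea of grading by results rather than by SQL text can be illustrated with a small local harness. This sketch uses Python's built-in `sqlite3` module as a stand-in warehouse and does not call the Evals API; the `grade` function, the `events` table, and the sample queries are all invented for the example.

```python
import sqlite3


def run(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())


def grade(conn: sqlite3.Connection, candidate_sql: str, gold_sql: str) -> dict:
    """Compare the candidate's result set against the gold query's result set."""
    expected = run(conn, gold_sql)
    try:
        actual = run(conn, candidate_sql)
    except sqlite3.Error as exc:  # invalid SQL fails the check outright
        return {"passed": False, "reason": f"query failed: {exc}"}
    if actual != expected:
        return {"passed": False, "reason": "result mismatch"}
    return {"passed": True, "reason": "results match"}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, day TEXT);
    INSERT INTO events VALUES
        (1, '2026-01-01'), (1, '2026-01-01'), (2, '2026-01-01'), (3, '2026-01-02');
""")

GOLD = "SELECT day, COUNT(DISTINCT user_id) FROM events GROUP BY day"

# Different SQL text, same answer: should pass.
good = grade(
    conn,
    "SELECT day, COUNT(*) FROM (SELECT DISTINCT day, user_id FROM events) GROUP BY day",
    GOLD,
)
# Valid SQL that double-counts a user: should fail on the data, not the syntax.
bad = grade(conn, "SELECT day, COUNT(user_id) FROM events GROUP BY day", GOLD)
# Invalid SQL: should fail before any comparison.
broken = grade(conn, "SELEC day FROM events", GOLD)
```

Comparing result sets means a query written differently from the gold query still passes when it returns the same data, while a syntactically valid query with a subtle counting bug does not.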

Regarding data security, the agent respects existing access controls: users can only query tables they already have permission to access. When access permissions are missing, it flags this or falls back to alternative datasets the user is authorized to use.
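The flag-or-fall-back behavior might be sketched as follows. This is an assumption about one reasonable shape for the logic, with a hypothetical function name and table names; in a real system the permission check would be enforced by the warehouse itself, not by application code alone.

```python
from typing import Optional


def pick_table(
    ranked_candidates: list[str], granted: set[str]
) -> tuple[Optional[str], list[str]]:
    """Return the best-ranked table the user may read, plus the tables skipped.

    `ranked_candidates` is ordered from most to least relevant. The skipped
    list lets the agent tell the user which datasets they lack access to.
    """
    denied: list[str] = []
    for table in ranked_candidates:
        if table in granted:
            return table, denied
        denied.append(table)
    return None, denied  # nothing accessible: the agent can only flag the gap


candidates = ["finance_revenue_daily", "revenue_summary_public", "billing_invoices"]
chosen, skipped = pick_table(candidates, granted={"revenue_summary_public"})
none_chosen, all_skipped = pick_table(candidates, granted=set())
```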

To ensure transparency in the data analysis process, the agent summarizes assumptions and execution steps alongside each answer to expose its reasoning process. When a query is executed, it links directly to the underlying results, allowing users to inspect the raw data and verify every step of the analysis.

How to Build a Data Analysis Agent

The aforementioned data analysis agent from OpenAI is not open source. However, for those wishing to build a similar agent from scratch, OpenAI engineers have shared the pitfalls they encountered.

Initially, the agent had access to the complete dataset, but it quickly got lost among functionally overlapping data tables. To reduce ambiguity and improve query reliability, the developers had to restrict the set of tables the agent could access.

Another pitfall came from overly prescriptive system prompts. Although many questions share a similar analytical shape, the details vary enough that rigid instructions can be counterproductive. Once the team focused on the desired outcome and left the "how-to" to the agent instead of scripting it in system-level prompts, the agent became more robust and produced better results.

The most critical point, however, is that the true meaning of data lives in the code that produces it, more than in expert annotations on data tables. Schemas and query history describe a table's shape and usage, but the code that builds the table captures assumptions and business intent that never surface in SQL or metadata. By using Codex to crawl the codebase, the agent understands how datasets are actually constructed and can better reason about what each table truly contains. Compared to obtaining information solely from the data warehouse, the code that builds the data can more accurately answer "what's in this table" and "when can I use it."

As enterprise data environments become increasingly complex, tools similar to OpenAI's data agent may become standard configuration for future enterprise data analysis, driving the entire industry toward a more efficient and intelligent paradigm of data-driven decision-making.

The goal of these agents is not to replace data analysts, but to augment their capabilities, liberating data analysts from tedious query writing and debugging so they can focus on higher-level tasks like defining metrics, validating hypotheses, and making data-driven decisions.

References:

https://openai.com/index/inside-our-in-house-data-agent/

https://x.com/OpenAIDevs/status/2016943147239329872