OpenAI's AI Now Manages 600 Petabytes of Data
4 Mar
Summary
- AI agent processes plain-English queries for data analysis.
- Tool uses AI for over 70% of its code, built in three months.
- Over 4,000 of 5,000 employees use the data agent daily.

OpenAI has deployed an internal AI data agent that broadens employee access to its extensive data infrastructure. The tool lets over 4,000 of OpenAI's approximately 5,000 employees query 600 petabytes of data using natural-language prompts. With more than 70% of its code generated by AI, the agent cuts analysis time from hours to minutes.
Built by OpenAI's data infrastructure team, the agent slots into daily workflows via Slack, web interfaces, and IDEs. It generates charts, dashboards, and analytical reports, enabling teams such as finance, product, and engineering to extract sophisticated insights. A key feature is its ability to operate across organizational boundaries, unifying disparate data sources for comprehensive analysis.
The system addresses the challenge of navigating 70,000 datasets by employing AI, specifically Codex, to map and understand data tables. It leverages multiple context layers, including schema metadata and institutional knowledge, to ensure accurate analysis. Prompt engineering is used to mitigate AI overconfidence, emphasizing thorough validation before data processing.
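The article does not disclose how these context layers are assembled, but the idea of combining schema metadata and institutional knowledge into a prompt, with an explicit validation instruction to counter overconfidence, can be sketched roughly as below. All names, tables, and notes here are illustrative assumptions, not OpenAI's actual implementation.

```python
# Hypothetical sketch of layered prompt assembly for a data agent.
# Tables, notes, and function names are invented for illustration.

SCHEMA_METADATA = {
    "events.user_signups": "Daily signup events; columns: user_id, ts, channel.",
    "finance.revenue_daily": "Aggregated daily revenue; columns: day, amount_usd.",
}

INSTITUTIONAL_NOTES = [
    "Revenue before 2023-01 lives in a legacy table and is excluded here.",
    "Signup counts are deduplicated on user_id per day.",
]

def build_prompt(question: str) -> str:
    """Layer schema metadata and institutional knowledge around a
    plain-English question, and instruct the model to justify its
    table choices before querying (guarding against overconfidence)."""
    schema_block = "\n".join(
        f"- {name}: {desc}" for name, desc in SCHEMA_METADATA.items()
    )
    notes_block = "\n".join(f"- {note}" for note in INSTITUTIONAL_NOTES)
    return (
        "You are a data-analysis agent.\n"
        f"Available tables:\n{schema_block}\n"
        f"Institutional notes:\n{notes_block}\n"
        "Before writing any query, state which tables you will use and why; "
        "if no table fits the question, say so rather than guessing.\n"
        f"Question: {question}"
    )

prompt = build_prompt("How many signups did we get last week?")
```

The validation instruction at the end of the prompt reflects the article's point that prompt engineering is used to make the model check its assumptions before processing data, rather than answering confidently from a poor table match.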
OpenAI emphasizes that while the tool itself is not for sale, the underlying technologies are publicly available, encouraging other enterprises to build similar solutions. This strategic move aligns with OpenAI's broader mission to empower businesses with AI data agents, stressing that robust data governance is crucial for success in this evolving landscape.