Codex Lead Reveals OpenAI Internal Development: Reinvented Weekly! Codex Has Evolved into a Teammate, Can Run Overnight and Self-Test! Advice for Newcomers: Fundamentals Never Go Out of Style; Windows Version Coming Soon

Edited by Yun Zhao

"At some point in the future, perhaps we will build software for Agents. At that time, Agents might become product managers or product engineers."

Yesterday, OpenAI Codex Engineering Lead Tibo Sottiaux and OpenAI CTO of Applications Vijaye Raji appeared at the Pragmatic Summit, sharing with the outside world what development really looks like for engineers inside OpenAI.

Looking back at 2025, Tibo said the degree of change could be described as "shocking."

Even just looking at the last six months, we have gone from 'treating Codex as a tool' to 'treating it as an extension,' then to 'treating it as an Agent,' and have now evolved to 'treating it as a teammate.'

Some engineers even consume hundreds of billions of tokens per week to run multiple Agents.

As Codex's capabilities grow stronger, the bottlenecks in software development are even changing on a weekly basis:

Previously, the bottleneck was code generation, then it became code review, and now it's more about: How can we understand user needs faster? How do we handle issues? How do we track feedback on important platforms like https://twitter.com/ and https://reddit.com/, and synthesize this information into product strategy?

Internally, there are also development tools more "sci-fi" than anything available outside: Codex Box.

Last week we internally released Codex Box, which reserves a development environment on a server; you send it prompts directly to put it to work. You orchestrate workflows on your laptop, and it executes tasks in the cloud. Many people close their laptops to go to meetings, and when they return, the work is already done.

For example, when the Codex team holds meetings to discuss Codex, they directly initiate Codex threads within the conference room to diagnose problems and conduct post-mortem analysis.

Looking to the future, Tibo and Raji offered several directions on how the software development industry will change.

First, development speed might increase by another order of magnitude, which will bring about a new round of changes.

Second, OpenAI will truly achieve large-scale multi-agent collaboration networks, allowing them to work together around very grand objectives.

Third, guardrails will be built for the systems constructed this way. Developers will no longer need to review code line by line, but will verify its correctness through other means, or ensure its safety through constraints. The code itself will be abstracted away, with the real focus shifting to the problem itself and the properties the system should possess.

Fourth, Tibo mentioned that, perhaps even this year, developers will have a "personal representative assistant" dedicated to checking the status of one to two hundred small Agents for them. It will aggregate and represent all the AI agents working efficiently for you in the background, so you won't have to monitor or check each one individually. (ps: somewhat like the 'Entire!' project the editor reported on yesterday, built by the former GitHub CEO!)

Additionally, the host also revealed that OpenAI tends to recruit product-oriented engineers. Raji explained that product intuition is still important because, in essence, products are still built for people.

For newcomers who want to work in the software industry in the AI era, both Tibo and Raji stated: Fundamental skills never go out of style! Even OpenAI doesn't rely on Codex blindly, with its eyes closed.

As long as you have solid fundamentals, product intuition, know what you are building, and can navigate up and down the tech stack to solve problems, these abilities are key. And this will never go out of style.

"We can sit here because we have solid fundamentals. But the role of a software engineer has indeed changed significantly!"

Finally, one more note: a few hours ago, Tibo also announced major news. Codex has begun invite-only testing of a Windows version.

Undoubtedly, this means OpenAI is continuing to ramp up efforts in the enterprise developer space.

Below is the "programmer evolution story" as told by the Codex team, curated for our readers.

What's happening in OpenAI's internal development:

Codex has evolved into a teammate

Some engineers consume hundreds of billions of tokens per week

Host: There's a question many people are asking: What exactly is happening inside OpenAI right now? More specifically, from a software development perspective, how exactly have engineers' working methods changed?

Raji: That's a good question; indeed, a lot has changed. I've been at OpenAI for about six months, and one of my deepest impressions is that the research capabilities inside the company are incredibly strong. You only need to project those capabilities a little into the future to be shocked.

First, let's talk about software development methods. Codex has completely changed the way we write code. The change has been very dramatic. Even looking just at the last six months, we have gone from 'treating Codex as a tool' to 'treating it as an extension,' then to 'treating it as an Agent,' and have now evolved to 'treating it as a teammate.' I even think engineers will soon give their Agents names, treating them as real partners. This change is happening very quickly.

I've seen internal leaderboards; some engineers consume token counts reaching hundreds of billions per week. And these are not single Agents. Last week we internally released Codex Box, which reserves a development environment on a server; you send it prompts directly to put it to work. You orchestrate workflows on your laptop, and it executes tasks in the cloud. Many people close their laptops to go to meetings, and when they return, the work is already done.

This is the software development method inside OpenAI right now. It has fundamentally changed. I believe within a few months, the heart of Silicon Valley will adopt this first, then it will spread. In the future, everyone will develop software this way.

Host: If I went back six months or even a year and heard you say this, I might have thought you were telling a fairy tale. But now it's different; many people are using it. I'm using it myself. I've also talked to engineers at OpenAI. I like talking to engineers because they have almost no "media training" and speak very directly.

What reassures me a bit is that not all engineers rely 100% on Codex to write code. People use it a lot, but at different levels. There is one team that is indeed at the forefront: the Codex team.

Tibo, you lead the Codex team. Can you tell us how you work every day now? What does a typical engineer's workflow look like?

Codex Team's Working Methods Change Rapidly Every Week

Tibo: Things are changing very quickly. The Codex team has a very interesting characteristic: we almost reinvent our own working methods every week.

We constantly identify bottlenecks, and the bottlenecks keep shifting. Previously, the bottleneck was code generation, then it became code review, and now it's more about: How can we understand user needs faster? How do we handle issues? How do we track feedback on important platforms like https://twitter.com/ and https://reddit.com/, and synthesize this information into product strategy?

Everyone is trying to maximize the use of Agents to do these things.

A few days ago, there was an interesting scenario: someone wanted to join the Codex team, and they asked me: "When working on products at OpenAI, how much compute power can I get allocated?"

That was a novel question. We do have a lot of compute, but I had never thought in terms of a "compute quota per employee." Usually, compute is reserved mainly for researchers training large models.

Now people realize that you can use compute power to amplify your abilities many times over. If you have good taste, good ideas, and understand software development, this era is truly exciting. The things you can do are astonishing.

Agents Will Become Product Engineers

Host: From a more macro perspective, OpenAI has always hired "product-oriented engineers." How has their work changed now? Could their roles converge more and more?

Raji: Essentially, we are still building products for humans. Product intuition is still very important. I've recently been using the new desktop app version of Codex, which makes writing code easier. But product development still starts with "what are we building?" and then iterates and optimizes.

As long as we are still building software for humans, this won't change.

Of course, at some point in the future, perhaps we will build software for Agents. At that time, Agents might become product managers or product engineers.

But the current pace is faster and more interesting. Building software has become more enjoyable because the feedback cycle has shortened dramatically.

I once wrote code on a plane. I couldn't use a remote dev box at the time. The flight attendant asked me to close my laptop, and I was reluctant because I didn't want the Agent to stop, so I kept the laptop half-closed. (laughs) Now many people run tasks with their laptops half-closed.

Honestly, developing software is more fun now than before. You can quickly see results, test, validate, and then go back to Codex for adjustments.

New Engineering Practices:

Parallel Exploration of Solutions, Designers Writing Code; Code Review Becomes a Bottleneck

Host: In terms of engineering practices, are there any new, strange but reasonable changes?

Tibo: In the past, when faced with complex technical trade-offs, we would write design documents, discuss various options, and then choose one.

Now the interesting part is that people explore multiple implementation options in parallel, and then select the optimal solution through experimental data.

Another change is the blurring of role boundaries. Designers may now write more code than engineers did six months ago. This is because the model is good enough that the generated code is of mergeable quality.

Host: Any other observations?

Raji: Yes. For example, command-line tools. For tools like ffmpeg, almost no one can remember the complete commands. Now with Codex, you just say "I want to do this," and it generates the command and executes it for you.
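To make the ffmpeg point concrete, here is a toy Python sketch of the kind of invocation an agent spares you from remembering: trim a clip and rescale it while passing the audio through. This is entirely illustrative; the file names and the hard-coded intent are assumptions, not anything from the interview.

```python
# A toy illustration (not OpenAI's implementation) of why agents help with
# CLI tools: the intent is easy to state, but the exact ffmpeg flags are
# hard to remember. The intent here is hard-coded for one example.

def ffmpeg_command(src: str, dst: str, start_s: int, duration_s: int, width: int) -> list[str]:
    """Build an ffmpeg invocation that trims a clip and rescales it,
    leaving the audio stream untouched."""
    return [
        "ffmpeg",
        "-i", src,                   # input file
        "-ss", str(start_s),         # seek to start offset (seconds)
        "-t", str(duration_s),       # clip duration (seconds)
        "-vf", f"scale={width}:-2",  # rescale video, keep aspect, even height
        "-c:a", "copy",              # pass audio through without re-encoding
        dst,
    ]

cmd = ffmpeg_command("talk.mp4", "clip.mp4", start_s=300, duration_s=90, width=1280)
print(" ".join(cmd))
```

With an agent, the request is simply "cut 90 seconds starting at 5:00 and scale to 1280 wide"; the flags above are what it produces and runs for you.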

We have expanded from simply "writing code" to code review and security review.

When coding efficiency increases fivefold, what happens? The amount of code explodes, and code review becomes the bottleneck. After that, integration and deployment (CI/CD) will become the new bottleneck.

The bottlenecks keep migrating. This revolution is far from over.

Tibo: So we must constantly solve the next set of problems. This is actually very exciting.

Host: Tibo, we previously talked about a practice I had never heard of before—"overnight runs" and "self-testing." Can you talk about that? It sounds very new.

Codex Overnight Runs and Self-Testing

Tibo: It's easy to misunderstand Codex as a "super auto-complete," thinking it just helps you implement a small feature, done in 10 minutes.

But what we see is that as long as you give the model a large enough task, its capabilities go far beyond that. It can run continuously for several hours.

We built a complete environment and capabilities for Codex, allowing it to test itself completely autonomously. We run it overnight, letting it cycle through QA, automatically detecting regression issues.
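The overnight QA loop described above can be sketched roughly as follows. This is a minimal illustration of the idea, not OpenAI's actual system, and every name in it is hypothetical.

```python
# Minimal sketch (all names hypothetical) of an overnight self-test loop:
# repeatedly run a QA suite, compare each cycle against a baseline, and
# record any regressions for the morning report.

from dataclasses import dataclass, field

@dataclass
class OvernightQA:
    baseline: dict[str, bool]                  # test name -> passed in baseline
    regressions: list[str] = field(default_factory=list)

    def run_cycle(self, results: dict[str, bool]) -> None:
        """Compare one QA cycle's results against the baseline."""
        for test, passed in results.items():
            if self.baseline.get(test, False) and not passed:
                self.regressions.append(test)  # passed before, fails now

qa = OvernightQA(baseline={"login": True, "search": True, "export": True})
# Simulate two overnight cycles; in reality each cycle would exercise the
# product end to end, driven by the agent itself.
qa.run_cycle({"login": True, "search": True, "export": True})
qa.run_cycle({"login": True, "search": False, "export": True})
print(qa.regressions)  # -> ['search']
```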

Another interesting thing. I often chat with the researcher in our team who is responsible for training models. He said that every time he thinks he is better than Codex, he ends up finding out that his prompt wasn't written well, or the environment wasn't configured correctly.

This is both exciting and a bit disheartening. (laughs)

Now it can even independently complete a full model training run, and finally write a PDF report with its own insights. We then pick out the most promising directions from it, continue iterating, and then throw it back to Codex.

These ultra-long-running tasks, and the model's ability to independently complete complex work, are really stunning to watch.

Two Scenarios of "Summoning" Codex in Meetings:

Post-Mortem and Incident Handling

Host: Another very sci-fi scenario. You said that when the Codex team holds meetings to discuss Codex, they directly initiate Codex threads within the conference room to diagnose problems. This sounds like self-looping. Can you talk about that?

Tibo: We have two typical scenarios.

The first is the weekly analysis and post-mortem meeting. We look at feature adoption rates, retention rates, conversion funnels. At the start of the meeting, people always have some questions that aren't visible in the dashboards.

The data analyst will say: "Okay, we'll start a Codex thread in the background, and have an answer in 20 minutes."

In the last 10 minutes of the meeting, we can discuss these new results. One meeting might run five or six questions. It feels like there are a group of invisible advisors working for us in the background.

The second scenario is online incident handling. Codex helps us analyze the cause of the problem and find the fastest recovery path. The speed of information collection and problem solving is greatly increased.

Regarding New Graduates and Junior Engineers

Host: A recurring question in the industry is: What about new graduates? What about junior engineers? I heard from OpenAI's engineering lead that you are hiring a lot of early-career engineers. What's the situation like?

Raji: We are indeed hiring many new graduates. This year there is also a fairly large internship program.

I believe the new generation of software engineers will be "AI-native." They will naturally be familiar with these tools and able to use AI from day one. Giving them such an environment is crucial.

This summer we will welcome our first large batch of new graduates, around 100 people. I'm really looking forward to seeing their performance. The internship program will also continue to expand.

This is a very interesting era.

How Does the Codex Team Onboard Newcomers?

Flat Management, Codex is the First Mentor

Host: Tibo, your team itself is several steps ahead of other teams in the company. When newcomers join, how do they get up to speed quickly?

Tibo: My team structure is very flat. I have 33 direct reports. I don't want to become a bottleneck.

If one person needs to be involved in every decision, this structure won't work at the current speed.

After a newcomer joins, the first "mentor" is actually Codex itself. You ask it questions directly, use it to browse the codebase, understand the project, receive daily reports.

And those truly responsible for onboarding and culture building are often people who have recently joined.

Speaking of new graduates, I hired a very outstanding newcomer six months ago. His performance has been exceptional. At first, I was a bit surprised. But then I realized he has unlimited energy and an extremely fast learning ability.

Honestly, my brain might already be on the decline, while his brain is at its peak. (laughs) The success he has achieved in the team makes me very happy.

Advice for Newcomers: Fundamentals Never Go Out of Style

Host: Looking from a "contrarian" perspective, in the past we saw many new graduates grow into excellent engineers because they laid a solid foundation.

Now, if newcomers heavily rely on AI from the start, skipping the 10 to 20 years of training we had in the past, might they lack fundamentals?

Tibo: Fundamentals are still very important.

We attach great importance to the overall architecture of the codebase, and also to code reviews. We don't rely on Codex blindly, with our eyes closed. There are top engineers reviewing things.

As long as the code structure is well designed and guardrails are in place, newcomers become extremely efficient. The key lies in what kind of environment you build, and in thinking ahead about how the codebase will evolve in the future.

The Role of Software Engineer Has Changed

Host: If a newcomer now asks: "Raji, what exactly do I do every day?" How does the daily life of a software engineer compare to six to eight months ago?

Raji: Fundamentals never go out of style. We can sit here because we have solid fundamentals. But the role of a software engineer has indeed changed significantly.

I might be showing my age, having been in this industry for 25 years, I've seen too many paradigm shifts. I used to work on developer tools at Microsoft, wrote the editor and language services for Visual Studio. The first time I saw IntelliSense, that feeling was really cool—you type a dot, and all the options automatically pop up.

Host: Back then, when I first entered the industry, developers around me were still saying: "Using IntelliSense doesn't make you a real programmer."

Raji: Yes. (laughs) Going further back, some might have thought that if you don't write assembly, you're not a good engineer. Then it was C++, and later the abstraction layer got higher and higher. Remember when everyone complained about JavaScript?

These debates actually don't matter. As long as you have solid fundamentals, product intuition, know what you are building, and can navigate up and down the tech stack to solve problems, these abilities are key. And this will never go out of style.

Product Managers, Designers: All Writing Code

Directly Pushing Designs to Validated Prototypes

Host: We've mainly been talking about engineers. What about product managers and designers? When both engineers and them can build features faster, how will their roles change? Will they become more and more similar?

Raji: As long as we are still building products for humans, we will definitely need human designers and product managers. Product sense and design sense have no simple substitutes.

Of course, they are also evolving, becoming more and more efficient. Product managers are writing code, designers are writing code. They push designs directly to the prototype stage, validate them, and then hand them over to engineers.

Product managers are also using Codex to make PowerPoints, write Excel add-ins. Efficiency improvements happen at the entire organizational level, not just for engineers.

Internal Knowledge Sharing and "Show & Tell"

Host: You do a lot of knowledge sharing internally, like show and tell. How did you think of that? What's the mechanism? Any interesting cases?

Tibo: We are actually discovering technology while evolving with it.

Like the outside world, we are also exploring: What exactly can AI do for an organization? What does it mean for projects? As soon as a direction seems effective, we release it to the world as quickly as possible.

So, the time window where we truly "see a bit more of the future" is actually very short.

In such an environment, good ideas must spread quickly within the organization. We have very active Slack channels, like the Codex channel, hot tips channel. We also regularly hold hackathons and show and tell sessions.

This is a highly creative phase; there is no so-called single correct usage; everything is being explored.

Our Codex team has a very outstanding product manager, Alexander Embiricos. He is the only product manager for the entire team, yet he has "super-amplified" himself.

A few days ago he organized a bug bash; within an hour, everyone had tried the upcoming features. He had Codex collect the feedback and generate Notion documents, then had Codex file bug and feature-improvement tasks in Linear, assign them to the relevant people, and automatically track progress.

He used AI to turn himself into a project manager with 10x or even 50x efficiency.

But the key is, you can't let the product manager become the new bottleneck. The organizational structure must also adjust accordingly.

Raji: Let me add one point. At recent demo days and hackathons, I've noticed a trend: the projects being showcased keep getting deeper.

In the past, it might just be showing "what this capability can do." Now many demos have already handled a lot of edge cases and are almost products ready for direct use. The overall depth continues to increase.

Token Cost Issue

Host: There's a reality that must be addressed: Inside OpenAI, everyone has unlimited tokens. In the outside world, cost is still an issue. Subscription quotas run out and you have to pay extra. Many teams are constrained by budgets.

If others are constrained by costs, any advice?

Raji: Cost is something we continuously think about. We want the model to be stronger, and also want to provide it to users.

But the way of thinking also needs to shift. What you now have is a teammate that works 24/7. You can assign Linear or Jira tasks to it, and should expect it to complete them.

The question is no longer "how many tokens were used," but "how much are you willing to pay for this teammate?"

If each engineer has four or five such "teammates," measuring from a productivity perspective becomes more reasonable.
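As a rough illustration of this reframing, with entirely made-up numbers (neither real pricing nor real usage figures), the "cost per teammate" arithmetic might look like:

```python
# Back-of-envelope sketch with made-up numbers (NOT real pricing) of the
# reframing in the interview: think in cost per agent "teammate" per month,
# rather than in raw token counts.

TOKENS_PER_WEEK = 2_000_000_000   # assumed usage for one always-on agent
PRICE_PER_MILLION_TOKENS = 1.00   # assumed blended $ price, illustrative only
AGENTS_PER_ENGINEER = 5           # "four or five such teammates"

weekly_cost_per_agent = TOKENS_PER_WEEK / 1_000_000 * PRICE_PER_MILLION_TOKENS
monthly_cost_per_engineer = weekly_cost_per_agent * 4 * AGENTS_PER_ENGINEER
print(f"${monthly_cost_per_engineer:,.0f} per engineer per month")
```

Whether that figure is worth it then becomes a productivity question (what would the equivalent human team cost?), which is exactly the shift Raji describes.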

Of course, we must make these Agents strong enough to truly deserve the title of "teammate."

Tibo: You can also look at it from the company's overall cost structure. For example, market research, analyzing feature backlogs, screening which tasks are easy to implement—in the past it might have required 15 engineers working together, now it's almost free.

Not every company can give employees unlimited inference quotas. But restricting too strictly too early is also a risk.

We are still in a very early stage; many people haven't truly learned how to amplify themselves yet.

My suggestion is: prioritize sufficient inference quotas for the best people in the company. Let them explore fully.

The Speed of Change

Host: Change is really fast. Looking back over the past 25 years, have there been similar moments?

Raji: I've never seen change like this.

I've lived through the dot-com bubble burst, Y2K, the mobile revolution, and participated in the social network wave. But this time it's completely different.

This wave of change is happening on a massive scale and at an extremely fast speed. So fast that some charts already seem hard to explain. So I really think this is a very special, very unique period. Being able to live in such an era is cool in itself.

Abstracting Above Agents: Engineers Will Focus Only on Inputs and Outputs

Host: As a final question, although change is very fast, both of you have worked at OpenAI for quite some time. I'd like to ask you to make an honest prediction: What will software engineering look like two years from now? What will engineering management become like? Based on everything you know now.

Raji: Obviously, a two-year timeframe is far too long.

I think even what happens six months from now is already hard to say. However, there are a few things I'm quite confident about. First, our speed might increase by another order of magnitude, which will bring a new round of changes. Second, we will truly achieve large-scale multi-agent collaboration networks, allowing them to work together around very grand objectives.

For example, based on the capabilities demonstrated by Cursor, one can easily imagine a scenario: you say "rebuild a browser from scratch," and 24 hours later, you have a finished product. That could be a system consisting of two million lines of code, so large that humans can hardly fully understand its internal details.

I think what we'll do next is build "guardrails" for the systems constructed. So you won't have to review code line by line, but verify its correctness through some means, or ensure its safety through constraints. You only need to focus on inputs and outputs. The code itself will be abstracted away, with the real focus shifting to the problem itself and the properties the system should possess.

The history of software is essentially a history of ever-increasing levels of abstraction. Abstraction allows us to build larger products with less code. Over the years, the abstraction level has continued to rise, and now we are at a stage where abstraction is accelerating and leaping.

But I also have a concern. Any sufficiently complex or sophisticated system is harder to debug. We often can only locate problems through symptoms. I think a few years from now, software will be more complex than ever, layer upon layer. We will become very good at identifying problems through "symptoms," and our tools will also become very good at doing this. I think this will become a unique ability that software developers need to master.

Tibo: Raji spoke very well. I want to add one point about what the future might look like.

I think soon, you will only need to converse with your assistant to check work progress. You will have a dedicated "personal representative assistant" that can aggregate and represent all the AI agents working efficiently for you in the background. You won't have to monitor or check the status of a hundred or even two hundred small Agents individually.

I think this form will appear soon, perhaps even this year.

Host: Thank you very much, Raji and Tibo, for revealing what is happening internally and how your team works. It feels like you are always a few weeks, months, or even longer ahead. But this is indeed happening. Also, thank you for your outlook on this exciting era. Thank you very much.

Raji/ Tibo: Thank you.

Reference Links:

https://www.youtube.com/watch?v=Bo6Gtq3nMXc

