OpenAI Frontline Development Reality Check: Engineers Who Can Monitor 10-20 Agents and Run Hour-Long Tasks Are Leaving Others Far Behind

Many people are still arguing about whether "AI will replace programmers." OpenAI's internal answer: AI is re-stratifying engineers. The gap won't widen slowly; it will be amplified by tools, by processes, and by organizations, until it compounds like interest and becomes very hard to close.

At OpenAI, 95% of engineers use Codex every day. Pull Requests (PRs) go through AI's eyes first, then humans; code reviews have been compressed from 10-15 minutes each to 2-3 minutes; those who truly embrace the tools submit 70% more PRs than their colleagues, and the gap continues to widen. The role of the engineer is also morphing: increasingly like a "Tech Lead + Dispatcher," simultaneously monitoring 10-20 parallel Codex threads, with the main work becoming guidance, acceptance, and safety nets, while writing code by hand has become an occasional activity.

Sherwin Wu is the Head of API and Developer Platform Engineering at OpenAI. Since almost all AI startups are integrating OpenAI's API, Sherwin has an extremely unique and broad perspective on what is happening in the entire ecosystem and where it is heading.

In the podcast, he also made a judgment: many of the AI "scaffolding" tools companies are proud of today (vector databases, Agent frameworks, complex process orchestration) may just be crutches for a transitional period; model evolution will swallow them up. The teams that truly succeed have already changed tactics: they design workflows in advance for the capabilities models are about to reach. A product that only works 80% of the way can ship now; when the next generation of models lands, it crosses the threshold on its own.

AI will not lift everyone equally. It will push high-initiative engineers to a disproportionate height: those who can break down requirements, control context, orchestrate multiple Agents, and lock in the verification loop can single-handedly replace what used to be a small team. What follows is not just the much-hyped "one-person unicorn," but a forced rewrite of organizational structure: smaller teams, faster iteration, steeper differentiation.

Beyond engineering, Sherwin believes the more underestimated opportunity lies in business process automation: most work in the real world runs on repeatable, highly constrained standard operating procedures. When AI truly embeds itself in these processes, it will change how enterprises themselves operate, not just make them more efficient.

If you feel that the changes of the last two or three years have been fast enough to cause anxiety, you are not wrong. Sherwin's words are more of a reminder: this is a window that will not stay open long. The pace of change will eventually slow, but many who miss this stretch may never get the chance to learn the new rules of stratification.

We have translated this podcast episode.

Engineering Stratification in the Agent Era Has Already Appeared at OpenAI

Host: Sherwin, thank you very much for coming on the show. I want to start with a question that has almost become a "barometer" for AI progress, especially in the engineering field. Do you still write code yourself? If so, how much of the code for you and your team is now written by AI?

Sherwin Wu: I still write code occasionally. But honestly, for a manager like me, it's actually easier to use AI tools now than to write code by hand.

For me personally, and for some engineering managers at OpenAI, our code is basically all written by Codex. From a macro perspective, there is a very strong, very real sense of energy internally—everyone is marveling at how far these tools have come and how useful Codex has become for us.

It's actually hard for us to measure precisely "how much code is written by AI" because the vast majority of code—I would say close to 100%—is almost always generated by AI first.

What we really track is that the vast majority of engineers now use Codex every day.

95% of engineers use Codex daily, and 100% of PRs are reviewed by Codex. This means any code that is eventually merged into production gets a "look" from Codex, which offers improvement suggestions and flags potential issues at the PR stage.

But what excites me more than these numbers is the overall atmosphere and energy.

We also have an interesting observation: engineers who use Codex more open significantly more PRs. They submit 70% more PRs than engineers who don't use Codex often, and this gap is widening.

I feel that those who submit many PRs are continuously learning how to use this tool more efficiently, and this 70% gap continues to widen over time. Maybe this number is already higher than the last time I saw it.

Host: Let me confirm if I understand correctly: you mean that at OpenAI, for that 95% of engineers, their code is basically written by AI first, and then they review it?

Sherwin Wu: Yes, yes, that's right.

Host: It sounds crazy, but it doesn't seem so crazy anymore; we are adapting to this state rapidly. Of course, I think it still takes a little time to adapt.

Sherwin Wu: Yes, we are still adapting. There are also some engineers who have relatively low trust in Codex. But almost every day, I hear someone being shocked by what it has done, and their trust threshold for "how much the model can do independently" is raised again and again.

Kevin Weil (our VP of Science) has a quote I really like. He often says: "This is the worst the model will ever be." This applies equally to software engineering: as time goes on, people will become more willing to hand over critical work to the model, and the model itself will only become stronger.

Host: Kevin Weil has been on this show before; he said this on the episode, more than once. Recently, Peter, the developer of OpenClaw (formerly Clawdbot / Moltbot), also shared that he uses Codex extensively in his work. He said that many times, after Codex finishes something, he trusts it almost completely, even feeling it could be merged directly into the master branch with good results.

Sherwin Wu: Yes, he is indeed a very excellent user of Codex. I know he maintains very close communication with our team and has given a lot of good feedback, so I'm not surprised he uses it this way.

Host: Back to this crazy moment we are in, especially for engineers. We have gone from "you have to write every line of code yourself" to "AI writes all your code." I really can't think of any other profession that has undergone such drastic and completely unexpected changes in just a few years. An engineer's entire "job content" over their career lifecycle has been completely reshaped in these past two years. So how do you imagine what the role of a software engineer will look like in the next year or two? What will this "work itself" be?

Sherwin Wu: Honestly, seeing all this is really cool. Part of this excitement comes from the fact that this profession is likely to undergo another very significant change in the next one to two years.

But we are still in the exploration stage. For many engineers, this is a very rare window period—in the next 12 to 24 months, we can almost define the standards ourselves, define "what an engineer should be."

Currently, a trend often mentioned is: IC engineers are becoming technical leads, basically like managers. They are managing entire "fleets" of agents.

Many engineers on my team are actually pulling 10 to 20 parallel threads at the same time. Of course, it's not running 10 to 20 Codex tasks simultaneously, but they are indeed handling a large amount of parallel work: constantly checking progress, adjusting direction, giving feedback to agents and Codex. Their work has shifted from "writing code" to almost "managing."

If I were to give an intuitive metaphor for the next one to two years, I often think of a programming textbook I read in college—"Structure and Interpretation of Computer Programs" (SICP). This book was very popular at MIT back then, serving as an introductory programming course textbook for a long time, and it has a bit of a "cult classic" status in the programmer circle. It uses Scheme to teach you programming, leading you into the world of functional programming, and reading it is very mind-opening.

But what really stuck with me was its opening metaphor for programming: it describes software engineering as a kind of sorcery. Software engineers are like wizards, and programming languages are like spells: you chant the spell, and it goes off and does things for you. The difficulty lies not in whether you can chant, but in knowing which spell to chant to make the program run the way you want. SICP dates from the mid-1980s, and this metaphor has held up; I even feel it is finally being made literal by today's reality.

From this perspective, whether it's vibe coding or future software engineering, it seems like a natural extension of this evolutionary path. Programming languages were always spells, except the spells are constantly evolving, making it easier and easier for us to make computers do what we want. And this wave of AI is likely the next stop. It pushes the concept of "spells" to the extreme: you can almost directly tell Codex or Cursor what you want, and it will do it for you.

I especially like the "wizard" metaphor because the current state increasingly resembles "The Sorcerer's Apprentice" in Fantasia. You put on the magic hat and start casting; the power is absurdly strong, but only on one premise: you must know what you are doing. In the cartoon, Mickey Mouse sets the broom to work and dozes off; the brooms keep hauling water, the flood gets out of control, and the workshop is inundated. That is almost the limiting case of vibe coding: wishes granted too fast, control lost too fast.

So, when I see engineers running 20 Codex threads simultaneously, what I think of is not "cool," but the skills, seniority, and a lot of judgment required behind this. You can't completely let go, nor can you pretend everything will automatically get better.

But its leverage is indeed astonishingly high. A senior engineer who really gets these tools working smoothly can now complete a workload that was impossible in the past. This is also where it's fascinating: we really start to have a very concrete feeling—like a wizard casting spells, while the software runs errands and does work for us. That sense of "magic" is closer to reality than ever before.

Host: I have two threads I want to pull on here. One is that recently I've been hearing more and more of this feedback: when agents don't work as expected, people feel real pressure. You send out a bunch of Codex agents at once, and then you have to keep watching them: "this one stopped running," "that one is stuck," and it feels like time is being wasted. Do you have this feeling? Do you see it on your team?

Sherwin Wu: Yes, yes, this is happening all the time. Honestly, I think this is actually the most interesting and critical part right now. Because models aren't perfect, and these tools aren't perfect, we are actually still exploring: how exactly should we collaborate with Codex and these AI agents to get things truly done? Such questions often arise internally.

We have a particularly interesting team doing an experiment internally at OpenAI: they maintain a codebase that is 100% written by Codex. Generally, you might let AI write a version of the code first, and then rewrite part of it yourself, check it, and patch it up; but this team is completely "Codex-ified," almost thoroughly leaning in.


Editor's Note: The experiment Sherwin Wu mentions has since been published as an OpenAI blog post: https://openai.com/index/harness-engineering/. The article documents a software engineering experiment with zero lines of human-written code: starting from an empty Git repository, the team built, in 5 months, a real, usable internal product: it could be deployed, it broke and was repaired, and it served hundreds of internal users (including heavy daily users). Yet from start to finish, no human directly wrote any code: application logic, tests, CI configuration, documentation, observability, and internal tooling were all generated by Codex (Codex CLI + GPT-5). In the end, just 3 engineers merged about 1,500 PRs and produced close to 1 million lines of code; they estimated the overall delivery speed at about 1/10 of traditional manual writing.

So they run into exactly the kind of problem you just mentioned: say I want to build a certain feature, but I can't get the agent to do it right. Usually, humans have an "escape hatch" here: you can say "forget it, I'll roll up my sleeves," skip Codex, and write the code by hand with tab-complete or Cursor. But this experimental team has no such escape hatch; that is part of the experimental design.

So the question becomes: What exactly do I need to do to make this agent do the job well? One phenomenon we repeatedly see is—I don't know if you have similar feelings, but it's very obvious on our side—often, when coding agents don't do well, it's not that "it can't," but that the context is problematic. Either the information you gave isn't clear enough, or the agent simply can't get the information needed to complete the task.

Once you realize this, the solution changes: you are no longer "tuning prompts," but start supplementing documentation, supplementing structure, and finding ways to bypass this limitation. In short, it's about encoding your brain's "tacit experience," "team consensus," and "default practices" into the codebase—maybe code comments, maybe code structure, or some Markdown documents, skills files, or other auxiliary resources in the repository. There is only one goal: to let the model read everything it needs to complete the task right within the repository.
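As a concrete (and entirely hypothetical) illustration of what Sherwin describes, a repository-level instructions file might encode team defaults like this. AGENTS.md is a common convention for Codex-style agents; every path and rule below is invented for the example:

```markdown
# AGENTS.md: repository context for coding agents (illustrative example)

## Conventions
- API handlers live in `api/handlers/`, one file per resource.
- All database access goes through `db/queries.py`; never inline raw SQL.

## How to verify a change
- Run `make test` before opening a PR; fix lint with `make lint`.

## Tacit knowledge
- The staging billing service is rate-limited; retries there are expected, not bugs.
```

The point is not the specific format but that the agent can recover, from the repository alone, the context a senior teammate would otherwise supply in review.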

This team has many other takeaways, which I think are all worth expanding on. But at least one thing is already clear: deliberately removing the "retreat of not using AI" forces them to see clearly—if we really want to fully embrace agents, these are problems that must be solved sooner or later.

Reducing Engineer Attention to PRs from 100% to 30%

Host: We just talked about how people using AI are submitting PRs at a furious pace, and the number of PRs has clearly gone up. Code review is obviously about to become a bigger bottleneck. Has your team figured out ways to make code review faster and more scalable, rather than turning everyone into full-time PR reviewers?

Sherwin Wu: Yes. First of all, now Codex reviews 100% of our PRs.

I think something very interesting is happening here: the things we first handed over to the model to do are often those most annoying, most boring parts of software engineering. And precisely because of this, writing software is now more interesting—we can spend more time on truly interesting things.

Personally, I used to hate code review; it really was one of my least favorite jobs. I remember my first job after graduating from college was at Quora. I was responsible for the Newsfeed, so the Newsfeed code basically belonged to me, and I became the main reviewer for the Newsfeed. That code was one of the most core parts of the entire system; almost everyone would touch it.

The result was that every morning when I logged in, I would see 20 to 30 code reviews, and my heart would sink directly: "Oh my god, I have to go through all of these." I would often procrastinate, and then the pending PRs would rise to 50. The volume of review was very large.

Codex is really strong at code review. We've observed that 5.2 (the current GPT-5.2 generation) is particularly good at reviewing code, especially when you can steer it in the right direction.

So although PR volume has indeed gone up, Codex goes through every PR first, which turns code review from a 10-15 minute task into one that can often be done in 2-3 minutes, because Codex has already surfaced a batch of suggestions by the time a human looks.

Many times, especially for some small PRs, you may not even need to pull someone else in to review. To some extent, we trust Codex. Because the core value of code review is having a "second pair of eyes" confirm you haven't done anything stupid—and now Codex is already a fairly smart second pair of eyes, so we are leaning in very hard on this point.

Additionally, our internal CI processes, and the process from push to deployment, have also been largely automated through Codex.

If you ask many engineers what annoys them the most, it's often not writing the code itself, but: after you write a beautiful piece of code, how do you get it into production. You have to run a bunch of tests, handle lint errors, go through code review... there is a lot of procedural work in there.

These things are actually very suitable for Codex to do, so we internally also made some tools to automate these steps, such as automatically handling lint: if a lint error appears, Codex can usually fix it easily; it can directly patch it and then restart the CI process.
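The "fix lint, patch, restart CI" loop Sherwin describes can be sketched as a simple retry loop. This is an assumption-laden illustration, not OpenAI's actual tooling: `lint_fix_loop` and both callbacks are invented names, and a real implementation would invoke a coding agent where the comment indicates.

```python
from typing import Callable, Tuple

def lint_fix_loop(run_lint: Callable[[], Tuple[int, str]],
                  fix: Callable[[str], None],
                  max_attempts: int = 3) -> bool:
    """Run the linter; on failure, hand the errors to a fix step and retry.

    run_lint returns (exit_code, output); fix applies a patch given the
    lint output. Both are injected, so the loop shape is easy to test.
    """
    for _ in range(max_attempts):
        code, output = run_lint()
        if code == 0:
            return True  # lint is clean; CI can proceed
        fix(output)      # in practice: ask the coding agent to patch the errors
    return False         # escalate to a human after repeated failures

# Demo with a fake linter that passes after one "fix".
state = {"errors": 1}
def fake_lint() -> Tuple[int, str]:
    return (1, "E501 line too long") if state["errors"] else (0, "")
def fake_fix(output: str) -> None:
    state["errors"] -= 1  # a real fix step would invoke the agent here
assert lint_fix_loop(fake_lint, fake_fix) is True
```

The bounded retry count is the "don't let the broom run wild" safeguard: after a few failed fix attempts, the loop stops and a human takes over.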

What we are doing overall is trying to compress the "manual operations" engineers need to invest to the minimum. The side effect (which is actually a benefit) is: everyone can now merge more PRs and release more code.

Host: Codex is writing code, and Codex is also reviewing the code it wrote. I'm curious, would you consider using other models to review your model's work? Is this a direction? Or is what we have now good enough, and we don't need anything else?

Sherwin Wu: I would say there is indeed a risk of a "loop" here. Returning to the "Sorcerer's Apprentice" metaphor, you have to ensure you haven't let the broom run out of control and run wildly around the room.

So we are actually very cautious about which PRs can be reviewed by Codex alone. Most people still look over their own PRs, of course; human review has not gone to zero.

A more accurate description is: reducing a person's attention to a PR from 100% to 30%. This allows things to move forward more smoothly.

As for the "multiple models" question, we certainly test many models internally, so we have a large number of different versions on hand. But we relatively rarely use external models—because we believe "eating our own dog food" is important; we need to use our own models to do actual work to obtain real feedback.

Of course, you can also use some different variant models internally to get another perspective; we found this method quite effective as well.

Host: Let me confirm my understanding of OpenAI's current "AI + Code" status again; after confirming, I want to switch to another topic. You are saying that now all of OpenAI's code is 100% written by Codex? Is this statement correct?

Sherwin Wu: I wouldn't directly say "all code running in production today is written by AI." I wouldn't conclude that, because it's hard to be that precise in attribution.

But what is certain is: almost every engineer uses Codex very heavily in all tasks. If you ask me to estimate a rough proportion, I would say: now the vast majority of code, the original author is likely AI.

In the AI Era, Where Does the Manager's Leverage Lie?

Host: Much discussion focuses on the role change of IC (Individual Contributor) engineers, but there is much less discussion about the change of "managers," especially the role of engineering managers. After the rise of AI, how has your life as a manager changed? What do you think the role of a manager will be in the future?

Sherwin Wu: Its change is indeed not as great as that of engineers. At least there isn't a "Codex specifically for managers" yet. However, I do use Codex to handle some of my more "management-oriented" work. I would say the change isn't that drastic yet, but I can see some trends. If you extrapolate these trends, you can roughly see where many things are going.

One increasingly obvious point is: Codex makes top performers much more efficient. I think this might also be a general rule of AI in a broader scope: those who are truly willing to deeply embrace it, those with strong initiative, or those who can use these tools very skillfully, will "super-accelerate" themselves.

I can also clearly feel now: top performers in the team will become more productive, and thus team productivity will show greater differentiation and span.

One of my management philosophies has always been: I will spend most of my time on top performers—ensuring they don't get stuck, ensuring they are happy, ensuring they feel they are advancing efficiently, and also feeling their voices are heard.

I think in the AI era, this will become even more important, because top performers will use these tools to run faster and harder.

For example, the team mentioned earlier: maintaining a codebase 100% generated by Codex. Letting them go ahead and do it to see what happens is actually very rewarding. So a trend I see is: for managers, in the future, they may more frequently and extensively invest time in top performers.

Another trend is: AI tools available to managers will increase the manager's leverage. Not at the code-writing level, but something like a "ChatGPT connected with organizational knowledge"—it can help you do research and understand organizational context. To give a very realistic example: we are currently doing performance reviews. You can easily use a ChatGPT connected to internal knowledge—it connects to GitHub, Notion documents, Google Docs—to let it quickly form a complete understanding of what someone has done in the past 12 months, and then write a small "in-depth research report" for you.

My intuition is that in this world, managers can manage larger teams. Just as engineers are now managing 20-30 Codex threads, these tools will also make "people managing people" management higher leverage.

Currently, the so-called best practice in engineering teams is that a manager usually leads 6-8 people. But I think this might change in the future.

You can already see similar phenomena in non-engineering fields like customer service and operations: in the past, support team sizes were limited, but when you can hand over more work to agents, you can do more things and also manage more people.

I think people management in tech companies might undergo similar changes. We are already seeing some teams: some EMs manage quite a few people, but they can still manage very well, because tools allow them to understand what the team is doing and understand organizational context with higher leverage, and operate accordingly.

Host: I really like your suggestion here: you have always tended to invest more time in top performers, clearing obstacles for them and keeping them happy. Marc Andreessen (the well-known venture capitalist) was also on this podcast recently; his formulation was: AI will make good people better, and great people exceptional.

Sherwin Wu: Yes, yes. The point you mentioned is: in the future, this might need to be done more and more extremely—spend more time on the strongest people in the team, ensuring they have all the resources they need.

A very good example I have now is: internally there is a small group of engineers who are really very "Codex-ified"; they are very seriously pondering "what exactly are the best practices for interacting with this model." This is an extremely high-leverage thing.

As a manager, I just say directly: you go explore. Whatever best practices you summarize, we must share them with the entire organization. We will hold various knowledge sharing sessions and synchronize documents and best practices everywhere.

This kind of thing will lift everyone up together. I also see it as another example of this trend: top performers will become more exceptional.

Software and Startups Are Entering a New Stage

Host: People have a sense that this is huge: AI is changing the world, and the idea of the "one-person billion-dollar company" is changing many things; it will be a big deal. What do you think people haven't really priced in yet? In other words, as the future unfolds, are there effects you consider critical that we haven't yet recognized?

Sherwin Wu: One of my favorite sayings in this wave of AI is "one-person billion-dollar company." I remember it was probably Sam who first said this concept (at least one of the first to say it out loud). It's really intriguing: if a person's leverage becomes high enough, at some point, a "one-person billion-dollar company" could indeed appear.

This itself is of course cool, but I think people haven't really factored in its second-order and third-order effects yet.

Because the implication behind "one-person billion-dollar company" is: with the help of these tools, one person can have stronger initiative and higher leverage, so he can easily handle everything a company needs to do, eventually making something worth a billion dollars. But this is just one level. It has other meanings.

One second-order effect is: if one person can achieve a "one-person billion-dollar company," it also means—starting a business as a whole will become much easier. I actually think this will trigger a huge "startup boom," especially a small-scale startup boom of the SMB (Small and Medium Business) style: almost anyone can make software for any need.

You can already see a hint of this in the AI startup circle: software is becoming more "vertical." That is, making an AI tool for a specific industry/vertical is often very effective, because you can understand the actual scenarios and use cases of that field more deeply.

If we continue to push the evolution of AI forward, I don't see any reason why there won't be 100 times the number of such startups.

So a world I envision is one where, to support the "one-person billion-dollar company," hundreds of small startups appear, specializing in highly customized, purpose-built "bespoke software" for these companies.

This will take us into a potentially very interesting stage: we might really enter a golden age of B2B SaaS, or more broadly, a golden age of software and startups. Because as writing software becomes easier and easier, and running a company becomes easier and easier, what you will eventually see is likely not "only one one-person unicorn," but—maybe there will be a "one-person billion-dollar company," but at the same time there will also be one hundred one-hundred-million-dollar companies, and there will also be tens of thousands of ten-million-dollar companies.

And for an individual, a ten-million-dollar business is already very good; it basically means you're set for life. So I think we might see explosive growth in this direction, and many people haven't really priced that in yet.

Going down one more level, to a third-order effect: the further out we push, of course, the greater the uncertainty. But suppose we really do move toward such a world: "micro-companies" everywhere, making software that might serve only one or two people, each owned and operated by just one or two people.

Then the entire startup ecosystem will change, and the VC ecosystem will also change.

We might enter a world where a few super-scale players provide the platforms, and on top of those platforms a large number of small companies are supported and lifted up.

But at the same time, those projects that truly fit the "venture capital scale"—projects that can multiply your investment by 100 or 1000 times—might actually become fewer. Because what will appear more often will be a large number of 10 million to 50 million dollar companies: they are very great for individuals, but not necessarily an ideal return structure for VCs.

These companies will be very suitable for those with extremely strong initiative—they deeply embrace AI and build businesses for themselves.

Host: I love that we've been counting our way up the order of effects. Now I want to hear the fourth-order effect, Sherwin. Just kidding.

Sherwin Wu: I really can't; the fourth order is too much of a brain-bender. I can't think that far.

Host: It's like "Inception"; every layer you go down, time slows down, and things get more complicated. But speaking of the "one-person billion-dollar company," I do think about this problem often. Because what I do cannot become a billion-dollar company; it completely doesn't fit the VC scale, nor is it particularly high leverage.

But I think of a practical problem: I receive too many support tickets every day, often about absurdly trivial things. The support burden alone makes it hard for me to imagine one person sustaining a billion-dollar scale. So I am actually cautious, even pessimistic, about the "one-person billion-dollar company." The core of my view is that support costs are too hard to scale: even if AI can take you part of the way, at a billion-dollar scale, unless your ACV is very high and your customers are few, just handling support and all the human communication makes it hard to grow.

In my own experience, many users could solve their problem themselves, but they still choose to email the support mailbox with a small question. Handling these things is very hard to scale. So unless you hire a bunch of contractors (but does that still count as a "one-person company"?), I think it's almost impossible to grow a company to a billion dollars without anyone helping you handle at least the support work. AI can only help to a certain extent.

Sherwin Wu: I agree with the problem you mentioned. It's just that my view on "how it will happen" is slightly different.

I even think, Lenny, your podcast might become a billion-dollar business in the future. But the way it happens might not be: you personally dispatch AI to handle support tickets one by one, fix problems, and reply to emails.

What's more likely to happen is: a bunch of other startups will appear, specializing in software that fits your needs very well, and it's the highly customized, extremely vertical kind. For example, there might be 10 or 20 startups making support software specifically for podcast and newsletter businesses. They themselves might be "one-person companies" and don't necessarily need to be very big.

Because in this world, making a product will become very easy. They can make the product very fitting, very unique, really useful to you, and then you will be willing to pay for it—as that "high-leverage one-person company," you buy these tools to outsource the things you don't want to do.

Host: I will buy it; I really will buy it.

Sherwin Wu: Yes, there is a key question here: which things do you keep in-house, and which do you outsource?

I think what might happen is this: because the cost of writing software and building products is collapsing so fast, you will outsource more things, which lets you squeeze the company down even smaller.

This is the world I think might appear. Of course, there is still high uncertainty here, but the final form might still be: a company driven by one person, with extremely high leverage, really has the opportunity to reach a billion-dollar scale.

Host: I can understand. I also think of Peter (of OpenClaw); he is now completely buried in demands, emails, DMs, and PRs. And he hasn't even made money from this yet. I really find it hard to imagine what his life is like now; it must be crazy. It's probably like the craziness you had in the months after ChatGPT launched, except he is carrying it alone. Maybe the fourth-order effect is that distribution and reach become more and more important: too many things are competing for your attention, so those who own audiences and platforms become more and more valuable. That's quite interesting.

Host: Okay, I actually want to return to the management topic you just mentioned. I really like that insight of yours: you said spending more time on top performers is very effective for you. The team you are currently leading is working on a platform, and this platform basically drives the entire AI economy—almost every AI startup is using your API. Obviously, you are doing very well. Besides this one point, what other core management experiences do you have? What do you think is particularly important for you as a manager of an engineering team and people, and constitutes the key to your success?

Sherwin Wu: Many things I've learned, I'm not sure if they are particularly "exclusive to the OpenAI API team," or if they only apply to some of our enterprise products.

My management philosophy is indeed changing, but overall, it's more about maintaining consistency rather than a complete overhaul. One of the principles is the one I mentioned earlier: spend a lot of time on top performers. More specifically, I will spend over 50% of my time on the strongest part of the team, such as the top 10%, and do my best to empower them.

I often use a metaphor to think about this problem: viewing software engineers as "surgeons." The metaphor comes from a very old book, "The Mythical Man-Month," written in the 1970s, which in places was genuinely predicting the future. The book describes a possible world in which software engineering moves toward a pattern where the engineer is like the lead surgeon in an operating room: only one person actually wields the scalpel, and everyone else, the nurses, residents, and fellows, revolves around them in support. The lead surgeon says "I need a scalpel," and someone hands one up; the surgeon asks for a tool, and someone wheels the equipment over. Everyone is supporting that one person.

"The Mythical Man-Month" predicted software engineering would go in this direction. I don't think reality is exactly like this—software engineering is still more collaborative, not just one person working.

But I have always liked this metaphor and have always tried to apply it to my management style. That is: software engineering is not equivalent to surgery, but I hope my way of supporting team members can make them feel like that "lead surgeon"—they are pushing the most critical work forward, and my responsibility as a manager is to ensure they have a "support team" in hand, ensuring what they need is available at any time. Even if the so-called support team is actually just me, I hope to achieve this effect.

An example I often cite: seeing obstacles around the corner ahead of time and clearing organizational and process blockers for people is extremely valuable.

And in the AI era, this is even more important. Because when engineers can churn out many PRs at once and deliver continuously at high frequency, what really limits progress and delivery speed is often organizational and process obstacles.

If you as a manager can "see one step ahead" and prepare the resources they need in advance, just as the scalpel is ready before the lead surgeon asks for it, that is the ideal state. This is my understanding of management, especially engineering management, and this metaphor has run through basically my entire career.

Host: I like this metaphor too much. I would even think, could AI also help with this: helping you "see around the corner." For example, predicting: which decision will this engineer get stuck on next, and we need to solve it in advance.

Sherwin Wu: I haven't tried it yet, but now I'm suddenly curious: if I ask a ChatGPT connected to internal company knowledge—for example, let it scan Notion documents, see where it was mentioned in Slack—and I ask it directly: "What active blockers does my team have now? What can I do to help them?" I really hadn't thought of this idea before, but you're right, you just gave me an insight.

Host: And furthermore, you could even ask: what do you predict will block this engineer or this team in the next few months? You were just talking about second and third-order effects; now I'm letting the model help you with second and third-order effects: predict next month's blockers in advance and solve them ahead of time.

Sherwin Wu: Yes, yes. We might have really dug up a good idea here.

Why Do So Many AI Deployments End Up with Negative ROI?

Host: Okay, I want to switch to the API and platform you work on. You deal with many companies: they are integrating your API, using your platform, and building products based on your tools. You told me before that you observed many companies' AI deployments actually have negative ROI. I think this is also a conclusion that many people vaguely believe from reading news and their own feelings, but you said you really see it happening on the frontline, which is very interesting. What exactly is going on? Where did they go wrong? What is the reality of AI deployment and ROI now?

Sherwin Wu: Let me clarify first: I don't "explicitly" see quantifiable ROI data; that is genuinely hard to measure. But just from the way I watch some companies "go AI," I wouldn't be surprised if quite a few deployments end up with negative ROI. At the same time, I've noticed that outside the tech circle, for example among many non-technical industry groups in the US, there is a very common feeling: AI is being forced on them. And that sense of resistance is likely an external symptom of negative-ROI deployment.

There are probably a few typical problems I see.

First, I always return to an old point: Silicon Valley often forgets it lives in a bubble. Twitter is a bubble (sorry, now called X), Silicon Valley is a bubble, and software engineering is a bubble. The vast majority of people in the world, and in the US, are not software engineers. They are not "AI-pilled" (deeply immersed in AI), and they don't track every model release. Many people genuinely don't know how to use this technology, or have little idea how it works.

You see, internally at OpenAI, we talk a lot about Codex best practices, and there is even a group of people specifically researching how to use Codex most effectively. The people who post often on X are also almost all power users of these tools: skills, agents.md, MCP... they are very proficient in all of it.

But when I go and talk to many companies, especially when talking to frontline employees who really need to use tools in their daily work, you find their needs are very basic, and their understanding of this technology is also very limited. The questions they ask are very simple, far from "pushing the tool to the limit."

This also leads to what I think should be a more ideal way of AI deployment—and also how we generally operate internally at OpenAI: those companies that "run smoothly" often possess two things simultaneously.

First, there is top-down buy-in. Senior leadership clearly states: we want to become an AI-first company. Thus resources will be invested, tools will be purchased, and the organization will give clear support.

But the second is equally critical: there must be bottom-up adoption and buy-in. That is, those frontline employees who actually do the work must feel excited about this technology, be willing to learn, be willing to evangelize, be willing to summarize best practices, and be willing to do knowledge sharing within the organization.

We also went through a similar process internally at OpenAI. OpenAI has always hoped to be AI-centric, but what really made this thing "take off" was after tools like Codex appeared—because employees could finally use it directly in specific work.

The reason you need bottom-up promotion is because everyone's work is different and very specific. Software engineering is not equal to finance, not equal to operations, and not equal to marketing and sales. When landing at the work level, there will be a large number of "last mile" details that must rely on frontline people to try, polish, and modify workflows.

And the reason many AI deployments fail is precisely the lack of bottom-up adoption: it's more like an order from senior leadership, too top-down, disconnected from how real work is done. The result is a large group of employees who don't truly understand the technology; they only know "I should be using it," and their performance reviews may even say "you must use AI to improve productivity," but no one tells them specifically how to use it.

They look around and find no one else is really using it: no one to learn from, no path to copy, so they get stuck in place.

So my advice to those companies wanting to promote AI is: find—or even specifically equip—a small full-time team as an internal tiger team. This team is responsible for figuring out capabilities, landing them on specific workflows, doing continuous knowledge sharing, creating excitement within the organization, and getting more people willing to try. Without this mechanism, AI is really hard to be "picked up and used."

Host: Then who would you put into this tiger team? Should it be engineer-led? Or do you think it's more like a cross-functional team?

Sherwin Wu: This is an interesting question. Because the reality is: many companies simply don't have software engineers. So a more common pattern I see is—the core members of the tiger team often come from positions "adjacent to software engineering": technical-oriented, but not necessarily engineers.

These people are actually the easiest to get excited first. For example, a support or operations lead: they don't write code, but they love tinkering with tools, and may also be an Excel expert or a process expert. You will find that once this kind of person touches AI tools, they often "light up": quick to get started, full of motivation, and willing to actively write up how to use them.

So the typical profile of such a tiger team is: technically adjacent, coding-adjacent, with solid overall technical ability, willing to try, willing to learn, willing to lead others. You can usually build a small team with them at the core.

Of course, engineers joining would be very helpful; they can understand underlying mechanisms faster and are also better at systematic implementation. But many companies don't have this condition: engineers are scarce resources, hard to recruit and expensive. So many times, what really pushes AI up is actually these "non-engineer but technical-oriented" roles.

Host: From what I hear, the anti-pattern you're describing is purely top-down. For example, the CEO and executive team decide: we want to be AI-first, we want to fully embrace AI. Everyone gets assessed on how much productivity they've gained from AI tools. But if there is only top-down, without a bottom-up team to spread and drive adoption, it won't work.

Sherwin Wu: Yes, exactly. The core advice is: find those who are most excited. Instead of scattering them around the organization, gather them together into a small "AI missionary team." They explore how to use it and how to land it, then spread the practices to the entire organization. Hearing you paraphrase it back, I suddenly realized this also lines up with my own management philosophy. In other words: find the high performers of AI adoption, then empower them: let them hold hackathons, run sharing sessions, do knowledge sharing, and plant seeds of excitement internally.

From Vector Databases to Skills: Scaffolding Is Being Eaten Layer by Layer

Host: I have a few "hot takes" I want to hear you expand on. One I see you mention often: you say in the AI field, "going to talk to customers, listening to customers" is not always the right strategy, and it can often lead you astray.

Sherwin Wu: I'm not sure if this counts as very "hot." What I want to say is not "shouldn't talk to customers"—of course you should talk, and it's very valuable.

What I want to emphasize more is: the iteration speed in the AI field (especially the changes I've seen on the API side in the past three years) is really too fast. Models and the entire ecosystem will constantly disrupt themselves, especially at the toolchain and scaffolding layer.

I just saw a sentence this week from an article on X by Nicholas, the founder of a startup called Finol. He shared quite a bit of practical experience making AI agents in financial service scenarios (I remember he also worked in a similar direction at a company called FinTool before). There's a sentence of his I really like: "Models will eat your scaffolding for breakfast."

Looking back at when ChatGPT was first released in 2022, models were still very rough. So a large number of "scaffolding-style products" emerged in the developer tool world to constrain and guide models to work the way you expected: various agent frameworks, vector databases, and so on. Vector databases were especially hot then, and a large circle of supporting tools grew up around them.

But looking back over these past few years, models have improved too fast and grown too strong, and they really have "eaten" some of that scaffolding. I think this still holds today. The "currently trendy scaffolding" in Nicholas's article is context management based on skills files. You can easily imagine a world where, at some point in the future, this set of things is no longer useful because models can manage that context themselves; or the whole paradigm switches to another direction and no longer needs file-based skills at all.

You have witnessed this happen with your own eyes: agent frameworks are not so useful now. In 2023, we once thought vector databases would be the "main path" for bringing organizational knowledge into models: you had to embed the entire corpus, do vector retrieval, and do a lot of optimization to make sure the right information was retrieved at the right time.

That whole set is essentially scaffolding because models weren't strong enough at the time. And when models become stronger, the better way is often: remove a lot of logic, trust the model itself, and just give it a set of tools for searching.

This search doesn't necessarily have to be a vector database; it can be any form of search. It can even just be files in the file system, guided by things like skills or agents.md.
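As a deliberately minimal sketch of that shift, here is what "remove the logic, trust the model, hand it a search tool" can look like in Python. The tool schema follows the common JSON function-calling shape; `search_docs` and its parameters are hypothetical illustrations for this article, not any particular vendor's API:

```python
# Illustrative sketch: replace a vector-retrieval pipeline with one plain
# search tool that the model calls itself. search_docs is deliberately
# naive (substring search over Markdown files); the names are hypothetical.
from pathlib import Path

def search_docs(query: str, root: str = "docs") -> list[str]:
    """Return matching lines from Markdown files under `root`."""
    hits = []
    for path in Path(root).rglob("*.md"):
        for line in path.read_text(encoding="utf-8").splitlines():
            if query.lower() in line.lower():
                hits.append(f"{path}: {line.strip()}")
    return hits

# The description handed to the model; the model decides when to call the
# tool and with what query, instead of us orchestrating retrieval for it.
SEARCH_TOOL = {
    "type": "function",
    "name": "search_docs",
    "description": "Search the organization's documents for a query string.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
```

Compared with the embedding pipeline, almost all of the "scaffolding" here collapses into one function; the retrieval strategy is supplied by the model.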

Of course, vector databases still have their place, and many companies still use them. But the assumption of "building an entire scaffolding ecosystem around vector databases and treating it as the only answer" has changed a lot.

So returning to "customer feedback": you don't necessarily always have to listen to customers, because this field changes too fast. Many customers are actually in a "local optimum" at a certain point in time.

If you only blindly listen to customers, they will say: I want a better vector database, I want a better agent framework... But if you only walk down that path, you might build a "locally optimal" product; and when model capabilities take another step up, we often need to rethink what the correct abstraction, the correct tool, the correct framework is. What is more interesting, more exciting, and also a bit maddening is: this is a moving target.

The combination of tools and frameworks you think is "correct" today will likely continue to evolve and undergo major changes in the future, as models become smarter and stronger. This is the essence of making products in this field. This is also where it's exciting. But it also means: when you talk to customers, you need to balance between "what they want right now" and "where you think models are going, how they will evolve in the next one to two years."

Host: This sounds very much like the "bitter lesson": an important lesson in AI/ML is that the more complex logic and manual design you add, the more you limit scalable growth. You should remove those things as much as possible, give it compute, and let it get stronger on its own.

Sherwin Wu: Yes, there is indeed a version of the bitter lesson for AI product building. We used to architect many things around the model, and when model capabilities improved, the model ate them directly. Honestly, the OpenAI API team has made this mistake at times too: we took detours we shouldn't have. But models keep getting stronger, and we keep relearning this bitter lesson day to day.

Host: Then for those building products and building agents using the API, what is the key takeaway? Because they still have to build something around current capabilities. What advice would you give?

Sherwin Wu: My overall advice to everyone—which I still think holds true today—is: build for where the model is going, not just for what the model can do today.

Because the target is essentially moving. I've seen quite a few startups that do particularly well design their products around an "ideal capability" that may only be 80% realized at present. Their products are of course "usable" now, but always feel just short of complete.

But once model capabilities take another step forward, the experience suddenly "clicks" and gets unlocked: the missing piece falls into place, and the product as a whole goes from "barely usable" to "genuinely stunning."

For example, a certain key capability wasn't stable enough in the o3 era but suddenly became usable in 5.1 or 5.2. The reason these teams could capture that windfall is that they wrote "models will inevitably get stronger" into the roadmap during product design. In the end you get an experience far better than the approach of treating model capability as static and patching around the current state.

So my advice is simple: design according to the future direction of the model. You might need to wait a little, but models are becoming stronger so fast that often you don't need to wait too long.

Host: Following up on this topic, can you share where the API will go in the next 6-12 months? Where will the platform go? Where will the models go? I know much of this content might be confidential, but you can share as much as you can—what you are most excited about, what you think everyone should start preparing for.

Sherwin Wu: One of the most obvious directions is: the duration for which models can complete tasks continuously and stably is getting longer.

There is a benchmark I find very useful as a reference (the METR time-horizon benchmark), which tracks how long models can run reliably on software engineering tasks: for example, what task length they can sustain at a 50% success rate, and what they can reach at an 80% success rate.

From what I recall, current frontier models are roughly: at a 50% success rate, they can already complete "hour-long" tasks; but if the threshold is raised to an 80% success rate, it might still stay at the level of "close to 1 hour, but not quite." The most sobering thing about this benchmark indicator is: it puts all generations of models on the same timeline, and you can very intuitively see how the trend is pushed forward step by step.
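One way to pin down what "time horizon" means here (a sketch of the idea, not necessarily METR's exact methodology): let p_M(t) be the probability that model M completes a task that takes a human expert time t. The horizon at success level α is then T_α = max{ t : p_M(t) ≥ α }. Since success probability falls as tasks get longer, T_0.8 ≤ T_0.5 by construction, which is exactly the gap described above: an hour-long horizon at 50%, but not yet at 80%.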

What excites me is: today many products are still optimizing around "models can run for a few minutes." Even for coding tools like Codex, you will find it is more interactive, more like an on-call collaborative partner—what it is best at, and what is most optimized, is often still tasks of about ten minutes.

Of course, I have also seen people push Codex to the limit, using it to run hour-long tasks, but that is still a minority case, not the norm.

If we continue to push this trend forward, I believe in the next 12-18 months, we will see models complete "multi-hour tasks" more stably and coherently. There might even appear a stage like this: you hand over a task of about 6 hours to it, let it run for a while by itself, and then come back to give you results and progress.

Once capabilities reach this level, the product forms built around it will be completely different. You still need to give feedback to the model, and you certainly don't want it to run unconstrained for a whole day—maybe someone will want to do this, but in most scenarios, they won't. And when task duration truly lengthens, the scope of work the model can cover will suddenly become much larger, and the "universe" of things it can do will expand accordingly. This is also the point I am most excited about.

Another direction I think will be very cool in the next 12-18 months is advancements in multimodal models. More specifically, I mainly refer to audio.

Models are already pretty good on audio now, but I think in the next 6-12 months, they will become stronger—especially those native multimodal, speech-to-speech models. At the same time, there might also be some new model structures and architectural directions appearing on the audio side. And audio in enterprise and business scenarios is still a severely underestimated field: everyone is talking about coding, everyone is talking about text, but we are now conversing using audio. Many businesses in the world are done by "speaking"; many services and operations are also done through communication.

So I think in the next 12-18 months, audio will become very exciting, and we will see more "unlocked" capabilities.

Host: Let me quickly summarize: you think agents and AI tools will increasingly be able to run longer tasks, and this trend will continue to strengthen; then audio and speech will become more important, more native, more core, and the experience will be better.

Host: Back to your "hot take" just now. I also see you often say another one: you are very bullish on the direction of "business process automation," thinking it will be a huge opportunity in the AI world. Let's talk about this?

Sherwin Wu: Yes, this actually returns to what I said earlier: we live in a bubble in Silicon Valley. The work forms we are familiar with—software engineering, product management, making products—are actually completely different from the large number of work forms that support the entire economy's operation. I can strongly feel this when talking to customers: if you go talk to any non-tech company, you will find they have a huge amount of "business processes."

I usually distinguish it this way: software engineering is more like a kind of open-ended knowledge work. This is also why tools like Codex are strong, because they are good at exploration; you give them open-ended questions.

But the essence of software engineering is very open, and it is not repeatable: you build a feature once, not the same feature over and over again. Many tech jobs are this kind of open-ended work: data science is somewhat like it, and even some strategic finance work is somewhat like it.

But when you get further and further away from software engineering, away from the "tech company core," you will find many jobs are actually business processes: repeatable things, repeatable operational actions. It is often a set of practices iterated over a long time by a company's managers; there are usually standard operating procedures (SOPs). Everyone hopes to follow the SOP, and doesn't want to deviate too much.

The "intelligence" of software engineering often lies in innovation, deviation, and exploration; but the essence of a large amount of work in the world is actually just running according to these processes.

For example, when I call customer service, the person on the other end is following a set of processes; when I call the utility company, they also have many processes and rules: what can be done, what cannot. So I am very bullish on this broad category of opportunity: using AI for business process automation. And I think it is underestimated precisely because it is so different from what Silicon Valley talks about daily that people rarely think about it.

But if you think: can we use AI, use our existing tools and frameworks, to automate these repeatable, highly deterministic business processes? Can we make it more labor-saving and smoother? The key is also: it must be deeply integrated with enterprise data, enterprise decision logic, and various internal systems of the enterprise. I think this opportunity is huge, and there is also a lot of work to do; it's just that we don't talk about it much because it's not in our "comfort zone."

Host: Let me confirm if I understand correctly: you think AI has greater opportunities "outside of engineering"—it can more significantly affect company productivity, affect a large number of people engaged in repeatable, easy-to-automate work, and even change the way work is organized. Because in reality, much work is completed this way.

Sherwin Wu: Yes. I often talk to many large enterprise clients: how will AI change my company in 20 years? How will the company operate in the AI world?

Software engineering is of course part of the story, but there is more on the business process side. And I think the business process side might eventually present a more "thoroughly different" appearance, and the amount of work to be done is also very large.

In terms of absolute scale, I'm not sure whether it's actually larger or smaller than software engineering; software itself is huge and covers a very wide range. But what is certain is: this part is really huge, far larger than its share of the discussion on X/Twitter would suggest. Many people don't talk about it at all, so you will underestimate it.

How to Avoid Being "Crushed" by OpenAI?

Host: Changing direction. You do the platform, do the API; many people make products on the API. The biggest question in everyone's mind is always: how can I avoid being "crushed" by OpenAI? Will you do the same thing yourself, and then destroy the market I just built? What is your overall policy, overall philosophy? How should startups judge: which directions is OpenAI unlikely to enter personally?

Sherwin Wu: My overall answer is: the market is too big; the opportunity space is ridiculously huge. Startups really don't need to overly worry about what OpenAI or other large model labs will do.

I've seen many startups, some doing poorly, some doing very well. Of all the companies I've seen flame out, not a single one failed because OpenAI, some large lab, or Google "came to crush them." The reason for their failure is simpler: the thing they built didn't truly resonate with customers and their needs.

Conversely, those companies that took off can make it even in extremely competitive fields. For example, the coding field is competitive enough, but Cursor is still very big now—because they made something people really like.

So my advice is: don't be too anxious about this. Focus on making a product users like, and you will definitely find space in it.

I can't emphasize enough how big the AI opportunity is right now. It is so big that it has even shifted VCs' Overton window: VCs now invest in directly competing companies in the same space frequently and aggressively, precisely because the space and the opportunity are so big, almost unprecedented.

From an entrepreneur's perspective, this is actually the most empowering environment: as long as you make something that a group of people very, very much like, you can make a huge business. That's why I keep saying: don't overthink "will I be crushed."

There is also another very important point, at least from OpenAI's perspective: we have always attached great importance to one thing, which Sam and Greg constantly emphasize from the top—we fundamentally see ourselves as an "ecosystem platform company." The API is our first product. We believe we must cultivate this ecosystem, continuously support it, rather than destroy it.

You can see this logic running through many of our decisions: every model we release, every time we launch something in a product, it eventually lands in the API. Even the Codex models we are shipping now, which are tuned more for the Codex harness, will eventually all go into the API so that every API customer can use them too.

We won't hide these capabilities and withhold them. We think maintaining platform neutrality is very important: we don't block competitors; we let everyone access our models. We are also testing products like "Log in with ChatGPT" recently; we hope to keep growing this ecosystem, which matters a lot. The overall logic is: a rising tide lifts all boats. We may be an aircraft carrier in size now, but we believe raising the overall tide level is good for everyone, ourselves included.

The growth of our API, to some extent, is because we have always acted in this way. So I really encourage everyone not to think of OpenAI as an existence that will push you away and squeeze you out at any time. You should focus your attention on: making something truly valuable. We will continue to strive to provide an open ecosystem.

Host: Why is this important for OpenAI? Is this persistence of "being a platform, letting others do business" a vision that existed from the beginning?

Sherwin Wu: Yes, this has existed from the beginning. It can even be traced back to our charter, our mission.

OpenAI's mission has always been two things: first, build AGI. We are certainly doing this. Second, spread the benefits to all of humanity. The key is "all of humanity." ChatGPT is certainly doing this; we want to reach the whole world. But very early on, we realized: relying solely on OpenAI as a company, we cannot reach every corner of the world. The world is too big; the needs in every corner are very deep and very detailed.

So to complete the mission, we must be a platform: to empower others to build those things we cannot possibly do ourselves personally—for example, the "customer service bot for podcast and newsletter hosts" product you just mentioned; we won't do it ourselves, but others can build it on the platform. This is the meaning of the API. We have also always liked seeing various things emerging in the ecosystem, so from day one, this has been a manifestation of the mission.

Host: And you haven't even mentioned the ChatGPT "app store" you are about to launch. Is this within your scope, or another organization/team?

Sherwin Wu: That is another team, more on the ChatGPT system side. But we cooperate with them very closely. They made an apps SDK, which was also developed in close collaboration with our team. But it is indeed under the ChatGPT umbrella.

But it is also an example of the same logic: ChatGPT now has about 800 million weekly active users; these users come back repeatedly to use it. For the business, this is a very strong asset. But if we can let other companies come in too, utilize this entry point, and build products for this crowd—isn't that better? Ultimately, we also believe this will help us continue to grow this user base. So it still returns to the mission: being a platform, staying open, can often bring greater growth.

Host: The 800 million figure you just mentioned... is that 800 million weekly active? My brain just stalled for a moment.

Sherwin Wu: 800 million weekly active.

Host: That's wild; it's simply unprecedented. We've become numb to numbers at this scale, but it's really outrageous.

Sherwin Wu: Yes, thinking about it at that scale, I also find it shocking. I would frame it this way: it's about 10% of the world's population, and it's still growing, still climbing. Ten percent of the world's population uses ChatGPT every week.

Host: I also want to re-emphasize the point you just made: OpenAI's mission is to let the benefits of AI reach all of humanity. Some people mock this line, saying "isn't this just about charging money?" But the reality is: anyone can use the free ChatGPT. The capabilities of the free version are not worlds apart from the world's strongest AI models; it isn't gated so that only a few people can use it. If you are a billionaire, the increment you can get from AI is actually limited; and a person in a village somewhere in Africa, as long as they can get online, gets AI capabilities that are not much worse. I know this has always been something OpenAI cares about deeply.

Sherwin Wu: Yes, this is also why we attach great importance to healthcare, and great importance to education—the education part will be very interesting.

There's also a very crazy trend: the free models themselves keep getting smarter. Look back at the free models of 2022: they were considered good at the time, but compared to today's, they're not on the same level at all. What you get for free today is at a GPT-5 level. So this whole idea of "raising the global floor" is part of our mission.

There's also an interesting comparison from that "billionaire" angle: people point out that the iPhone you use might be the same model Zuckerberg or other billionaires use. Now it's similar to some extent: for 20 dollars a month, you can use the same AI that billionaires use. For 200 dollars a month, you can get Pro, the same Pro the billionaires use. And they don't necessarily use Pro all day; a lot of the time it's just the Plus level.

So this kind of democratization, this spreading of benefits across the whole world, is very meaningful to us and drives many of our decisions.

Host: Last question: for people who want to build something on the API—maybe they just realized "I could also use open-source models and APIs to make very cool things"—what exactly do your API and platform let people do? I know you can build agents on the platform. Can you give an overview of the capabilities you provide?

Sherwin Wu: Fundamentally, the API provides a set of developer endpoints that allow you to sample from our models.

The most popular endpoint right now is the Responses API. It's an endpoint specifically optimized for building long-running agents—that is, agents that keep working over an extended period.

The lowest-level primitive works like this: you give the model a piece of text and let it work for a while; you can poll to see what it's doing, and at some point you get the model's result back. That's the lowest-level primitive we give developers, and also the most common way people build. It's very unopinionated: you can do almost anything with it; it's the most basic building block. On top of it, we're starting to provide more and more abstraction layers to make building these things easier.

One level up, we have something very popular called the Agents SDK. It lets you build more traditional agents on top of the Responses API or other endpoints: for example, an AI running continuously in a roughly "infinite loop" workflow, possibly with sub-agents it can delegate tasks to.

It sets up a whole framework—and whether that scaffolding eventually gets "eaten" by the models, we'll keep watching. But for now it makes building agents much easier: you can give it guardrails, have it distribute sub-tasks to other agents, and orchestrate an agent swarm. That's what the Agents SDK helps you do.
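The delegation pattern he describes (a coordinator routing sub-tasks to specialist agents, with a guardrail check up front) can be shown in miniature. This is a toy version of the idea, not the Agents SDK's actual classes; `Agent`, `Orchestrator`, and the handoff rules here are all illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    handles: Callable[[str], bool]   # which tasks this agent accepts
    run: Callable[[str], str]        # the agent's "work"

@dataclass
class Orchestrator:
    sub_agents: list = field(default_factory=list)
    guardrail: Callable[[str], bool] = lambda task: True

    def dispatch(self, task: str) -> str:
        # Guardrail first: refuse unsafe tasks before any agent runs.
        if not self.guardrail(task):
            return f"blocked: {task!r} failed guardrail"
        # Hand off to the first specialist that accepts the task.
        for agent in self.sub_agents:
            if agent.handles(task):
                return f"{agent.name}: {agent.run(task)}"
        return f"no agent for {task!r}"

swarm = Orchestrator(
    sub_agents=[
        Agent("coder", lambda t: "code" in t, lambda t: "patch written"),
        Agent("researcher", lambda t: "research" in t, lambda t: "summary ready"),
    ],
    guardrail=lambda t: "rm -rf" not in t,  # trivial safety check
)
print(swarm.dispatch("research competitor pricing"))
```

The SDK's value is that the routing, guardrails, and loop management shown here come prebuilt, so the developer only supplies the agents and their instructions.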

Then further up, we're starting to build some more "meta-level" tools. We have a product called AgentKit, plus some widgets: essentially a set of UI components that let you quickly put a polished interface on top of the API or the Agents SDK. Many agents look very similar at the UI level, so a shared component set greatly accelerates productization.

In addition, we have some evaluation products, such as the Evals API: if you want to test models, or test whether your agents and workflows actually work, you can use our eval product to run fairly quantitative tests.
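The eval idea in miniature is a harness that runs graded test cases against a workflow and scores it. This is a generic sketch of that loop, not OpenAI's actual Evals API surface; `run_eval` and the grader shape are assumptions for illustration.

```python
def run_eval(workflow, cases):
    """cases: list of (input, grader) pairs; each grader returns True/False.
    Runs the workflow on every input and tallies pass/fail."""
    results = [grader(workflow(inp)) for inp, grader in cases]
    return {
        "passed": sum(results),
        "total": len(results),
        "accuracy": sum(results) / len(results),
    }

# A stand-in "workflow" to grade: it just uppercases its input.
workflow = str.upper
cases = [
    ("hello", lambda out: out == "HELLO"),
    ("ok",    lambda out: out == "OK"),
    ("fail",  lambda out: out == "nope"),  # deliberately failing case
]
print(run_eval(workflow, cases))
```

The quantitative part he mentions is exactly this accuracy number: rerun the same cases after changing a prompt or model and compare scores.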

So I'd describe it as a layered stack: different levels help you build what you want with our models, with the abstractions getting higher and more opinionated as you go up. You can use the full stack and ship an agent quickly, or drop all the way down and build everything yourself on just the Responses API.

Lightning Round

Host: Sherwin, before we get to the very exciting lightning round, is there anything else you want to add? Anything you want to leave the listeners with? Any point we haven't covered that you think would be helpful?

Sherwin Wu: I just want to leave one message: I think the next two to three years will be the most interesting period the tech and startup world has seen in a long time.

I encourage everyone not to take it for granted. I entered the workforce in 2014; the first couple of years were pretty good, but for the next five or six years the tech industry didn't feel that exciting to me. The past three years have been the most exciting, most energetic stage of my career.

I think the next two to three years will continue this state.

So don't take it for granted. One day this wave will end, and change will become more incremental, less dramatic.

But during this period, we'll explore many cool things, invent many new things, change the world, and change the way we work. That's what I want to leave everyone with.

Host: I love that answer so much I want to ask one more question. You said "don't miss out"—so what exactly should people do? Build, embrace, learn, join a company doing interesting things? What's your advice for someone who says "I don't want to miss this train"?

Sherwin Wu: I would say: engage with it.

Basically, it's what you said: embrace it. Building tools on top of this is part of the story, but even if you're not a software engineer, you can absolutely embrace it: go use these tools.

I think many jobs will change. So use the tools, understand what they can do and can't do, understand their limitations, so you can see what they'll start to be able to do as the models improve.

In short: make yourself familiar with this technology rather than leaning back and letting it pass you by.

Host: But on the flip side, there's a lot of pressure and anxiety: there's too much happening—how do I keep up? I have to learn Clawbot this week, and something else pops up next week... You're right at the center of it; how do you avoid being crushed by this fear of missing out? How do you keep your rhythm, how do you follow the news?

Sherwin Wu: Personally, I'm actually a bad example, because I'm basically "always online": always on X, always on company Slack, so I do absorb a lot of information. But watching people who aren't as "addicted" as I am, I think one point matters most: most of the information is actually noise.

You don't need to push 110% of it through your brain. Honestly, pick one or two tools, start small, and that's completely enough.

The industry's breakneck pace, combined with the mechanics of X as a product, creates an insanely fast news rhythm that makes people feel oppressed and easily overwhelmed.

But you really don't need to master all of it to participate in what's happening right now.

Even just installing a Codex client and playing around, or installing ChatGPT and connecting it to one or two of your internal data sources—Notion, Slack, GitHub—to see what it can and can't do, is already very valuable, I think.

Host: Do you have a motto you often use to remind yourself?

Sherwin Wu: One line I always repeat to myself is: never feel sorry for yourself.

A lot happens in work and life. Reminding yourself not to sink into self-pity, and instead to believe you have agency and can pull yourself up—that's something I often need to tell myself, and something I often tell others too.

Reference Link:

https://www.youtube.com/watch?v=B26CwKm5C1k

Disclaimer: This article is organized by AI Frontline and does not represent the platform's views. Reproduction without permission is prohibited.
