Former OpenAI Researcher: AGI Requires Models to Break Through Difficulties on Their Own; The Biggest Problem is Generalization; The Most Important Skill is "Managing Junior Engineers"; Robots Will Have a "ChatGPT Moment" in Two to Three Years


Editor | Listening to the Rain

Incredible: a former OpenAI researcher is telling it straight!

Yesterday, the podcast "Unsupervised Learning" hosted former OpenAI researcher Jerry Tworek. Jerry Tworek was a key architect behind OpenAI's reasoning models o1, o3, and Codex, and took part in some of the most critical breakthroughs in AI over the past few years. He recently left OpenAI to pursue research directions that are hard to carry out inside a large laboratory.

In this episode, Jerry covered many of today's hottest topics: the real limits and prospects of scaling in pre-training and reinforcement learning, his expected timeline to AGI, the convergence of research directions across major labs, OpenAI's key bets around the GPT-4 release, what makes an excellent AI researcher, and more.


Jerry stated that the benefits brought by the Scaling paradigm are stable and predictable, but whether it can ultimately lead to AGI depends on the model's generalization capability. He pointed out that there is an increasingly obvious phenomenon: we are becoming extremely good at "things we have explicitly trained for."

He also admitted that he used to be very optimistic about AGI, believing it could be reached by simply continuing to do reinforcement learning. But after working on it for a while, his timeline became more conservative. The biggest shift in his thinking: if a model cannot break through difficulties on its own and cannot rescue itself from a "stuck" state, it is hard to call it AGI.

Jerry joined OpenAI in 2019 and worked there for six or seven years, watching it grow from a small laboratory of thirty or forty people into one of the world's largest companies. He described the experience as "truly crazy," and said he believed from the start that it was a place where AGI could genuinely be built.

He said that in his memory, there were two key decisions at OpenAI: one was to concentrate a large amount of resources to train GPT-4, which was a decision accompanied by huge trade-offs, but it was extremely critical in OpenAI's history and proved to be an excellent choice in hindsight; the other was to bet on "reasoning models being the future," causing OpenAI to fully pivot in this direction and release o1 and o3.

Jerry revealed that OpenAI's research department has been highly separate from the product team from the very beginning; the company's core mission has always been "to build intelligence." A company can usually push only one extremely hard thing to the limit; it is rare to do several extremely hard things at a top level simultaneously. He also agrees that Anthropic has taken the lead in programming, and that the key is focus.

Jerry believes that the most important skill at present is actually the ability to "manage junior engineers." The best managers both deeply understand the system and can let others make decisions—this is precisely the best way to collaborate with models.

Jerry also shared a major recent update to his thinking: static models can never become AGI; continuous learning is essential. He also expects the robotics field to have its "ChatGPT moment" in about two to three years.

The following is the full text of the dialogue, lightly edited and condensed:

How far can the Scaling paradigm go:

The benefits of scaling are stable, the problem lies in generalization

Host: At OpenAI you drove the introduction of reasoning models and the scaling of reinforcement learning. From the perspective of the current scaling paradigm, I'm curious about your judgment: how far can today's pre-training plus reinforcement learning go? Where can this route ultimately take models?

Jerry: It will definitely take us somewhere. The question is, how should we name that "place"?

Host: You can define it.

Jerry: But for most practitioners, there is a very real and quite striking fact: the benefits of scaling are real, predictable, and quite stable. Every time we scale up pre-training, we get a better pre-trained model: it knows more about the world, understands language more deeply, and builds a more complete "linguistic world model." Likewise, scaling up reinforcement learning makes models better at the skills we want them to acquire.

In both cases, you basically "get what you train for." If you want a model good at next-token prediction, you do large-scale pre-training, and you get a model that is very, very strong at next-token prediction; if you want a specific set of skills, you do reinforcement learning on those skills, and the model becomes very, very good at those tasks. In a sense, there is almost no obvious ceiling. Everyone now knows: as long as you care about a skill, you do reinforcement learning on it and the model learns it very well. Things really are that simple, and it really works.

Where things genuinely get stuck is generalization: how do these models perform outside the training distribution? Can they handle knowledge that does not exist in the pre-training corpus? Probably not. How do they do on tasks you never trained with reinforcement learning? Probably not very well either.

So that is pretty much the core problem remaining in AI today: we are becoming extremely good at the things we have explicitly trained for, and not much else.

Host: This seems to lead to two different views. One says we have only just begun to explore the potential of reinforcement learning, and that as we keep scaling, generalization will gradually emerge; these two scaling routes are enough to take us far. The other says that to keep breaking through, we may need an entirely new paradigm. Which side do you lean toward?

Jerry: I think this is largely an economic question. Obviously, "scaling" largely means adding data; without data, scaling is almost impossible. If you keep adding data for the things you want the model to be good at, the model gets better at those things.

The phenomenon you see now is that almost every quarter, every laboratory releases a stronger model. This usually means three things: first, more computing power; second, more importantly, more data; third, and most critically, these data are carefully customized for the shortcomings of the previous generation model.

This is an extremely powerful methodology: by iterating continuously, you can train better and better models. From this perspective, if you continuously supplement data for "what you want the model to do," you can eventually get a model that performs well in all these things. But this cycle is very slow in some aspects. The real question is: is it possible to be faster? Under the existing training paradigm, I do believe that as long as you keep adding target data, the model will learn the corresponding skills and have a certain degree of generalization. But the key question is: are there other research directions that can obtain more capabilities with less data? Is there a more "fundamental" way to make the model better utilize what it has already seen and learned for generalization?

Host: We will come back to those potential new directions. First, to give the audience some background: in your experience, where does reinforcement learning currently work well and where does it not? Many people draw the distinction between "easy to verify" and "hard to verify" tasks. What is your own mental model? What can today's RL truly do effectively?

Jerry: The "easy to verify / hard to verify" question essentially comes down to: can we obtain a meaningful quality signal? At OpenAI we made quite good progress on many fronts, using reinforcement learning to make models better across a wide range of tasks. Reinforcement learning really can be used for many things.

But for some things it is inherently hard to judge what is "good" and what is not, or you have to wait a very long time for feedback. Take writing a book: you can use simple heuristics to judge it, but the truly reliable signal may only arrive after publication, when you see how many people actually read and buy it. Even then the signal is not always reliable: critics may unanimously call it a masterpiece, yet it sells poorly because the marketing failed.

So how do we do reinforcement learning for "writing a good book"? This is itself very difficult to answer. How do humans learn to write good books? This is also an extremely complex issue.

Entrepreneurship is a similar example. Among a batch of early-stage companies, how do we know which is a "good company"? It often takes five or ten years to tell. Was an early decision by a founder right or wrong, or was the success largely luck? In scenarios like these, doing reinforcement learning directly is very, very difficult.

However, as long as you can get any form of feedback, you can in principle use it for reinforcement learning.
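The "easy to verify / hard to verify" distinction above can be made concrete with a toy sketch: when a task has a programmatic checker, the reward is instant and unambiguous; for something like a book, any immediate proxy is weak and gameable. All function names and heuristics below are my own illustration, not any lab's actual RL tooling.

```python
# Toy contrast between a verifiable and a hard-to-verify reward signal.
# Everything here is illustrative, not real RL infrastructure.

def verifiable_reward(model_answer: str, expected: str) -> float:
    """Math/coding-style task: an exact checker gives an instant, reliable reward."""
    return 1.0 if model_answer.strip() == expected.strip() else 0.0

def book_reward(manuscript: str) -> float:
    """'Write a good book': no checker exists. Any immediate proxy (here, word
    variety) is a weak, gameable stand-in for signals that arrive years later
    (sales, reviews)."""
    words = manuscript.split()
    if not words:
        return 0.0
    variety = len(set(words)) / len(words)  # crude proxy, easily gamed
    return min(1.0, variety)

# The verifiable task yields a crisp 0/1 signal...
print(verifiable_reward("42", "42"))  # 1.0
print(verifiable_reward("41", "42"))  # 0.0
# ...while the proxy reward says nothing about whether the book is actually good.
print(book_reward("the cat sat on the mat"))
```

Optimizing hard against a proxy like `book_reward` produces text that maximizes the proxy, not good books, which is exactly why long-horizon, hard-to-verify tasks resist direct reinforcement learning.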

Host: The models you worked on have been impressive at programming contests and math competitions, but people are still trying to build intuition: are most real-world tasks more like "programming and math," or more like "writing books and starting companies," where reward signals are extremely hard to construct and repeated trials are hard to run? Accounting, medicine, law, for example: which category do you think they are closer to?

Jerry: Ultimately it is still the same question: how easy is it to judge how well you did? Even for humans, judging whether a book is well written is inherently hard.

If you are the manager of an accounting team, and there are clear rules in this field, you can relatively easily judge which accountant is doing well and which is not. As long as the rules are clear, you can use these rules to train almost any system.

Medicine is the same. I have thought a lot about surgeons recently: there are indeed clear rules and clear feedback signals there—whether the patient survives after surgery is itself a very strong success criterion. What is more interesting is: truly top doctors often violate existing rules at critical moments. They make judgments based on experience and must perform surgery in an unprecedented way. They break conventions, but the result is successful, saving the patient.

I think that with enough time and enough attempts, models can also do similar things. The real question is: how long will it take for models to truly reach this level?

Host: If we want reinforcement learning to generalize across more of the tasks humans care about, what do you think is the frontier problem that really needs to be tackled next?

Jerry: I think generalization is essentially a property of the model itself. During training, you get to choose the training objective, and in the end what you get is basically what you optimized for. The question is: how much additional capability do you get "for free"?

Some learning methods hardly generalize at all, even on next-token prediction; nearest neighbor classification is one example. In theory it can solve any machine learning problem, but its generalization is extremely poor, because the world representation it builds is extremely simple.
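Jerry's nearest-neighbor example is easy to demonstrate in a few lines: a 1-NN "learner" is perfect on its training set, because it just memorizes, but the representation it builds is so shallow that it fails the moment a query falls outside the training distribution. A toy sketch (the parity task is my own illustration):

```python
# Toy 1-nearest-neighbor "learner": flawless on its training set (pure
# memorization), but with no abstraction it fails badly off-distribution.

def nn_predict(train, x):
    """1-NN: return the label of the closest training input."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Task: classify integers by parity. Training data covers only 0..5.
train = [(0, "even"), (1, "odd"), (2, "even"), (3, "odd"), (4, "even"), (5, "odd")]

# In-distribution, 1-NN is perfect: it simply looks the answers up.
assert all(nn_predict(train, x) == y for x, y in train)

# Off-distribution, it has learned nothing about parity, only distances:
# the nearest neighbor of 100 is 5, so it answers "odd" even though 100 is even.
print(nn_predict(train, 100))  # "odd"
```

The contrast with large trained neural networks is exactly the "magic" Jerry describes next: the abstract representations they learn are what lets them answer queries no training point sits near.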

Neural networks, especially large-scale trained neural networks, are magical in that they learn very useful and very abstract world representations. Sometimes we even feel this is "free": why can a huge Transformer trained on the internet understand various concepts in the real world so deeply?

This generalization capability comes from the Transformer architecture, a large number of parameters, and repeated gradient descent. This itself is like a kind of magic. The question is: is there a different model that can generalize better? Almost certainly, the answer is yes. The real question is—what will it look like?

If a model cannot break through on its own when encountering difficulties, it is difficult to call it "AGI"

Host: I heard you mention before that after scaling reinforcement learning, your timeline for AGI became more conservative. Why?

Jerry: I used to be a very optimistic person, thinking that as long as we did reinforcement learning on the model we could reach AGI. Maybe we already have. Maybe this already is AGI; it is an entirely subjective judgment, because "what counts as AGI" often depends on what we feel is still missing.

Current models can already solve almost all Olympiad-level problems and all kinds of competition problems. They have even started cracking previously unsolved mathematical problems; you see examples like GPT-5.2 every week.

So when will there be a moment when everyone nods in agreement at the same time? I don't know. I am a heavy user of coding models. They still make mistakes. They help me finish work that would otherwise take a very long time, and they are extremely powerful productivity amplifiers, but there are also obvious failure modes. When the model fails, you quickly feel a sense of powerlessness. You can paste error messages over and over, tell the model "this doesn't work, try another way," and sometimes even offer it "moral support." But fundamentally, the model has no mechanism for truly updating its beliefs and internal knowledge after a failure.

That is probably the biggest change in my thinking: if a model cannot break through on its own when it hits difficulties and cannot rescue itself from a "stuck" state, I find it hard to call it AGI.

True intelligence will keep trying, keep probing the structure of the problem until a solution is found. And current models cannot do this.

Host: That transitions nicely into research directions beyond pure pre-training and reinforcement learning scaling. Many of the problems you just described are close to "continuous learning," a topic that has been discussed more and more publicly lately. From a high level, how would you explain to the audience which core problems most need solving to make continuous learning truly feasible?

Jerry: The most fundamental point is: if you want the model to be trained continuously, you must ensure it does not collapse or drift into some strange, out-of-control state. Deep learning training can fail in many ways, and a large part of the work in big labs today is really about keeping the model "on track" and keeping the training process healthy. Fundamentally this is very fragile; training is not a process that naturally goes smoothly, and you have to keep investing a lot of effort just so it does not "blow up." If you don't, it is ultimately hard to get a good model.

And in my opinion, this is fundamentally different from the way humans learn. The human learning process is much more anti-fragile and more robust. Humans can continuously self-repair and adjust during the learning process, rather than easily collapsing. When I was doing reinforcement learning research, I was often surprised: how rarely humans suddenly "crash" after learning new information, start talking nonsense, or fall into some strange cognitive state; while AI models are quite prone to this. This is exactly the problem researchers have been trying to solve—from both theoretical and practical perspectives: how to combat this instability. I think this fundamental robustness of the training process itself is likely the key prerequisite for achieving continuous learning.

Host: In your view, how many of the interesting ideas about continuous learning have actually been around for a while and been discussed repeatedly, and how many are genuinely new research problems?

Jerry: I think the most important question for a researcher to keep asking is: why hasn't this problem been solved yet? Continuous learning has clearly not been truly solved, so the question is: why not? There are so many smart researchers and so many excellent ideas in the world, yet no one has truly broken through on continuous learning. There must be a reason.

There are many hypotheses about this. One I find very fundamental is that this is likely a problem that has to be solved at scale, at least above some threshold. And right now, very few top laboratories actually have the conditions for this kind of research, and the number of projects they can run in parallel is limited. So it is probably not that no correct direction exists; if the problem could be thoroughly verified and fundamentally cracked at small scale, someone would likely have done it already. Either it is an extremely complex, theoretically very hard problem, or it requires already-large models and computing resources that only a handful of labs possess. And those few labs may simply not have gotten around to a particular path yet, or chose not to, because they were busy with other things.

Host: I've heard you say before that in AI research some ideas "haven't reached their time" yet are still good ideas. Reinforcement learning itself is an example: it only truly took off once large-scale pre-trained models existed as a foundation. So it sounds like your intuition is that there are some very good ideas around today that, if tried at a large enough scale, might help a lot with this kind of problem.

Jerry: Yes, I completely agree.

Research directions in major laboratories are highly convergent

Host: You also mentioned a phenomenon: research directions across major labs have converged heavily, and what everyone is doing looks more and more alike. I don't know whether that matches your own experience of the past two or three years, but when you led certain work back then, those really were new directions, and many labs were caught off guard. Can you talk about this convergence over the past year or so? Did it surprise you?

Jerry: In reinforcement learning there is a classic, well-studied trade-off: exploration vs. exploitation. When should you try new things, and when should you optimize the things you are already good at? There is no standard answer, because you never know whether the unknown is worth exploring.

Fundamentally, the question is: is there a path completely different from the current one that can bring huge benefits? But if you don't understand the terrain of the entire search space, this question itself is extremely difficult.
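The trade-off Jerry names has a classic textbook formalization, the multi-armed bandit. An epsilon-greedy agent mostly exploits the arm that currently looks best, but keeps spending a fraction epsilon of its pulls on arms it believes are worse, in case its estimates are wrong. A minimal sketch (the arm payoffs and hyperparameters are arbitrary illustrations):

```python
import random

def epsilon_greedy(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: exploit the best-looking arm, explore at rate epsilon."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)    # pulls per arm
    values = [0.0] * len(true_means)  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))  # explore: a random arm
        else:
            arm = max(range(len(true_means)), key=values.__getitem__)  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payoff
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts

# Arm 2 pays best on average; over time the agent concentrates its pulls on it
# while still spending roughly epsilon of its budget re-checking the others.
counts = epsilon_greedy([0.1, 0.5, 0.9])
print(counts.index(max(counts)))  # 2
```

With epsilon = 0 the agent can lock onto a mediocre arm forever; with epsilon too high it wastes most of its pulls re-checking arms it already understands. That is the same dilemma as choosing whether to "sail out to sea" for a different continent or keep optimizing the known route.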

I remember someone once asked me: why do all commercial aircraft look roughly the same, even though more than one company builds them? The reason is that under economic constraints, this is basically the most efficient design.

Today's behavior of the major laboratories is driven by very strong economic forces. If you want to compete, you must build the best possible model at the lowest possible cost, and under that goal the existing combination of techniques is already quite efficient. Customers can switch models at any time, and ultimately users benefit; this pushes labs even harder to optimize efficiency along the same path. Of course, the exploration-versus-exploitation problem is always present. Should we "sail out to sea" to see if there is a completely different continent in the distance? Should we train a completely different kind of model?

Doing so may distract you, making you unable to continuously improve existing methods and make them more efficient. But on the other hand, maybe there is a 10-fold or even 100-fold breakthrough there. Ultimately, this depends on a belief and judgment: how much risk are we willing to take for exploring the unknown?

Host: As you said, there is now a very clear route: keep adding data to reinforcement learning across tasks, keep improving economically valuable capabilities. Every lab has a clear roadmap, which makes "betting everything on a completely new direction" even harder. Back when pre-training seemed to be approaching a bottleneck, it was easier to encourage exploration.

Jerry: Yes, different historical phases really are different. Some periods allow more room for exploration and more tolerance for failure; when competition becomes extremely intense, it starts to resemble a prisoner's dilemma: the moment you choose to be different, you can easily lose your advantage in the market.

The first-mover advantage of laboratories is important

Host: Do you think a lab must be the one to discover the "next major breakthrough"? I ask because these ideas often spread very quickly. Take your pioneering work on reasoning models: now several labs have strong reasoning models. Couldn't a lab simply absorb a breakthrough even if it happens elsewhere, since these ideas eventually get folded into the existing commercial system anyway?

Jerry: The diffusion of ideas is certainly a good thing, but the value of being one step ahead should not be underestimated. We have seen such examples: many people once thought OpenAI could not succeed, but it took the lead in large-scale Transformer pre-training and eventually became one of the most successful companies in the world. Similarly, OpenAI took the lead in solving large-scale reinforcement learning, which is why it still has one of the strongest reinforcement learning research systems in the industry and can do more ambitious things.

Even when ideas spread, first-mover advantage remains extremely powerful; if you can sustain it, it may last a long time. I recently read a book about semiconductor manufacturing. Many of the earliest key inventions happened in the United States and gradually spread around the world, yet some stage-by-stage leads have proven impossible for other countries to close: the compounding effect of early bets and sustained investment keeps paying off for a very long time.

It is not that only one country can do semiconductors, but it is by no means that every country can. In every industrial change, there will always be new winners and new losers; some old companies successfully transform, and some are eliminated—this is the Darwinian process in progress.

Host: Consumers and companies often remember the first company that delivers a "magical experience," and you obviously lived through that with ChatGPT. You made so much progress in reinforcement learning, and that direction keeps advancing, yet you ultimately chose to leave OpenAI to explore new research areas. I'm curious: when did you start to realize you might want to leave, and how did you actually make the decision?

Jerry: It was not a sudden decision but something that grew slowly. OpenAI is not an easy place to leave: I have many friends there, a lot of shared history, and a large part of my life was built there. I tried very hard to keep everything running and to look for different possibilities.

But as a researcher, if one day you wake up and find that you no longer truly love what you are doing, no longer feel extremely excited about it, then maybe it is time to try something else. If you don't have 100% passion, it is almost impossible to do the best research work. I had many days of infinite passion at OpenAI, but later, this feeling became harder and harder to maintain.

Host: What is giving you energy now?

Jerry: At the most fundamental level, I joined OpenAI because I believed reinforcement learning is a necessary component of AGI, and I really, really wanted it to happen. Introducing reasoning models to the world was a paradigm-level shift for me. To some extent I want to chase that feeling again: to find the missing piece in how models are trained today and make it mainstream. But once you have done such a thing, it is hard to feel the same intensity of shock. So what I want now is the freedom to think, explore, and try to solve the most core and important problems.

Host: Are you advancing many specific hypotheses now, or are you more "zooming out" to re-observe the whole field?

Jerry: Generally speaking, you don't suddenly discover the truly important problems after seven years in machine learning; you have long known which ones are most critical. The real difficulty is solving them in a way different from everyone else, because if they could be solved by conventional means, someone would have solved them already.

OpenAI's two key decisions:

Concentrating resources to train GPT-4, betting on "reasoning models are the future"

Host: You once said that since joining OpenAI in 2019, almost every year has felt like a "different company." Could you review these six or seven years of evolution and describe the growth story of OpenAI as you saw it?

Jerry: We started as a small laboratory of only thirty or forty people, completely open from the very beginning, which was an extremely bold choice. We really believed at the time that this would be the place to build AGI and bring the universal benefits of digital intelligence to the world.

From a handful of people doing "seemingly cool but extremely ambitious" projects, to one of the world's largest companies today, making products that almost everyone knows, uses daily, and can hardly imagine living without: the experience has been truly crazy. As you know, OpenAI's management and organizational structure have changed considerably over the past year. The people you work with every day have changed, the company's size has changed, and the research topics keep changing. In the early days there was no concept of "pre-training" at all; later, for a period, almost everything revolved around pre-training; then it became a bit like the "old OpenAI" again. Now it is more balanced, with both pre-training and other directions.

Many people leave OpenAI to start companies and new stages of life; at the same time, excellent new people keep joining and doing very good research inside. It is a company that constantly reinvents itself and has grown successfully at every stage. I often think the stories of the great, successful companies must be wonderful, and living through those stages personally is an irreplaceable experience. I took part in a fairly early period of OpenAI, and that experience is hard to compare with anything else.

Host: Everyone hopes someone will one day write down this period of OpenAI's history systematically. Such stories usually center on the "key but extremely difficult decisions," the forks where things could have gone in different directions. For you, are there particularly key decisions that left a deep impression?

Jerry: That's a good question. I truly participated in only part of it; for many decisions I may have been just a background character. Take the discussion about whether to release ChatGPT to the world: as you may have heard, its later popularity and viral spread were not anticipated by anyone inside. With the release of ChatGPT and then GPT-4, we created a "moment" and built a kind of momentum that was very hard to predict, and it shaped today's OpenAI along many dimensions.

Concentrating a large amount of resources to train GPT-4 at that point in time was likewise a decision full of trade-offs, but it was extremely critical in OpenAI's history, and in hindsight an excellent choice. There was also one very important gamble: betting that "reasoning models are the future." There was no certainty at all; it rested on first-principles thinking and intuition. We decided to turn OpenAI fully in that direction even though there was no product-market fit at the time. The earliest reasoning models looked smart but were suited almost only to puzzles and not very helpful for real-world use. Later, with more investment and the addition of tool-use capabilities, they became extremely useful in research and programming. Once real PMF appears, humans are very good at optimizing something that already works; getting to that point, though, was a hard and very worthwhile research journey. OpenAI really passed the exam at that stage.

Host: The process you describe, continuously increasing investment under uncertainty, is very interesting and highly relevant to your current view of reasoning models. When did you truly realize this was not just fun, but could scale and go far?

Jerry: To be honest, I believed in it from the beginning. That comes mainly from my belief in reinforcement learning. From the day I joined OpenAI, I firmly believed that if we were to move toward AGI, reinforcement learning was an essential component. The question was never "whether to do it" but "when we would be ready, and how." As time passed and the research progressed, the experimental results kept confirming that the path was correct.

Why Anthropic is leading in programming: Focus

The most important skill at present is "managing junior engineers"

Host: A very unusual aspect of OpenAI is that it is both a research lab pursuing AGI and, "accidentally," the maker of a consumer product that swept the globe. The company does consumer products, enterprise products, and core research all at once. How does that work internally? Do researchers get pulled in too many directions?

Jerry: One thing is actually very clear: OpenAI's research department has been highly separate from the product team from the very beginning. The company's core mission has always been "to build intelligence." There is a research team dedicated to products, responsible for optimizing models around specific product metrics, but the focus of most research is always how to make models smarter. At least inside the research department, that feeling of being pulled apart is not strong.

What is truly complex is that OpenAI sits at the center of perhaps the biggest technological transformation our generation will experience. The opportunities are endless; almost every industry will be reshaped by AI, and doing nothing feels like waste. But that creates a very real and very dangerous problem: focus. A company can usually push only one extremely hard thing to the limit; it is rare for one organization to do several extremely hard things at a top level simultaneously. That is a huge risk for OpenAI. For example, OpenAI lost focus on the coding direction for a while, putting more attention on consumer products, and it paid the price in market share. They are now working very hard to catch up, and the recent coding models have become very strong again, but that distraction had a cost.

AI companies today are like kids in a candy store: potentially enormously valuable things everywhere, and it is hard to resist doing more. But every direction has competition; the only question is who can do which thing truly right.

Host: This also raises the question of the ecosystem. You mentioned the coding field: why is Anthropic so outstanding at code?

Jerry: In one sentence: focus. I know Anthropic's founders; they have been like this since their days at OpenAI. They have always attached great importance to programming and firmly believe it is a key component of AGI. I can imagine how focused their efforts in this direction have been over the years. The latest Claude coding models and agents have indeed pushed that vision very far. They say few people in the company write code by hand anymore, and I believe that is not an exaggeration.

Host: Does this mean future large-model laboratories will naturally move toward a division of labor, each focusing on different capabilities?

Jerry: That depends on which kind of world we end up in. If data is the core driving force, this is a zero-sum game: you put data into a certain skill and the model becomes stronger at that skill, so the market naturally splits and specializes. If research is the key, then research has a magical property: a single breakthrough may let a model leap forward in all fields at once and take an overall lead. We still cannot tell which future will prevail, but I am very sure a more general path exists; we just don't know how hard it will be to find.

There is even a slightly pessimistic but not implausible scenario: maybe we have reached the last model that humans design by hand, and from here on, models will research better models themselves. Current coding agents are already powerful enough that, combined with huge amounts of compute, this inference is not absurd. Of course, I still hope humans continue to do some of the key work personally.

In essence, the history of programming is a history of raising the level of abstraction, and coding agents can be seen as a brand-new, higher-level "programming language." I think humans will likely no longer type code directly, but software must still be reliable. The problem we need to solve is: when we neither write nor even read the code, how do we ensure the system does the right thing? I believe these problems can be solved. The most important skill at present is actually the ability to "manage junior engineers." The best managers both deeply understand the system and let others make decisions, and that is precisely the best way to collaborate with models.

Not sitting alongside a research team is indeed a disadvantage for application companies. Ultimately, successful AI companies often end up training their own models. Application companies may start from a product, gradually move into post-training and then full model training, and eventually even build their own data centers; that is a natural growth path. This does not mean small companies have no opportunities. If data matters, you can differentiate with data; if research matters, small companies may also innovate under constraints. By focusing on one field and seeing where models fall short, you may build a model that is extremely strong in that field, perhaps even better in a broader sense, and thereby grow into the next giant.

Host: But the reality is a common problem from the past: you may hold a lead for only a moment before the next generation of models is released and you suddenly find yourself far behind again.

Jerry: Competition is indeed very brutal. We have seen many times in the US tech industry that large companies enjoy many structural advantages; that is true. But at the same time, new and very successful large companies keep emerging. So this is not hopeless, but it is very difficult.

The capabilities excellent AI researchers should possess:

Systems engineering, theory, independent thinking, and resistance to conformity

Host: I want to turn to the talent ecosystem and research itself. You are a very outstanding researcher yourself and have worked with many top researchers. Competition for researchers is now extremely fierce, and you did a lot of recruiting at OpenAI back then. So today, what determines which company researchers choose to join?

Jerry: This is a good question. In the final analysis, people are very complex, even more complex than today's models. Everyone's incentives and desires are different, so I really cannot generalize.

I think recruiters should not only ask "how can I attract the most people" or "how do I make myself look most attractive to all researchers." Those are real questions, but there is a more important one: what kind of researcher will genuinely want to work here? Trying to please everyone is almost impossible; different people have different preferences, values, and ways of working. Instead, it is better to deliberately build a team that is highly aligned in values and methodology. Experience has repeatedly shown that teams aligned on goals move faster and get better results. So this is essentially a two-way screening process: finding the right person and the right team makes individuals happier and the team more successful, and over time it makes the team more and more attractive.

Host: But we have also seen some very interesting experiments, such as Meta using extremely exaggerated compensation packages to poach people. How do you view this approach?

Jerry: Different companies have different strategies for building research teams. At a certain stage, Meta was clearly on the unfavorable side of the supply-demand curve, and after some setbacks it needed very attractive terms to draw people back. Momentum is very important in the talent market and very difficult to reverse: once the industry forms the impression that "you are not doing well," you cannot recruit people, which further reinforces the impression. From that perspective, this was a reasonable and even smart strategy to break the negative feedback loop. In a context where AI is crucial to large technology companies, Meta has indeed rebuilt a new team and is training new models. The entire industry is watching whether this attempt will succeed and how it will shape the laboratory's future. In any case, this move has injected new vitality into Meta AI.

Host: You have done a lot of pioneering AI research and worked with many top researchers. In your view, what makes an excellent AI researcher?

Jerry: This is a difficult question to answer. To some extent, success is related to being in the right place at the right time. But in terms of basic skills, I think an excellent AI researcher today must be very solid at both the systems-and-engineering level and the theoretical level. You need to understand how computer systems work and how neural networks are trained, and also understand the theoretical foundations of neural networks and optimization. Being good at only one side makes it almost impossible to reach the top; and if both sides are at least "good enough," your research efficiency increases by an order of magnitude.

Another extremely important but often overlooked ability is independent thinking and resistance to conformity. Humans have a natural tendency to converge toward the group's median view, and that almost stifles real research. I often say that if you have 100 researchers thinking about the same thing, you essentially have only 1 researcher. The essence of research is to do "things that don't work yet," and those are exactly the things most people don't yet believe in. Doing this requires a very scarce quality: courage. The courage to stand up and say, "Let's try a different path." In an era when experiments are extremely expensive, this is especially difficult; many machine learning experiments now cost about as much as Hollywood movies. As in filmmaking, you can reduce risk as much as possible with stars and special effects, but in the end an experiment is an experiment, and the result is always uncertain.

To summarize: deeply understanding systems and theory, not following the crowd excessively, and having the courage to stick to your own judgment. Those are the core traits of an excellent AI researcher in my mind.

Static models can never become AGI

It will take two to three years for robots to have a "turning point moment"

Host: We usually end the interview with a quick Q&A. First question: over the past year, what is one important view on AI you have changed?

Jerry: My biggest recent update is that static models can never become AGI. Continuous learning is essential.

Host: Is that because static models lack the capability, or because by definition they do not meet the requirements of AGI?

Jerry: More because we have gradually seen what models are still missing. They are already very strong in many respects, but if they cannot keep learning, then in my view they will always be a tool requiring human supervision, not a true intelligent agent.

Host: Beyond the fields we discussed today, AI is also making rapid progress in other directions. How long do you think it will take for robotics to have a "turning point moment" like ChatGPT's?

Jerry: I'd guess about two to three years.

Host: That is quite an aggressive call. Many people are still skeptical about whether scaling laws exist in robotics and whether there is enough data.

Jerry: To be honest, I think reality is better than most people imagine. Many companies are already making substantial progress, but these results need time to mature, along with further investment. I am quite optimistic about robotics over the next few years.

Host: What about biology?

Jerry: Biology will be much slower.

Host: Why slower than robotics?

Jerry: In terms of the required level of intelligence and operational precision, biology is much more complex. It is a field that needs more fundamental investment before it can truly take off.

Host: As models keep progressing, what impact do you think society may be underestimating or not discussing enough?

Jerry: Large-scale job automation in the coming decades is almost inevitable. On the one hand, we do talk about this; on the other hand, I don't think we talk about it seriously enough. The world will be very different from today; that is almost certain to me. Social change itself is slow, but this transformation will be very strange and, in some respects, may be very painful. We need to think ahead about how to make the process as painless as possible, because the future shape of employment will certainly be very different from today's.

Reference link:

https://www.youtube.com/watch?v=XtPZGVpbzOE
