Meta Bets on Neural Computers: Is the Next-Gen Computer the Model Itself?


We are beginning to expect machines to learn how to run themselves.


If you have ever thought, "AI will ultimately become a form of computer," then this article is written for you.

Over the past few decades, computers have gradually become the primary medium for humans to accomplish tasks. In recent years, AI has begun to occupy this position as well: it no longer just answers questions but also starts calling tools, manipulating interfaces, and participating in real workflows.

The question has thus changed: Do we expect AI to use the computer, or to become a computer?

The concept of the Neural Computer (NC) discusses precisely this issue: Can the model assume part of the responsibilities that originally belonged to the machine's own operation?

The "Neural Computer" discussed here does not entirely refer to the NTM/DNC route pioneered by Alex Graves [1][2], nor is it discussing specific new hardware (such as the recent Taalas) or a particular application.

What this article aims to discuss more is whether a learning machine will transition from "using a computer" to "becoming a computer."

Therefore, the goals of the Neural Computer are not things like stronger agents, world models of computer environments, or simply another layer of intelligence stacked on top of traditional computers.

It concerns whether the system responsibilities originally borne by the program stack, toolchains, and control layers will gradually enter the Runtime that the model actually relies on.

This notion, I imagine, has floated through many minds; I will call it the "pre-consensus" for now.

The general viewpoint is as follows:

  • Neural Computer (NC) discusses whether models will begin to assume part of the operational responsibilities that originally belonged to the machine itself.

  • Traditional computers revolve around explicit programs, Agents around tasks, and World Models around environments, whereas NC revolves around the Runtime.

  • Completely Neural Computer (CNC) is the complete state of NC.

  • Current prototypes are already showing the rudiments of early Runtime primitives.

  • If capabilities begin to enter the Runtime and can be installed, reused, and governed there, then the Neural Computer may redefine the word "computer."

Image

Paper Address:
https://arxiv.org/abs/2604.06425

GitHub Address:
https://github.com/metauto-ai/NeuralComputer


Why Now: A "New Machine Morphology" is Emerging

Today, three things are happening simultaneously.

First, agents are becoming increasingly capable at getting things done.

From MetaGPT in 2023 (one of the "ancient" coding agents) [3], which could barely write a few hundred lines of code, to 2025, when Cursor, Codex, and Claude Code became standard productivity tools for programmers, and now to OpenClaw [4] truly entering the public eye, the trajectory is clear.

What people care about is no longer whether an agent can occasionally accomplish a task, but whether it can enter real production and daily life to stably handle various affairs.

For agents, the bottlenecks of concern now are: 1) How to maintain stability over long-horizon tasks; 2) How capabilities can be solidified; 3) How processes can be continuously reused.

The current solutions mainly involve adding more to the agent's scaffold (or harness) side: using stronger memory, longer workflows, and more stable action loops to maximize task completion rates as much as possible.

Looking further ahead, a more radical direction is recursive self-improvement: models training the next generation of models, and agents continuously rewriting themselves [5].

Image

〓 Agents are moving from prototype experiments to professional productivity tools, and then to mass daily infrastructure. [3][4]

Second, world models are becoming increasingly adept at modeling dynamic environments.

In the past year, with world model experiments from projects like GameNGen and Genie 2/3, more and more people are starting to believe: models can not only represent the current state but also maintain a dynamic structure internally regarding "what will happen next."

World models originally just simulated environmental evolution; what is noteworthy now is that this capability has entered some real closed loops.

This is especially evident in those corner cases in reality that are difficult and costly to collect repeatedly. In these scenarios, rollout is being directly used for prediction, planning, control, and training.

Looking along this line, from Jürgen Schmidhuber's Making the World Differentiable [6] in 1990, to the 2018 "World Models" [7], and now to Waymo incorporating world models into autonomous driving simulation and training [8][9], this route has begun to enter concrete system components such as autonomous driving simulation, training, and interactive environment generation.

This also makes the world model no longer just "representing the world," but starting to move towards "unfolding the world" and "intervening in the world."

A world model is better suited to first generating several possible future states, then using these rollouts for planning, filtering, and action loops.

Today, this route has diverged into several obvious directions: in autonomous driving and physical AI, it mainly plays the role of a simulation and synthetic data engine, used to supplement data that is expensive, dangerous, or scarce in the real world.

Examples include Waymo World Model and NVIDIA Cosmos [8][10]; in spatial intelligence, it pursues a 3D world that is generatable, enterable, and sustainably interactive, such as World Labs' Marble [11].

In directions leaning more towards real-time interactive worlds, generative models have moved from static content generation to controllable, interactive, and explorable environment generation. Representative examples include GameNGen's real-time neural simulation of DOOM [12], and Google DeepMind's Genie 2 / Genie 3 [13][14].

Although these directions have differentiated, they are essentially still solving the same type of problem: how to learn the laws of environmental evolution over time, actions, and constraints into the system's interior.

Image

〓 From 1990 to 2018 and now: World models have evolved from early differentiable world modeling concepts to autonomous driving simulation and training represented by Waymo World Model [6][7][8][9].

Third, the structural friction of traditional computers in the AI era is becoming increasingly obvious.

Today, more and more tasks are no longer about deterministic solving but open-ended requirements; no longer one-time input-output but long-term interaction; no longer clear programs but processes of doing things with vague goals requiring continuous adjustment.

Precisely because of this, traditional software stacks are starting to look cumbersome. They certainly retain stability advantages, but in the many scenarios dominated by natural language, demonstrations, interface operations, and weak constraints, the cost of organizing and driving tasks with them keeps rising.

Traditional computers themselves are also rewriting their foundations for AI. Chips, compilers, memory systems, and software stacks are all becoming more model-friendly.

However, most of these changes still occur within existing computing paradigms: they make old machines more suitable for AI but do not rewrite "what a machine is."

Amidst these changes, routes like Taalas have pushed things a step further, starting to make specific models a deployment unit: models are no longer just workloads running on machines but are approaching the line of "organizing hardware by model" [15].

But at least today, this is still just a change at the deployment layer and cannot yet be called a universal machine morphology.

These three changes actually point to the same question.

If agents are getting better at doing things, world models are getting better at deduction, and traditional computers are rewriting their foundations for AI, then will there emerge a new Runtime that incorporates execution, rollout, and capability solidification into the same learning machine?

From the perspective of the relationship between humans and machines, this corresponds to a shift in the primary relationship: in traditional computing, humans mainly interact with the computer; in the agent era, humans interact more with the agent, which then calls the computer to get things done.

The world model here is closer to a parallel prediction layer: it can serve both humans and agents but is not itself responsible for getting things done.

Pushing further, what NC wants to change is the machine itself: it attempts to converge the responsibilities currently scattered among computers, agents, and world models into the interior of the same learning machine.

At that time, what humans face will no longer be just "agents calling computers on their behalf," but the direct use of such a Neural Computer.

Image

〓 How the human-machine relationship changes: In the past, it was more like Human → Computer; in the agent era, the relationship is more like Human → Agent → Computer, with World Models appearing more as a parallel prediction layer; if NC holds, humans will face a Neural Computer more directly.

This also means that interaction itself will begin to carry the connotation of "programming."

Today, natural language instructions, mouse trajectories, screen changes, and task feedback are mostly just process logs; in the NC setting, they will become the material that shapes future behavior.

Today, we mainly install capabilities through code; in the future, demonstrations, interaction trajectories, and constraints themselves may also become the entry points for capabilities entering the Runtime.


What is a Neural Computer, and What Counts as It Truly Holding?

First, look at this table: it places traditional computers, Agents, World Models, and Neural Computers on the same "ruler" for comparison.

Image

After viewing this table, the differences and connections become clear: what they each organize around, where the source of truth lies, and what responsibilities they respectively undertake.

Next, we can directly envision: if NC already existed, how would people use it?

For traditional computers, you install software; for agents, you describe tasks; for NC, what you do is closer to installing capabilities into the machine, expecting these capabilities to remain in the machine thereafter.

Precisely because of this, the Runtime mentioned here is not some software component but the layer through which the system remains the same machine over time: what persists, what pushes state forward, which inputs truly change the machine, and which changes amount to rewriting the machine outright.

For NC, the key is not adding another layer of external tools but whether capabilities and states can truly enter the same learned runtime.
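As a loose analogy, the Runtime responsibilities listed above can be sketched in a few lines of toy code. Everything here, including the class and method names, is illustrative and not taken from the paper:

```python
# Toy "runtime": installed capabilities and state persist inside the
# machine across calls, instead of living in an external tool layer.
class ToyRuntime:
    def __init__(self):
        self.capabilities = {}   # what stays in the machine
        self.state = {}          # what gets pushed forward

    def install(self, name, fn):
        """Installing a capability changes the machine itself."""
        self.capabilities[name] = fn

    def step(self, name, *args):
        """A normal input advances state without rewriting the machine."""
        return self.capabilities[name](self.state, *args)

rt = ToyRuntime()
rt.install(
    "count",
    lambda state, k: state.update(n=state.get("n", 0) + k) or state["n"],
)
rt.step("count", 2)   # state accumulates inside the same machine
rt.step("count", 3)   # now the internal counter reads 5
```

The point of the caricature is the division of labor: `install` rewrites the machine, `step` merely advances it.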

If It Holds, What Will the Machine Look Like?

First, it may not continue to develop along today's foundation model route.

The more natural idea today is to push models towards larger and stronger 1B - 10T dense/MoE foundation models; much work is indeed proceeding along this path.

But in my view, once NC truly matures, the foundation is more likely to go in another direction: 10T - 1000T level, sparser, more addressable, with a bit of a circuit flavor.

The future CNC might not be a continuously larger lump of continuous representation but rather resemble a set of routable, composable, and locally easier-to-inspect machine foundations.

It doesn't necessarily have to mimic animal perception or the human brain; instead, it might be closer to a neural network with a NAND flavor: discrete, sparse, and locally verifiable.

At least for now, this path has not been systematically unfolded.

Some recent work by OpenAI on weight-sparse transformers can only be considered one signal; more importantly, behind this lies an older and richer line of thought in AI, especially in reinforcement learning, where sparse structures, local division of labor, and routing mechanisms have always been directly related to how systems learn and act [16].

Second, it also may not always rely on overall parameter modification to upgrade itself.

What NC points to is another mode of evolution: relying on the Runtime's self-programming and continuous interaction, allowing the machine to continuously self-evolve along its internal capability structure.

User input will no longer just trigger one-off behaviors but will gradually install, call, combine, and retain reusable neural routines, even forming internal executors that can be called upon in the future.

At least in terms of functional division, it is closer to "memory" in a traditional computer than to the processor: upgrading doesn't necessarily mean rewriting the whole machine's ontology but might just mean stably writing these new structures into a layer of addressable, callable, and retainable internal state.

Following this path, upgrading is no longer just "swapping for a larger model" but more like continuously installing new components inside the machine.

NPI and HyperNetworks from several years ago can also be seen as similar but incomplete early ideas: the former attempted to break complex programs into callable, composable subroutines [17]; the latter hinted that machines might even continue to generate downstream neural modules to expand their capability boundaries [18].

Of course, I believe the ambition can be a bit greater: a sufficiently strong Neural Computer could completely directly generate new (sub-)NNs and hang them inside itself in a pluggable way, as natural as installing or uninstalling software today, only this time skipping the intermediary of handwritten code and compilation.
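The "generate a sub-network and mount it like software" idea can be caricatured in toy code. The generator below emits a trivial function rather than real network weights, and all names are hypothetical:

```python
# Hedged sketch: mounting and unmounting generated sub-modules as
# naturally as installing or uninstalling software today.

def generate_subnet(weight):
    """Stand-in for a parent model emitting a new sub-network."""
    return lambda x: weight * x

class PluggableMachine:
    def __init__(self):
        self.modules = {}

    def mount(self, name, module):
        self.modules[name] = module      # analogous to installing software

    def unmount(self, name):
        self.modules.pop(name, None)     # analogous to uninstalling

    def run(self, name, x):
        return self.modules[name](x)

m = PluggableMachine()
m.mount("double", generate_subnet(2))
m.run("double", 21)   # → 42
```

The step this caricature skips is exactly the hard one: emitting a useful sub-network, rather than a hand-picked lambda, without handwritten code or compilation in between.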

Third, it might also gradually incorporate world model-style rollouts into the Runtime.

By then, rollout will slowly become a daily mechanism of the machine and also become part of this self-programming and self-evolution.

Humans can provide input and expected output (ground truth), or just write evaluation metrics in advance; or even, in a given round, provide nothing at all, and the Runtime can continuously self-play, self-test, filter, and compress candidate approaches internally, then solidify effective improvements into the next round of capability updates.

In an ideal state, while humans sleep, the machine completes evaluation, trial-and-error, and iteration internally. What truly remains is not just more context, but the internal capability structure itself has changed.
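That propose-evaluate-solidify loop can be sketched minimally; the metric, the candidate set, and every name below are invented for illustration:

```python
# Toy sketch of the "while humans sleep" loop: enumerate candidate
# routines, score each against a pre-written metric, and solidify only
# the survivor.

def metric(fn):
    """Human-authored evaluation: how close is fn to squaring its input?"""
    return -sum(abs(fn(x) - x * x) for x in range(5))

def candidates():
    """Stand-in for self-play: propose a few rival routines."""
    for a in range(4):
        yield lambda x, a=a: x ** a

best, best_score = None, float("-inf")
for cand in candidates():        # self-test
    score = metric(cand)         # filter
    if score > best_score:       # compress: keep only what survives
        best, best_score = cand, score

best(3)   # the solidified capability now squares its input
```

What changes at the end of the loop is not the log but `best` itself, which is the distinction the surrounding text draws between "more context" and a changed capability structure.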

Of course, the premise of all this is not letting the system change secretly, but that the update path itself is governable.

Viewed this way, the outline of NC as a machine morphology becomes relatively clear. The key lies in whether capabilities have truly entered the Runtime and are installed, reused, executed, and governed there.

What CNC refers to is what it looks like after this is accomplished (the complete state).

According to the original paper's definition, an NC instance can only be counted as CNC when it simultaneously satisfies four conditions: it must be Turing complete, universally programmable, remain behavior-consistent unless explicitly reprogrammed, and embody the architecture and programming semantics of NC relative to traditional computers.

The table below is a more straightforward summary of these four requirements from the original paper.

Image

The Prototype Implemented in the Paper: What It Proves and What Is Lacking

By my judgment, the Neural Computer will truly take shape in about three years. Therefore, compared to the Neural Computer I truly envision, the work in our paper is still just a very early step.

Placed in today's context, I believe the most practical unified carrier is still this type of neural network oriented toward video generation and world models; putting pixels, actions, and temporal rollouts into the same end-to-end prototype first is simply the fastest route.

What we are verifying with them now is only part of the key capabilities of NC. They are more like transitional implementation references rather than the final structure of NC; if we truly want to reach CNC, a more thorough, bottom-up reconstruction will still be needed in the end.

3.1 CLIGen (General): A "Computer Imitation Game" That Passes for Real

First, let's see if terminal rendering can hold up: color schemes, cursors, scrolling, TUI, and overall sense of rhythm.

Let's look at the results of the first set of experiments. At a glance, they already look deceptively real. For CLIGen (General), the first thing visible is that video models can already make terminal rendering look sufficiently real.

Image

Mainstream video models were originally not trained for such text-dense, discrete-layout-dependent computer scenarios; but after further training, the "Imitation Game for Computers" can indeed be produced.

Image

This set first learned the outermost things of the terminal: how colors change, how the cursor flashes, whether the window aspect ratio is stable, how long logs scroll, and how full-screen TUI, progress bars, and status bars appear.

What first held up was this surface layer and rhythm of the terminal. Borrowing the previous phrasing, what was first learned here is still the appearance of the Runtime.

Looking back at September 2025, this experimental result is surprising.

Using only about 1,100 hours of noisy terminal datasets, Wan2.1 [31], which originally hardly understood computer interfaces and struggled to generate even slightly small text, was brought to a level where it could stably generate terminal representations, achieving considerable shallow alignment with common commands, echoes, and log morphologies.

For video generation, this type of text-dense, fast-changing, flickering, and almost naturally dynamic scene is inherently one of the most difficult categories; but this result indeed exceeded the expectations of many at the time.

What was used here is still general video in the terminal domain, with many styles and mixed scenes. With terminal rendering holding up first, it encourages us to try the harder things in computers: memory, reasoning, programming, and execution.

3.2 REPL and Math: It No Longer Just "Draws Terminals"

Here we focus on harder execution structures: input, enter, echo, local editing, and state continuation.

After the preliminary experiments of terminal rendering, a more interesting question arises: Can the terminal be treated as a partial machine that can be stably driven by actions for testing?

Type a command, does the buffer move forward? Press enter once, does the echo follow? After typing errors, deletions, and re-typing, can the state continue? REPL and Math here are actually two sides of the same coin: has the model started to learn some of the state transition laws within the terminal?

Image

Now, the focus shifts to the causal structure of instruction execution. The training data for this set comes from cleaner, more repeatable script trajectories: we generated these terminal videos ourselves through scripts and Docker environments, ensuring that input, enter, echo, errors, and local editing all fall within a more stable terminal environment.
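As a rough illustration of scripted trajectory collection: the real pipeline records terminal video inside Docker, while this toy version captures only text, and the record's field names are assumptions:

```python
# Hedged sketch: run a deterministic command and record the
# (input, output, exit code) triple as one trajectory step.
import subprocess

def record_step(command):
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True
    )
    return {
        "input": command,                # what was typed
        "output": proc.stdout.strip(),   # what the terminal echoed back
        "exit_code": proc.returncode,
    }

step = record_step("echo hello")
# step["output"] is "hello"; step["exit_code"] is 0
```

Replaying many such scripted steps in a fixed environment is what makes input, enter, echo, errors, and local editing fall into stable, repeatable trajectories.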

From this set of results, it can already be seen that the model has learned some of the most basic operating laws of computer terminals.

For very simple instructions like pwd, date, whoami, echo $HOME, env | head -n 5, the input, enter, echo, and result display can already be made quite close to reality; the output morphology for different commands also matches the corresponding terminal scenarios.

Compared to the experiments in the previous section, the instructions themselves can already drive character updates, echo generation, and local state changes, and the terminal unfolds according to its own mode of operation.

Continuing along this line, the model has actually touched upon some things in simple mathematical scenarios, but the reasoning ability itself has not yet been truly solved.

At the level of two-digit addition, the most basic arithmetic, current models still struggle to calculate correctly stably.

Of course, there is a data-volume issue here: we haven't given the model enough, or hard enough, training data to force out stable reasoning; but there is also a more fundamental possibility: using current DiT-based video models to carry stable reasoning may itself be ill-posed.

The more prudent judgment for now is that the layer of terminal execution has started to stand, but the layer of symbolic reasoning has not yet passed.

3.3 GUIWorld: Interface Manipulation Also Starts to Hold

Finally, let's see if actions can truly drive interface states: can clicks, hovers, inputs, and window feedback close the loop?

In the CLI phase, we roughly saw clearly: video models have strong rendering capabilities, basic memory and execution capabilities are starting to appear, but the underlying symbolic reasoning is not yet good enough.

In GUIWorld, the focus becomes whether the interface state can be pushed forward by actions.

Image

GUIWorld directly pushes the problem from CLI to GUI.

Getting here, the problem is no longer mainly about text and commands but real keyboard and mouse actions: the mouse must land on points, hovers must produce feedback, and after clicking, buttons, dropdowns, modal windows, and input boxes must truly change state; keyboard input must also drive the interface forward frame by frame.

The corresponding data is already a quite complete interaction rig: we first fixed a 1024×768, 15 FPS environment in the Ubuntu 22.04 XFCE4 desktop, then set up the entire desktop operation, recording, and action playback process, so that every click, hover, input, and interface change could be stably recorded.

The data is divided into three parts: about 1000 hours of Random Slow, about 400 hours of Random Fast, and about 110 hours of real interaction goal-directed trajectories driven by Claude CUA.

The first two test how open-world noise such as mouse acceleration, pauses, hovers, and window switching affects the model; the latter provides clearer action-response pairs, to see whether the model has learned that after a given action, the interface triggers the appropriate change.
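A minimal sketch of what one action-response record in such a rig might look like, assuming the 15 FPS setting above; the field names are invented:

```python
# Illustrative record format: each GUI action carries a timestamp and
# coordinates, and is paired with the frame index it lands on.
from dataclasses import dataclass

@dataclass
class GuiAction:
    t: float            # seconds since recording start
    kind: str           # "click" | "hover" | "key"
    x: int
    y: int

def pair_with_frames(actions, fps=15):
    """Map each action onto the 15 FPS frame grid of the recording."""
    return [(a, int(a.t * fps)) for a in actions]

actions = [GuiAction(0.0, "click", 512, 384), GuiAction(1.0, "key", 0, 0)]
pairs = pair_with_frames(actions)
# the action at t=1.0 lands on frame 15 (one second at 15 FPS)
```

Aligning actions to frames this way is what turns raw recordings into the supervised action-response pairs the text describes.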

On the model side, we didn't just try one action injection method but ran four versions in parallel. Their core difference lies not in "whether action was added" but in how deep within the layers the action enters the backbone to participate in state evolution.
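The injection-depth axis can be caricatured with scalar "blocks": injecting the action once at the input lets depth attenuate it, while re-injecting it inside every block keeps it present. This is a toy intuition pump, not the paper's architecture:

```python
# Toy, pure-Python caricature of action-injection depth.

def block(h):
    return 0.5 * h              # stand-in for one backbone block

def input_injection(x, action, depth=3):
    h = x + action              # action enters once, at the input
    for _ in range(depth):
        h = block(h)
    return h

def per_block_injection(x, action, depth=3):
    h = x
    for _ in range(depth):
        h = block(h + action)   # action re-enters inside every block
    return h

shallow = input_injection(1.0, 1.0)    # (1+1) * 0.5**3 = 0.25
deep = per_block_injection(1.0, 1.0)   # action signal survives depth
```

With the same inputs, the once-at-input signal has decayed to 0.25 after three blocks, while the per-block variant holds at 1.0; the real question the four models probe is the learned analogue of this trade-off.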

Figure 7 in the paper draws these four methods very clearly:

Image

〓 Figure 7: Four ways to inject GUI actions into the diffusion transformer. These correspond to Model 1 through Model 4 mentioned above.

Image

From the final experimental results (omitting details here for now): among the four model designs, Model 4 achieved the best comprehensive results.

This set of results indicates that for fine-grained, strong time-series, and strong local interaction environments like GUI, injecting actions directly inside the block makes it easiest for the model to learn "how the interface continues after an action" into the backbone.

At the same time, 110 hours of supervised data performed significantly better than about 1400 hours of random data; explicit cursor visual supervision was also much stronger than simple coordinate supervision.

Taken together, the most straightforward conclusion of GUIWorld is: what the GUI line lacks most is harder action semantics, clearer state transitions, and treating the cursor as a visual object for supervision.

Although few initially favored video models handling such highly discrete, text-dense, and action-sensitive computer scenarios, as long as task design and data organization are appropriate, it can already yield many interesting results in interface rendering, page switching, short-term state continuation, local interaction, execution echo, and even some very preliminary working memory.

In other words, video models may still be far from the endgame, but as containers for early prototypes, they are already sufficient to put many previously abstract Neural Computer issues on the table.

3.4 From Prototype NC to CNC, What Is Still Missing

Bringing back the CNC condition table from Section 2, the general conclusion of the current prototype is relatively clear: Turing complete has only touched the edge, universally programmable is just showing an entrance, behavior-consistent only holds locally in controlled environments, and for machine-native semantics, the direction is clearer than the conclusion.

What NC needs to solve is not simply superimposing agents, world models, and traditional computers, but gradually pulling back part of the responsibilities currently scattered among these objects into the same learned runtime.

The truly important place for the current prototype is not that it is already approaching the endgame, but that it exposes several hard thresholds determining whether CNC can hold in advance.


If Neural Computer Holds, Software, Hardware, and "Programs" Will All Change

If we clarify the relationship a bit more, Neural Computer is first and foremost a judgment on the next generation of computers.

But I have a hunch that its strongest competitive pressure in the future will come from personalized super agents equipped with strong memory, strong tool invocation, and continuous online capabilities.

The table below places these three side by side for viewing.

A quick way to read the table: start with "what you actually get," "how experience solidifies," and "what is installed."

Image

If CNC truly holds, what will change first is the delivery object and the system's organization method.

What is installed today is still software, tools, workflows, and memory entries; on the NC path, what slowly gets installed is more like the capability itself.

Code will of course still exist, but it is no longer the only entry point; instructions, demonstrations, operation trajectories, and constraints will also start to directly undertake the task of "installing capabilities."

The meaning of the word "program" will also change accordingly: it is no longer just a piece of code but more like a capability object that can be installed, combined, versioned, and continuously updated.

Further on, the change will transmit all the way to the system stack and the machine boundary itself. How software is built, how hardware is configured, how updates are governed, and how problems are tracked will all be increasingly reorganized around the same continuously running machine.

Entry points like mobile phones, browsers, IDEs, and terminals will still exist but will increasingly look like different windows accessing the same machine.

In the end, what is rewritten is not just a certain tool stack but the meaning of the word "computer" itself.

Declaration and Acknowledgments: The content and views in this blog largely represent the original intent of the Neural Computer paper, alongside the personal views of Mingchen Zhuge.

Thanks to Wenyi Wang, Haozhe Liu, Shuming Liu, Yuandong Tian, and Dylan R. Ashley for their review comments.

Some diagrams and materials in the text are cited from the original paper and related public materials.

If you wish to cite this content, you can directly use the arXiv entry or blog entry below.

@misc{zhuge2026neuralcomputers,
  title         = {Neural Computers},
  author        = {Mingchen Zhuge and Changsheng Zhao and Haozhe Liu and Zijian Zhou and Shuming Liu and Wenyi Wang and Ernie Chang and Gael Le Lan and Junjie Fei and Wenxuan Zhang and Yasheng Sun and Zhipeng Cai and Zechun Liu and Yunyang Xiong and Yining Yang and Yuandong Tian and Yangyang Shi and Vikas Chandra and Jürgen Schmidhuber},
  year          = {2026},
  eprint        = {2604.06425},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2604.06425}
}
@online{zhuge2026neuralcomputerblog,
  author = {Mingchen Zhuge},
  title  = {Neural Computer: A New Machine Morphology is Emerging},
  year   = {2026},
  month  = feb,
  day    = {7},
  url    = {https://metauto.ai/neuralcomputer/index_cn.html},
  note   = {Research essay},
  urldate= {2026-04-06}
}

References

[1] Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing Machines. arXiv:1410.5401, 2014.
[2] Alex Graves et al. Hybrid computing using a neural network with dynamic external memory. Nature 538, 471-476 (2016).
[3] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. ICLR 2024.
[4] OpenClaw. GitHub repository.
[5] Mingchen Zhuge et al. AI with Recursive Self-Improvement. ICLR 2026 Workshop Proposals.
[6] Schmidhuber, Jürgen. Making the world differentiable: on using self supervised fully recurrent neural networks for dynamic reinforcement learning and planning in non-stationary environments. Vol. 126. Inst. für Informatik, 1990.
[7] David Ha and Jürgen Schmidhuber. World Models. 2018.
[8] The Waymo World Model: A New Frontier For Autonomous Driving Simulation. Waymo Blog.
[9] Demis Hassabis on Waymo World Model and Genie 3. X post.
[10] NVIDIA Research. Cosmos World Foundation Models. NVIDIA, 2025.
[11] World Labs. Marble: A Multimodal World Model. World Labs, 2025.
[12] Dani Valevski, Yaniv Leviathan, Moab Arar, and Shlomi Fruchter. GameNGen: Diffusion Models Are Real-Time Game Engines. Project page, 2024.
[13] Google DeepMind. Genie 2: A large-scale foundation world model. DeepMind Blog, 2024.
[14] Google DeepMind. Genie 3: A new frontier for world models. DeepMind Blog, 2025.
[15] Ljubisa Bajic. The Path to Ubiquitous AI. Taalas.
[16] Leo Gao, Achyuta Rajaram, Jacob Coxon, Soham V. Govande, Bowen Baker, and Dan Mossing. Weight-sparse transformers have interpretable circuits. arXiv:2511.13653, 2025.
[17] Scott Reed and Nando de Freitas. Neural Programmer-Interpreters. arXiv:1511.06279, 2015.
[18] David Ha, Andrew Dai, and Quoc V. Le. HyperNetworks. arXiv:1609.09106, 2016.
[19] David Silver and Richard S. Sutton. Welcome to the Era of Experience. Preprint of a chapter to appear in Designing an Intelligence. 2025.
[20] Sam Altman. The Gentle Singularity. Sam Altman Blog. Accessed March 15, 2026.
[21] Dario Amodei. The Adolescence of Technology. Dario Amodei, January 2026.
[22] Demis Hassabis, Dario Amodei, and Zanny Minton Beddoes. The Day After AGI. World Economic Forum Annual Meeting 2026 session, January 20, 2026.
[23] Carver Mead. How we created neuromorphic engineering. Nature Electronics 3, 434-435 (2020).
[24] Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. GPTSwarm: Language Agents as Optimizable Graphs. Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62743-62767, 2024.
[25] Mingchen Zhuge, Changsheng Zhao, Dylan R. Ashley, Wenyi Wang, Dmitrii Khizbullin, Yunyang Xiong, Zechun Liu, Ernie Chang, Raghuraman Krishnamoorthi, Yuandong Tian, Yangyang Shi, Vikas Chandra, and Jürgen Schmidhuber. Agent-as-a-Judge: Evaluate Agents with Agents. Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:80569-80611, 2025.
[26] Wenyi Wang, Piotr Piękos, Li Nanbo, Firas Laakom, Yimeng Chen, Mateusz Ostaszewski, Mingchen Zhuge, and Jürgen Schmidhuber. Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine. arXiv:2510.21614, 2025.
[27] ICLR 2026 Workshop: AI with Recursive Self-Improvement. Workshop website.
[28] Peter H. Diamandis. Elon Musk: Optimus 3 Is Coming, Recursive Self-Improvement Is Already Here, and the Singularity #239. YouTube, March 11, 2026.
[29] I. J. Good. Speculations Concerning the First Ultraintelligent Machine. Advances in Computers, Volume 6, 1966.
[30] Jürgen Schmidhuber. Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements. IDSIA Technical Report, revised December 27, 2004.
[31] Wan Team. Wan: Open and Advanced Large-Scale Video Generative Models. arXiv:2503.20314, 2025.
[32] Xianglong He et al. Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model. arXiv:2508.13009, 2025.
[33] Anssi Kanervisto et al. World and Human Action Models towards gameplay ideation. Nature, 2025.
[34] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding Conditional Control to Text-to-Image Diffusion Models. ICCV 2023.
