World's First AI Scientist Publishes in Nature: Mastering the Entire Research Process from Idea to Paper, Passing Blind Human Review

After a year and a half of development, automated AI research has reached a historic milestone.

The AI Scientist system, developed by Sakana AI in collaboration with the University of British Columbia, the Vector Institute, and the University of Oxford, has officially been published in Nature. The paper details the system's architecture, presents new scaling-law results, and discusses the future and challenges of AI-generated science. The code and generated papers for both the first and second generations of the system are open-sourced on GitHub.

Nature:

https://www.nature.com/articles/s41586-026-10265-5

GitHub:

https://github.com/SakanaAI/AI-Scientist

https://github.com/SakanaAI/AI-Scientist-v2

Looking back over the past year and a half, the development of the AI Scientist system has gone through two critical phases.

The first phase was proving feasibility. The research team initially provided the system with only a basic code template, such as a simple nanoGPT training setup. From that starting point, the system autonomously generated new ideas, designed and ran experiments, and independently wrote a complete paper. To evaluate the quality of these papers, the team also developed an automated reviewer system. This demonstrated for the first time that end-to-end automation of the entire machine learning research process is feasible.

The second phase challenged the scientific community's own "Turing test." In its second generation, the system was given broad freedom to explore any loosely defined topic within AI research. The team submitted papers generated entirely by AI, without any human editing, directly to a workshop at ICLR 2025 to undergo rigorous double-blind peer review by human reviewers.

The results were striking. One manuscript received individual scores of 6, 7, and 6, for an average score of 6.33. This crossed the acceptance threshold set by humans, scoring higher than 55% of the human-written submissions. The entire submission process was approved in advance by the workshop organizers, and the paper was withdrawn as planned immediately after its acceptance was confirmed.

This new Nature paper not only summarizes these breakthroughs but also reveals the underlying large-model optimization mechanisms that made them possible. In operation, once given a general research direction, the system autonomously generates innovative research ideas, retrieves and reads relevant literature, and designs, programs, and executes experiments through parallel agentic tree search. Finally, it writes the entire paper in LaTeX, calling on vision-capable foundation models along the way for feedback on the figures it generates.
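To make the idea of a tree search over experiments concrete, here is a minimal, hypothetical Python sketch of a best-first search over experiment configurations. The node structure, the scoring function, and the proposal step are illustrative stand-ins, not Sakana AI's actual implementation, which delegates proposal and evaluation to foundation-model agents.

```python
# Hypothetical sketch: best-first search over experiment configurations.
# run_experiment and propose_children are dummy stand-ins so the example runs on its own.
import heapq
import random
from dataclasses import dataclass, field


@dataclass(order=True)
class Node:
    neg_score: float                     # heapq is a min-heap, so store the negated score
    config: dict = field(compare=False)  # hyperparameters of one experiment


def run_experiment(config: dict) -> float:
    """Dummy stand-in for training/evaluating a model; returns a quality score."""
    return -(config["lr"] - 3e-4) ** 2 - 0.01 * abs(config["depth"] - 6) + random.gauss(0, 1e-3)


def propose_children(config: dict, k: int = 3) -> list[dict]:
    """Dummy stand-in for an agent proposing k follow-up experiments."""
    return [
        {
            "lr": config["lr"] * random.choice([0.5, 1.0, 2.0]),
            "depth": max(1, config["depth"] + random.choice([-2, 0, 2])),
        }
        for _ in range(k)
    ]


def tree_search(root_config: dict, budget: int = 30) -> dict:
    """Repeatedly expand the most promising experiment found so far."""
    frontier = [Node(-run_experiment(root_config), root_config)]
    best = frontier[0]
    for _ in range(budget):
        node = heapq.heappop(frontier)
        if node.neg_score < best.neg_score:
            best = node
        for child in propose_children(node.config):
            heapq.heappush(frontier, Node(-run_experiment(child), child))
    return best.config


if __name__ == "__main__":
    print(tree_search({"lr": 1e-3, "depth": 4}))
```

In the real system, the experiment runner would actually train and evaluate models and the proposal step would be a language-model agent suggesting follow-up experiments; the dummy functions here only show the shape of the search loop.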

New Discovery: Automated Reviewers and Scaling Laws in Science

To evaluate AI-generated scientific achievements on a large scale while avoiding overburdening human reviewers, the team created an automated reviewer system.

The system is configured to act as an area chair, synthesizing five independent review opinions and making final decisions strictly according to official NeurIPS guidelines. The research team benchmarked it against tens of thousands of real human review decisions from the OpenReview dataset. The results showed that the automated reviewer's performance is comparable to humans, achieving a balanced accuracy of 69%. Its F1 score even surpassed the inter-human consistency measured in the famous NeurIPS 2021 consistency experiment.
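As a rough illustration of how an area-chair-style reviewer could turn several independent scores into a decision and be benchmarked against human outcomes, here is a hypothetical sketch. The accept threshold, score scale, and data are assumptions for demonstration, not the paper's actual prompts or reported numbers.

```python
# Hypothetical sketch: aggregate five review scores into an accept/reject decision,
# then measure balanced accuracy against human decisions. All values are made up.
from statistics import mean

ACCEPT_THRESHOLD = 6.0  # assumed cutoff on a NeurIPS-style 1-10 scale


def area_chair_decision(review_scores: list[float]) -> bool:
    """Synthesize independent review scores into a single accept/reject call."""
    return mean(review_scores) >= ACCEPT_THRESHOLD


def balanced_accuracy(predicted: list[bool], human: list[bool]) -> float:
    """Mean of per-class recalls, comparing automated decisions to human ones."""
    tp = sum(p and h for p, h in zip(predicted, human))
    tn = sum((not p) and (not h) for p, h in zip(predicted, human))
    pos = sum(human) or 1
    neg = (len(human) - sum(human)) or 1
    return 0.5 * (tp / pos + tn / neg)


if __name__ == "__main__":
    papers = [([6, 7, 6, 5, 7], True), ([3, 4, 5, 4, 3], False), ([6, 6, 5, 6, 7], True)]
    preds = [area_chair_decision(scores) for scores, _ in papers]
    truth = [label for _, label in papers]
    print(f"balanced accuracy: {balanced_accuracy(preds, truth):.2f}")
```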

More importantly, by using this reviewer system to evaluate papers generated with different foundation models as the backbone, the team observed a clear scaling law: the stronger the underlying foundation model, the higher the quality of the papers the system generates. This strongly suggests that as computational costs fall and model capabilities continue to increase, future versions of the AI Scientist will become far more capable.
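A minimal sketch of how such a scaling trend might be measured, assuming the automated reviewer has already scored each paper: group the scores by the backbone model that generated them and compare the averages. The model names and scores below are placeholders, not results from the paper.

```python
# Hypothetical sketch: average automated-reviewer scores per backbone model.
from statistics import mean

scores_by_backbone = {
    "small-model":  [3.1, 3.6, 2.9, 3.4],
    "medium-model": [4.2, 4.8, 4.5, 4.1],
    "large-model":  [5.6, 6.1, 5.8, 6.0],
}

for backbone, scores in scores_by_backbone.items():
    print(f"{backbone}: mean reviewer score = {mean(scores):.2f} over {len(scores)} papers")
```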

Limitations and Future: On the Eve of Machines Replacing Humans?

Although passing human peer review is a huge breakthrough, the AI Scientist system is still in its early stages, and the Nature paper candidly admits several limitations.

For instance, it occasionally proposes naive or immature ideas; it struggles with deeply rigorous methodology and complex code implementations; and it is prone to hallucinations and obvious low-level errors, such as citing non-existent references or repeating the same figure in the paper's appendix.

However, there is a clear trend in machine learning: once a new capability starts to work, even with obvious flaws, it tends to surpass human-level performance within a very short time, driven by scaling and upgrades to the underlying models. Currently, the AI Scientist is limited to computational experiments, but the team expects this paradigm to be adapted to other fields, catalyzing progress across the entire scientific community through open-ended exploration and discovery.

A Fundamental Shift in the Paradigm of Scientific Discovery

The ability to automatically generate papers inevitably raises profound ethical and social questions, including the risk of overwhelming the peer-review system and of falsely inflated academic credentials.

The research team emphasizes that this technology must be developed responsibly. This includes informing the public that AI-generated papers are not only a reality but, in some cases, can fully rival human-level work. The team proactively withdrew the accepted AI paper, and all experiments were approved by an ethics review committee. Furthermore, the team watermarked all AI-generated papers to explicitly indicate their origin and calls on the entire academic community to adopt this practice, establishing clear norms for handling AI-generated research as soon as possible.

Source: https://sakana.ai/ai-scientist-nature/

