OpenAI Legend Reveals: Undergraduate Lands Job at OpenAI with Just One Blog Post! No PhD, Zero Papers


New Intelligence Unit Report

Edited by: Aeneas

[New Intelligence Unit Guide] He has no PhD and no papers, yet by publicly improving on existing papers and running benchmarks, he impressed top researchers and landed a job at OpenAI! Noam Brown personally confirms: execution and open-source projects are the real tickets into a top AI lab.

Recently, a post by OpenAI legend and father of poker AI Noam Brown went viral.

No PhD, no research background—is it possible to land a job at a top AI lab?

This sounds like a fantasy, but the wonderful thing about this world is that such examples actually exist.

For instance, a young man named Keller Jordan successfully joined OpenAI as a machine learning researcher based solely on an open-source blog post!


Yes, he didn't write papers; instead, he completely open-sourced the full research process, code, and experimental results on GitHub.

Noam Brown concluded: Although there is less space for open research today than before, improving upon existing papers remains an excellent way to prove your capabilities to lab researchers!


This approach also gives researchers more confidence to vouch for you and secure you an interview.


From AI Content Moderation to Career Success

In 2020, Keller graduated from UCSD with dual bachelor's degrees in Mathematics and Computer Science.

By graduation, he had not published a single paper.

His first job was at an AI content moderation startup.


One day, he read a recent paper by Google Research scientist Behnam, thought of a possible improvement, and emailed him.

Behnam agreed to mentor him. With no connections and no pedigree, the young man had made contact with a major figure.

Even more amazing, this collaboration eventually led to an ICLR paper.

Later, Keller's impressive work, the "NanoGPT speedrun," directly changed the research paradigm. It not only earned praise from former Tesla AI director Andrej Karpathy but also caught OpenAI's attention.

This wasn't a paper in the traditional sense, yet it became the turning point in Keller's fate.

Because all his work was fully documented with quantifiable results and clear progress, OpenAI didn't hesitate to extend an olive branch to him.


Praised by Karpathy as "Well Done"

NanoGPT is an open-source project by Karpathy—a minimal, lightweight GPT training and fine-tuning framework.

One thing Keller loved doing was repeatedly breaking NanoGPT's training-speed record, and he kept trying new methods to do so.

In October 2024, he achieved results that improved Transformer model training token efficiency by 3.8 times!


This directly won him Karpathy's high praise.

The goal of the NanoGPT speedrun sounds simple: with the model size fixed (a 124M-parameter Transformer) and the validation-loss target fixed (3.28), finish training in as few tokens and as little wall-clock time as possible.

What Keller did was take Karpathy's nanoGPT/llm.c PyTorch training code and turn it into a reproducible, quantifiable, and comparable benchmark.

Ultimately, he improved token efficiency by 3.8 times, cutting the tokens required to reach the target loss from about 10B to 2.7B.
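In code terms, the speedrun's objective reduces to a simple stopping rule. The sketch below is purely illustrative: only the 3.28 validation-loss target and the 124M model size come from the benchmark; the function names and the token budget are hypothetical.

```python
TARGET_VAL_LOSS = 3.28  # fixed validation-loss target from the speedrun rules

def speedrun(train_step, evaluate, max_tokens=10_000_000_000):
    """Train the (fixed, 124M-parameter) model until validation loss
    reaches the target; the score is how few tokens (and how little
    wall-clock time) were consumed. Lower is better."""
    tokens_seen = 0
    while tokens_seen < max_tokens:
        tokens_seen += train_step()        # returns tokens consumed this step
        if evaluate() <= TARGET_VAL_LOSS:  # evaluate on the fixed validation set
            return tokens_seen
    return None  # target not reached within the token budget
```

Because the model, data, and target are all pinned down, two submissions are directly comparable: whichever reaches 3.28 in fewer tokens and less time wins.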


This means the improvement can be strictly verified: it is a hard metric.


Making Experiments Cheap Enough for "Everyone to Participate"

Moreover, Keller was highly innovative.

Unlike training runs that cost hundreds of thousands or even millions of dollars in compute, the speedrun was designed around one clear principle: make trying a new idea cheap.

To this end, he deliberately kept the code minimal (only 537 lines), made setup and a full run on a fresh 8×H100 machine take only about 20 minutes, and brought the cost of a single attempt down to as little as $8.

Even in today's AI research environment, this is an extremely rare design choice.

This means innovation is no longer gated by compute: not just big labs but individual researchers, students, and independent engineers can all validate ideas quickly.


Noticed by OpenAI

Thus, NanoGPT speedrun became a crucial step in Keller's rise.

Everything about the achievement is solid: the code, logs, and experiments are fully reproducible; the metric is essentially cheat-proof; and the developer community genuinely participated.

Even the verification method was designed to be extremely rigorous: every speedrun's log file contains a complete copy of the code.

Anyone who wants to reproduce a new record can simply rerun the code embedded in that log file.


Muon Emerges

Then, the whole thing reached its climax.

At the end of 2024, Muon, the optimizer he designed for the 2D hidden-layer parameter matrices of neural networks, emerged, setting new world records for NanoGPT and CIFAR-10 training speed with outstanding performance!


Muon is an optimizer designed for the 2D hidden-layer parameter matrices of neural networks. Its core idea is to take the update matrix produced by SGD with momentum and orthogonalize it via Newton-Schulz iterations, yielding an update close to a semi-orthogonal matrix and thereby improving training efficiency.

Its implementation is simple and efficient: it runs stably in bf16 precision and adds little computational overhead.
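The core orthogonalization step can be sketched in NumPy. This is a minimal illustration, not the production implementation: the quintic iteration coefficients below are the ones used in the public Muon reference code, while `muon_step`, its names, and its hyperparameters are illustrative.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize a 2D update matrix with a quintic
    Newton-Schulz iteration, pushing its singular values toward 1
    (i.e., toward a semi-orthogonal matrix)."""
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients from the Muon reference code
    X = G / (np.linalg.norm(G) + 1e-7)  # scale so all singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                      # iterate on the short side for efficiency
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One sketched Muon update: SGD-momentum, then orthogonalize.
    Names and hyperparameters here are illustrative."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    return param - lr * update, momentum
```

Because the iteration uses only matrix multiplies (no SVD), it maps well onto GPUs and tolerates low precision, which is why it can run stably in bf16.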


Compared to the AdamW optimizer, Muon performed amazingly on multiple tasks.

Although AdamW allows GPT, LLaMA, and Qwen to learn stably and quickly, as model parameters increase from hundreds of millions to hundreds of billions, and training time extends from days to weeks or even months, AdamW's limitations begin to show.

Although Muon has not yet become a mainstream general-purpose optimizer, its emergence suggests it may be a major fundamental innovation in AI model training.


Joining OpenAI

As Muon's influence in the developer community grew, Keller officially joined OpenAI in December 2024.


Interestingly, Keller stated in February that although Muon became popular and helped him enter OpenAI, he won't write a paper about Muon.

In his view, rather than publishing a paper on arXiv that would likely be buried, it is better to keep honestly improving his optimizer.

After all, he believes most optimizer papers are fluff.


These People All Successfully Broke into Big Companies

Additionally, Noam Brown cited other successful cases.

For example, Sholto Douglas, who was discovered by Google DeepMind.


He keeps a low profile on X, has never published any notable papers as first author, and has only been in the industry for a year and a half. However, he is a key figure behind Gemini's success.


While still at McKinsey, Sholto gradually became convinced that AI would take off, so he started working on his own projects in his spare time and asked many insightful questions on JAX's GitHub.

This work impressed James Bradbury, and Sholto was eventually invited to interview at Google DeepMind.

Andy Jones is a semi-retired quantitative analyst. Before test-time compute became popular, he wrote a paper comparing the effects of scaling up pre-training versus scaling up test-time compute.


The paper was extremely impressive, not because it topped some benchmark, but because of its clever design choices: he wrote his own GPU-accelerated environment and ran rigorous, detailed ablation experiments.

Ultimately, Andy Jones joined Anthropic.


References:

https://x.com/polynoamial/status/2014084431062114744

https://x.com/polynoamial/status/2014084432685326485

https://x.com/polynoamial/status/2014084509575291163

