Reporting by New Intelligence
Editor: Tao Zi
【New Intelligence Insights】Can you believe it? An AI living 95 years in the past actually wrote Python code. The father of GPT has stepped in, using 260 billion tokens to forge a 'vintage' AI.
An AI that has never seen a computer has written a modern programming language!
This is not the setup for some sci-fi story.
Just today, Alec Radford, the father of GPT, led a team to unveil the internet-shaking 'talkie'—
A large model with a total of 13 billion parameters, trained exclusively on pre-1931 literature.
Talkie's 'worldview'—its entire training data—is frozen on December 31, 1930.
In that era, there was no internet, no Wikipedia, and absolutely no modern code.
The most 'recent' things it has read are patent books, scientific journals, etiquette manuals, and private letters from nearly a century ago.
Yet this AI, 'living' 95 years in the past, can actually write Python code.
Never Learned Programming, Yet Writes Python and Understands 'Inverse Functions'
Talkie's most explosive discovery lies hidden in a set of programming tests.
Alec Radford's team had a wild idea: they used HumanEval to test talkie's programming ability—
They gave it a few Python functions as contextual examples and then asked it to solve new programming problems.
Keep in mind that in talkie's training data, there is not a single line of modern code. The concept of a digital computer didn't even exist within its 'knowledge system'.
But the results are shocking. Through few-shot learning, it can actually write correct Python programs.
Admittedly, it can currently only complete simple one-liners, like adding two numbers, or making minor modifications to the contextual examples.
Alec Radford: The key mastermind behind GPT, CLIP, and Whisper.
One particularly impressive case involved encode_shift, a function that encodes a rotational cipher by shifting each letter five places forward in the alphabet.
Talkie wrote the corresponding decode function on its own. The entire modification was just one character: it changed +5 to -5, swapping the plus sign for a minus sign.
It truly understood the concept of an 'inverse function': if encryption is addition, decryption is subtraction.
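To make the one-character change concrete, here is a minimal sketch of the pair of functions, written in the style of the HumanEval task. It assumes lowercase ASCII input and is an illustration, not talkie's verbatim output.

```python
def encode_shift(s: str) -> str:
    """Shift every lowercase letter 5 places forward, wrapping around after 'z'."""
    return "".join(chr((ord(ch) + 5 - ord("a")) % 26 + ord("a")) for ch in s)


def decode_shift(s: str) -> str:
    """Undo encode_shift. The only difference from the encoder is +5 becoming -5."""
    return "".join(chr((ord(ch) - 5 - ord("a")) % 26 + ord("a")) for ch in s)


if __name__ == "__main__":
    secret = encode_shift("talkie")
    print(secret, decode_shift(secret))
```

Because Python's `%` operator always returns a non-negative result for a positive modulus, subtracting 5 wraps correctly from 'a' back to 'v' without any extra handling.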
Portal: https://talkie-lm.com/chat
260 Billion Tokens, Fed Exclusively on Century-Old Paper
Why did Alec Radford's team go to such great lengths, manually OCRing nearly century-old physical documents to train a 'vintage' AI?
Because they wanted to answer a core question in the field of AI: Is an LLM's ability reasoning, or is it just recitation?
Talkie's ability to write Python suggests that an LLM can reason from pre-1931 knowledge rather than merely retrieve memorized text. It must be said, this is 'generalization' in its truest sense!
Talkie's training corpus can be considered a massive 'archaeological project'.
Its training data reached 260 billion tokens, all sourced from English texts dated before 1931, including books, newspapers, journals, scientific papers, US patents, and case law.
All of this text had to be scanned from physical documents and transcribed via OCR.
The choice of 1930 as the cutoff date was practical: it is the dividing line for US public domain law.
However, this created an unexpected bottleneck: data quality.
The team ran a controlled experiment: models trained on old texts transcribed by traditional OCR systems learned only 30% as efficiently as models trained on the same texts transcribed by humans.
Simple regex cleaning could raise this figure to 70%, but a massive gap remained.
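The article does not say which rules the team used. As a rough illustration of what 'simple regex cleaning' of OCR output can look like, the sketch below applies a few common fixes. The function name `clean_ocr_text` and every rule in it are assumptions chosen for the example, not the team's actual pipeline.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Apply a few generic regex fixes to raw OCR text (illustrative rules only)."""
    # Rejoin words that were hyphenated across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Drop lines that contain nothing but digits (most likely page numbers).
    text = re.sub(r"(?m)^[ \t]*\d+[ \t]*\n", "", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more consecutive newlines into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    print(clean_ocr_text("The exam-\nple  text\n12\nnext line"))
```

Rules like these are blunt. The hyphen rule, for instance, also fuses genuine compounds such as "well-known" when they happen to break across lines, which is one reason regex cleaning alone may not close the gap.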
In the experiments evaluating talkie's performance, the team also built a 'modern twin' (talkie-web-13b-base).
The latter was trained on FineWeb's modern internet data, and both models used the 'same compute budget'.
On core language understanding and mathematical reasoning tasks, talkie performed comparably to its modern twin.
But on general knowledge benchmarks, even after removing questions that were 'anachronistic' from a 1930 perspective, talkie still lagged behind.
The team suspects this has a lot to do with data quality.
For this reason, Radford's team plans to train a 'retro OCR system' from scratch, specifically designed to re-transcribe pre-1931 texts.
Using the Most Modern Claude 4.6 to Train the Most Ancient AI
Talkie's 'post-training' scheme is also fascinating.
To turn a 'base model' that has only read old books into a conversational chatbot, there was no ready-made instruction fine-tuning data available.
The team's approach was to extract instruction-response pairs from pre-1931 structured reference books: etiquette manuals, letter-writing guides, cookbooks, encyclopedias, and poetry collections.
They then used these 'vintage textbooks' for the first round of SFT.
In the subsequent RLAIF stage, the team used online DPO to enhance talkie's instruction-following ability, with Claude Sonnet 4.6 acting as the judge.
A 2026 state-of-the-art AI, grading an AI 'living in' 1930.
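For readers unfamiliar with DPO, the sketch below shows the standard DPO loss for a single preference pair. In an online setup, the 'chosen' and 'rejected' responses would be sampled from the current model and ranked by the judge. The article gives no hyperparameters, so the `beta` value and all argument names here are assumptions for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(x)), computed in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    # The policy already prefers the judge's chosen answer, so the loss is below log(2).
    print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

The loss falls as the policy raises the probability of the judge-preferred response relative to the reference model, and rises when it drifts toward the rejected one.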
During training, Claude's score for talkie's instruction-following ability rose from 2.0 to 3.4 (out of 5).
In the final fine-tuning stage, the team used Claude Opus 4.6 to hold multi-turn synthetic dialogues with talkie, then ran another round of rejection sampling plus SFT to polish its conversational skills.
The team also admitted an irony: training a model meant to be frozen in 1930 using a modern large model is itself a form of 'temporal contamination'.
Their long-term goal is to use a vintage base model itself as the judge, creating a fully 'bootstrapped' post-training pipeline.
It is worth mentioning that RL training produced a funny side effect in talkie's 7B version: it started answering in list format, a 'bad habit' picked up from modern AI.
The Cleanest 'Open-Book Test' in AI History
The research team conducted another interesting experiment.
They extracted nearly 5,000 historical event descriptions from the New York Times' 'On This Day' column and computed talkie's 'surprise' level for each event.
The results were very clear. For events before 1930, talkie was not very surprised. For events after 1930, the surprise level began to climb.
It peaked in the 1950s and 1960s before leveling off.
This curve itself is an experiment on predictive ability. How would this curve change as the model scales up?
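The article does not define 'surprise' precisely. A common choice, assumed here, is the average negative log-likelihood the model assigns to each token of an event description, which can then be averaged per decade to draw the curve. The function names and the input format below are hypothetical.

```python
import math
from collections import defaultdict


def mean_surprisal_bits(token_logprobs):
    """Average surprisal in bits per token, given natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return -sum(token_logprobs) / (len(token_logprobs) * math.log(2))


def average_by_decade(events):
    """Average surprisal per decade from (year, surprisal) pairs."""
    buckets = defaultdict(list)
    for year, surprisal in events:
        buckets[year // 10 * 10].append(surprisal)
    return {decade: sum(vals) / len(vals) for decade, vals in sorted(buckets.items())}


if __name__ == "__main__":
    # Four tokens, each assigned probability 0.5, give exactly 1 bit per token.
    print(mean_surprisal_bits([math.log(0.5)] * 4))
    print(average_by_decade([(1925, 2.0), (1929, 4.0), (1955, 6.0)]))
```

Under this assumption, a higher value means the model found the text less predictable, which matches the reported rise for events after the 1930 cutoff.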
Demis Hassabis, CEO of Google DeepMind, once proposed a thought experiment—
A model trained only up to 1911, could it independently discover the general theory of relativity, as Einstein did in 1915?
Talkie certainly cannot do that right now. But it provides a path: just scale up.
Expanding to GPT-3 Level This Summer
Talkie currently has 13 billion parameters. The team's roadmap is quite aggressive—
This summer, they plan to release a GPT-3-level vintage model.
A more distant goal: to expand the corpus to over one trillion tokens, theoretically enough to train a model at the GPT-3.5 level, with capabilities approaching the original ChatGPT.
A ChatGPT frozen in 1930.
References:
https://x.com/status_effects/status/2048878495539843211?s=20
https://talkie-lm.com/introducing-talkie