The Era of Software 3.0 Has Arrived

Karpathy said he has never felt more like a "left-behind" programmer than he does right now.

Coming from someone else, or even from me, that would be unremarkable. Coming from Karpathy, it carries far more weight.

Karpathy is a co-founder of OpenAI, the former head of AI at Tesla, and now the founder of Eureka Labs. He is one of the most influential technical evangelists of the entire deep learning era. Although he likes to coin new terms and is seen as having some "internet celebrity" traits, his fame is backed by real skill, no fluff.

Karpathy AI Ascent 2026

Last week, at Sequoia Capital's AI Ascent 2026 conference, he had a 30-minute fireside chat with Sequoia partner Stephanie Zhan.

The theme was: Software is undergoing its third paradigm shift.


After the event, Stephanie Zhan posted a summary, saying:

"Last year he coined the term 'vibe coding'. This year, he 'has never felt more behind as a programmer'."

Screenshot of Stephanie Zhan's summary

She also summarized the core message of the entire conversation in one sentence:

"Vibe Coding raises the floor. Agentic Engineering raises the ceiling. One is about access, enabling more people to build. The other is about excellence, using agents without sacrificing safety, reliability, maintainability, and taste."

Karpathy himself also posted a long thread afterward summarizing three main themes.

Karpathy's thread summary

Among them, the most important section is about the "New Frontier" of LLMs:

"LLMs are far more than just accelerating existing workflows (like writing code). Some applications can be entirely consumed by LLMs, requiring no traditional code. Some installation scripts can replace .sh files with English .md files, because 'LLMs are a high-level English interpreter'. There are also things like LLM knowledge bases, which were previously impossible to build because they require computation on unstructured data."

Software 1.0

Before understanding what Software 3.0 is, let's look back at 1.0 and 2.0.

Software 1.0 is the traditional programming we are familiar with.

Diagram of Software 1.0's dilemma

Programmers write code in Python, C++, Java. Every line is an explicit instruction: if A, then execute B, else C. The logic is written by humans, and bugs are human-made too.

From the 1950s to now, the vast majority of software falls into this category.

This paradigm has dominated software engineering for over half a century.

Its benefits are obvious: deterministic, debuggable, explainable. But it also has a fundamental limitation: scalability is constrained by human intelligence and time.

You can't hand-write a set of rules that lets a computer recognize a cat in a photo. You can't write a block of logic that translates Chinese into English. You certainly can't write an algorithm by hand that learns to play Go from scratch.

After all, for these tasks... humans themselves can't articulate the rules, so how can they be written into code?

2017: Software 2.0

In November 2017, while Karpathy was still at Tesla working on Autopilot, he wrote a blog post on Medium: Software 2.0.

Software 2.0: Training is Programming

This article was later considered one of the most influential conceptual pieces in the AI field.

His core argument: Neural networks are not just a new tool, they represent a brand-new programming paradigm.

In Software 1.0, programmers write code. In Software 2.0, programmers prepare data and objective functions, then let optimization algorithms (gradient descent) search for a set of neural network weights.

"Software 2.0 is written in a much more abstract, human-unfriendly language, such as the weights of a neural network. No human is involved in writing this code because there are a lot of weights (typical networks have millions), and coding directly in weights is just impossible."

In other words, the training process is the programming process, and the dataset is the source code.
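To make the contrast concrete, here is a minimal sketch in plain Python, not taken from Karpathy's article. The toy rule `y = 2x + 1`, the function names, the learning rate, and the step count are all illustrative assumptions. In the 1.0 version a human writes the rule; in the 2.0 version a human supplies only data and a squared-error objective, and gradient descent searches for the two weights.

```python
# Software 1.0: a human writes the rule explicitly.
def rule_1_0(x):
    return 2.0 * x + 1.0


# Software 2.0: a human supplies data and an objective (mean squared error);
# gradient descent searches for the weights w and b.
def train(data, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The dataset plays the role of the source code: it is the only place
# where the desired behavior is written down.
dataset = [(x, rule_1_0(x)) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
w, b = train(dataset)
print(round(w, 4), round(b, 4))
```

Real networks differ mainly in scale: millions of weights rather than two, which is why nobody writes them by hand.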

In the article, he listed a long series of domains where Software 2.0 has already "eaten" Software 1.0:

Image recognition shifted from manual feature engineering to deep convolutional networks; speech recognition moved from Gaussian Mixture Models to end-to-end neural networks; machine translation went from phrase-based statistical methods to the Transformer architecture; Go playing evolved from hand-crafted evaluation functions to AlphaGo Zero's self-play.

The benefits of Software 2.0 are: computational graphs are homogeneous (primarily matrix multiplication), easy to optimize for hardware. Runtime is predictable, with no infinite loops. No dynamic memory allocation is needed, eliminating memory leaks.

And in many fields, its performance has already far surpassed human-written solutions.

At the end of the article, Karpathy predicted at the time that when we eventually build AGI, it would definitely be written in Software 2.0.

The prediction has aged well, but from the vantage point of 2026 it looks only half right, because:

3.0 Has Arrived

Karpathy himself admits that the Software 2.0 article was written too early. GPT had not yet emerged, and the Transformer paper had only just been published. The one thing he did not foresee was large language models.

The programming method of Software 2.0 is: prepare data, design network architecture, train. This process requires extensive machine learning expertise, GPU clusters, and weeks or even months of training time. The barrier to entry is extremely high.

Evolution of three generations of software

The programming method of Software 3.0 is completely different.

You write a prompt, give the model some context, and it executes.

No need for training, no need for gradient descent, no need for labeled data.

You "program" in natural language, use the context window as "memory", and use tool calls as "system calls". The LLM becomes a new type of computational interpreter, and the prompt becomes its source code.

At AI Ascent, Karpathy gave a clear summary of the three generations of software:

  • Software 1.0: Humans write the code.
  • Software 2.0: Neural networks are trained on data.
  • Software 3.0: Programming is done via prompts, context, agents, tools, memory, and verification.

The context window has become the new programming interface.
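The analogy of prompt as source code, context window as memory, and tool calls as system calls can be sketched as a tiny loop. This is only an illustration under loud assumptions: `fake_model` is a hypothetical, rule-based stand-in for an LLM so the example runs offline, and the `CALL` / `ANSWER` / `TOOL_RESULT` text protocol is invented here, not any vendor's API.

```python
def word_count(text):
    """A tool the model may call, playing the role of a system call."""
    return str(len(text.split()))


TOOLS = {"word_count": word_count}


def fake_model(context):
    """Hypothetical stand-in for an LLM: fixed rules instead of a network."""
    last = context[-1]
    if last.startswith("TOOL_RESULT:"):
        return "ANSWER: " + last[len("TOOL_RESULT:"):].strip()
    user = next(line for line in context if line.startswith("USER:"))
    return "CALL word_count: " + user[len("USER:"):].strip()


def run(prompt, user_input, model=fake_model, max_steps=5):
    # The context list is the program's working memory.
    context = ["SYSTEM: " + prompt, "USER: " + user_input]
    for _ in range(max_steps):
        reply = model(context)
        context.append("MODEL: " + reply)
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        if reply.startswith("CALL "):
            name, _, arg = reply[len("CALL "):].partition(":")
            result = TOOLS[name.strip()](arg.strip())
            context.append("TOOL_RESULT: " + result)
    raise RuntimeError("no answer within the step budget")


result = run("Count the words in the user's message.",
             "software is eating the world")
print(result)  # prints 5
```

Swapping `fake_model` for a real model call would leave the loop unchanged, which is the sense in which the context, rather than the code, becomes the programming interface.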

The MenuGen Example

Karpathy gave an example of a project he built himself: MenuGen.

Imagine this scenario: you walk into a restaurant, take a photo of the menu, and then an app automatically identifies each dish's name, searches for dish images, reformats everything, and generates a beautiful menu with pictures.

Done the traditional way, the process would look like this: first, use OCR to recognize the text, then use NLP to extract dish names, then call an image search API, and finally write a frontend to reformat everything. That is at least a few hundred lines of code and maybe a day or two of work.

But Karpathy found that simply throwing the photo at Gemini, letting it overlay dish images onto the original picture, made the entire middle layer of the app redundant.

MenuGen: The middle layer disappears

This is the most disruptive aspect of Software 3.0: some applications are not being built faster; they are being swallowed entirely by the model's native capabilities.

"Don't just ask what AI helps you build faster. Ask: what does AI make no longer necessary."

From Vibe Coding to Agentic Engineering

The term "Vibe Coding" was coined by Karpathy himself in early 2025. At the time, AI coding tools were just taking off, and he used it to describe a new development style: ignoring code details, collaborating with AI based on intuition and natural language, "as long as the vibe feels right".

Image related to Vibe Coding

The term then went viral, more than he ever expected.

But what Karpathy wanted to emphasize this time is: Vibe Coding is just a warm-up, nothing more.

In December 2025, he said he experienced a "flip". He went from writing 80% of the code and having agents write 20%, to suddenly a 20/80 split.

In 2026, this ratio continues to tilt further.

He even developed "AI psychosis" from it—spending 16 hours a day talking to agents, wanting to launch the next task the moment one finishes, feeling like he was slacking off if his tokens weren't all spent.

In this talk, Karpathy further breaks down this shift into two layers:

Floor and ceiling diagram

Vibe Coding raises the floor.
Anyone, even those who know nothing about programming, can describe what they want in natural language and have AI generate a working application. This was previously unthinkable. A designer can make a prototype, a product manager can build an internal tool, a student can create their own project. The barrier is pulled down to nearly zero.

Agentic Engineering raises the ceiling.
Professional developers use agents in a completely different way. It's not just "letting AI help write code". They are designing an entire system: agents generate plans, agents code, agents test, and agents check each other's work. This pipeline must ensure no security vulnerabilities, clean architecture, and system robustness.

"Vibe Coding raises the minimum floor for what everyone can build as software. Agentic Engineering must guard the quality bar that professional software has historically achieved."

Verifiability

Karpathy also raised a key point in his talk: Traditional software automates what you can specify, while AI automates what you can verify.

This, in my view, is quite crucial.

Is the code correct? Run a test to find out. Is the math correct? Compute it. Are there security vulnerabilities? Scanning tools can discover them.

These domains share a common trait: the output can be objectively evaluated.

It's precisely because they can be verified that these tasks can enter a reinforcement learning loop, allowing the model to train and improve continuously.

This is also why AI has advanced so rapidly in coding, math, and code security, while still making bizarre errors in "common sense reasoning".
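As a rough illustration of what "verifiable" means in practice, the sketch below scores a candidate function by running it against test cases. The names `reward`, `CASES`, `good_add`, and `buggy_add` are hypothetical, and real reinforcement learning pipelines are far more elaborate; the point is only that the score is computed mechanically, with no human judgment in the loop.

```python
def reward(candidate, cases):
    """Return the fraction of test cases passed; a crash counts as a failure."""
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(cases)


# Each case is (arguments, expected result) for an "add two numbers" task.
CASES = [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]


def good_add(a, b):
    return a + b


def buggy_add(a, b):
    return a - b  # wrong operator; only the (0, 0) case passes by accident


print(reward(good_add, CASES))
print(reward(buggy_add, CASES))
```

A question like "should I walk or drive to the car wash" has no such checker, which is one way to read why progress there is slower.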

Karpathy gave an example: a model can refactor a 100,000-line codebase and discover zero-day vulnerabilities, but it will also tell you, "Your car is at a car wash 50 meters away; I suggest you walk there."

This phenomenon is called "jagged intelligence."

Jagged intelligence chart

The capability curve doesn't rise smoothly; it has peaks and cliffs. Peaks appear in domains covered by the labs' data and reward signals; cliffs appear outside the training distribution.

Karpathy's subsequent thread explained further: the jagged distribution isn't just about verifiability, but also economics. Revenue and Total Addressable Market (TAM) determine which domains frontier labs choose to cover in RL training.

"You're either inside the data distribution, rocketing along on the RL track. Or you're off the track, hacking through the jungle with a machete."

Claude can refactor 100,000 lines of code because someone paid to train that capability. Nobody has paid to train a model on a problem like "how to get to the car wash," so it has not been a focus.

It's not really an intelligence problem; it's an economic allocation problem.

For entrepreneurs, the corresponding opportunity is this: find domains where you can construct a verification environment but that the big labs' reinforcement learning has not yet covered. If you can design your own reward function, you can still gain a significant advantage through fine-tuning, even when mainstream models are not specifically optimized for that domain.

Harness

Karpathy also talked about a concept: harness (sometimes called scaffolding).

This term has been especially hot in agent technology recently. I previously wrote a dedicated article introducing it, titled "The Model Isn't Key, the Harness Is."

Harness concept illustration

The Harness concept was first proposed by HashiCorp co-founder Mitchell Hashimoto back in February of this year. While using AI for his Ghostty project, he summarized a principle:

"Every time you find an agent making a mistake, take the time to engineer a solution so it never makes that same mistake again."

Later, companies like Cursor, OpenAI, and Anthropic also published engineering blogs about harnesses.

For example, a core issue discussed in Anthropic's blog was: How can an AI Agent maintain progress across multiple context windows?

Agent shift handover illustration

Their solution is: initialize the agent to set up the environment, create a progress file (claude-progress.txt), and establish a baseline commit. An executing agent completes features one by one, and every time a new context window starts, it first reads the git log and the progress file to continue from the last round.

This is like writing a handover document for agents.
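The handover idea can be sketched in a few lines. This is a simplified illustration, not Anthropic's implementation: only the file name `claude-progress.txt` comes from the description above, the function names and the one-feature-per-line format are assumptions, and the git log and baseline commit are left out entirely. Each call to `run_session` stands for one fresh context window that knows nothing except what the progress file tells it.

```python
import tempfile
from pathlib import Path

PROGRESS_FILE = "claude-progress.txt"  # file name mentioned in the article


def completed(workdir):
    """Read the handover notes: one finished feature per line (assumed format)."""
    path = Path(workdir) / PROGRESS_FILE
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def run_session(workdir, plan):
    """One 'context window': read notes, do the next feature, write notes."""
    done = set(completed(workdir))
    for feature in plan:
        if feature not in done:
            # A real agent would implement and commit the feature here.
            with open(Path(workdir) / PROGRESS_FILE, "a") as f:
                f.write(feature + "\n")
            return feature
    return None  # nothing left to do


with tempfile.TemporaryDirectory() as workdir:
    plan = ["login", "search", "export"]
    sessions = [run_session(workdir, plan) for _ in range(4)]

print(sessions)  # ['login', 'search', 'export', None]
```

No state is shared between sessions in memory; all continuity lives in the file, which is what lets a new context window pick up where the last one stopped.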

Karpathy's view is even more radical: future software should be rewritten for agents, without any consideration for human users.

Current software interfaces are designed for people—with buttons, menus, and mice. But agents don't need these. What agents need are: machine-readable interfaces, clear permission declarations, decomposable workflows, and explicit instruction formats.

In his thread, he gave a small example: why write complex .sh installation scripts? You can completely use a .md file, write down the installation steps in English, and then tell the user "just show this file to your LLM."

The LLM, as a high-level interpreter, can intelligently adjust the installation process based on your system environment and can inline-debug when problems arise.

Replacing .sh, .py, .go files with .md is also something I have recently adopted.

The Human's Place

After saying so much about AI and agents, Karpathy didn't forget to circle back to humans themselves.

His judgment is: Understanding is not outsourceable.

An agent can call APIs, write code, and run tests. But there are several things it truly still cannot do:

System specification design: For example, knowing a user system should use a stable user ID instead of an email address to associate funds. This judgment requires understanding business logic, experience, and foresight of consequences.

Conceptual understanding: You need to truly understand what a tensor is, how memory views work, and the principles behind storage mechanisms—not just know the names of APIs. Otherwise, you can't judge whether the code the agent writes is doing the right thing.

Taste: Code written by agents is often "workable but ugly." It passes all tests, the functionality is correct, but the architecture is a mess, the naming is awful, and the complexity is out of control. You need to know what good design looks like.
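To make the stable user ID point under system specification design concrete, here is a small sketch. The `User` class, the two ledgers, and the example addresses are hypothetical. It shows why keying funds by email is fragile: an email can change, while an ID assigned at creation does not.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class User:
    email: str
    # Assigned once at creation and never changed afterwards.
    user_id: str = field(default_factory=lambda: uuid.uuid4().hex)


alice = User(email="alice@old.example")

# Two candidate designs for the ledger (amounts in cents).
balances_by_email = {alice.email: 5000}
balances_by_id = {alice.user_id: 5000}

# The user updates her email address.
alice.email = "alice@new.example"

found_by_email = balances_by_email.get(alice.email)  # None: funds orphaned
found_by_id = balances_by_id.get(alice.user_id)      # 5000: still attached

print(found_by_email, found_by_id)
```

Both designs would pass a test suite that never changes an email, so an agent has no signal telling it which to choose; that choice depends on understanding the domain.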

"You can outsource your thinking, but not your understanding."

Understanding is not outsourceable

If you don't understand what the system is doing, you won't know where it went wrong when the agent makes a mistake.

And agents will definitely make mistakes.

Sometimes, when someone answers my serious work questions with "AI thinks…" I interrupt and say: don't give me "AI thinks, AI's judgment"; give me "you think, your judgment."

Three Generations of Software

Looking back at the evolution of these three software paradigms, the main thread is: the way humans participate is constantly changing.

In the Software 1.0 era, the programmer was the author. Every line of code was personally typed out by them; the logic was designed by them, and the bugs were fixed by them. They had complete control over the entire system and bore full responsibility.

In the Software 2.0 era, the programmer became a coach. They no longer wrote logic directly but instead prepared training data, designed network architectures, and tuned hyperparameters, letting the optimization algorithm search for the solution. Their job shifted from "telling the machine how to do it" to "showing the machine how it's done."

In the Software 3.0 era, the programmer becomes a commander. They don't need to prepare data or train models. They only need to describe their intent in natural language, provide context, and then direct a swarm of agents to execute. Their job has shifted from "showing the machine" to "telling the machine what they want."

From how, to show, to what.

Author to Coach to Commander

But the flip side of this main thread is this: With each generational shift, what humans lose is control over execution details, and what they gain is a higher-level lever.

As Karpathy said on a previous podcast: before, anxiety was about GPUs being idle; now, anxiety is about not spending all your tokens. The bottleneck used to be computational resources; now, the bottleneck is the human.

How much token throughput you control determines how much you can get done.

At the end of his thread, he threw out an even further outlook: fully neural computing.

The future he envisions is one where the vast majority of computation is handled by neural networks, while traditional CPUs become "co-processors," only responsible for some auxiliary tasks.

This echoes the prediction at the end of his 2017 Software 2.0 article. Back then, he said AGI would definitely be written in Software 2.0.

Now he seems to think AGI is more likely to be born from Software 3.0, running on a new architecture dominated by LLMs, with traditional computing in a supporting role.

The 3.0 Era

In my view, Karpathy didn't talk much about "future visions" this time. What he talked about was basically what has already happened.

Software 3.0 is not just a concept, nor a prediction. If you are reading this article, you are probably already living in it and feel the same way.

Anthropic's "2026 Agentic Coding Trends Report," released this year, mentions that in 2025, agentic AI changed how many developers write code, and 2026 will be the year this transformation begins to restructure the entire software development lifecycle.

As for me, I stopped writing code by hand sometime in the second half of 2025. Instead, I spend anywhere from minutes to hours every day talking to agents...

The Software 3.0 era is not on its way.

It is right here, right now.
