Anthropic's Latest Study: Using AI to Write Code Might Make You Less Skilled

Anthropic ran an experiment whose results undercut its own product.


Anthropic just published a research paper that tackles a painful question: Will using AI assistance to write code make programmers worse?


The results were surprising:

Programmers who used AI scored 17 percentage points lower on a comprehension test than those who didn't, equivalent to dropping two full levels.

How was the experiment conducted?

The research team recruited 52 junior software engineers who met the following requirements: at least one year of Python experience with weekly coding; prior use of AI programming assistants; and no prior exposure to Trio, a Python asynchronous programming library.


They were randomly divided into two groups:

  • AI Group: Could use an AI assistant to help write code

  • Control Group: Had to write everything themselves, with no AI assistance


Both groups had to complete two programming tasks using Trio, then immediately take a test to assess their understanding of the concepts they had just used.

The results...

In terms of speed, the AI group was about two minutes faster on average, but this difference was not statistically significant.


However, in terms of understanding, the AI group's average score was only 50%, while the hand-coding group scored 67%—a difference of 17 percentage points.

What's particularly painful is that the largest gap between the two groups was in debugging questions.


This means that people who use AI to write code are significantly weaker at finding bugs.

Perhaps, in the future, as more code is written by AI, the most important job for human programmers will be to review and debug that code.


But if programmers rely on AI from the very beginning and never develop debugging skills, who will provide the safety net for AI?

Why did this happen?

Researchers analyzed the behavior patterns of each participant through screen recordings and discovered an interesting phenomenon:

Some people spent a lot of time talking to AI: the most extreme participant sent 15 messages and spent 11 minutes just typing and thinking about prompts, 30% of their total task time.

This explains why the AI group wasn't much faster: the time saved on writing code was spent chatting with AI.

While the control group encountered more errors, it was precisely in the process of independently solving these errors that they truly understood how the Trio library works.

Not all usage patterns are the same

Researchers categorized the AI group's behavior patterns into six types, three of which scored low and three scored high.

Low-scoring patterns (average score below 40%):

  • Full Delegation Type: Let AI write all the code—fastest speed, but learned nothing

  • Gradual Dependency Type: Asked questions at first, but later handed everything over to AI

  • Repeated Debugging Type: Asked AI whenever they hit a bug instead of thinking it through themselves, which was both slow and taught them little

High-scoring patterns (average score above 65%):

  • Generate First, Understand Later Type: Had AI write the code, then asked follow-up questions about the principles behind it

  • Mixed Explanation Type: Every time they asked AI to write code, they requested an accompanying explanation

  • Pure Concept Type: Asked only conceptual questions and wrote the code themselves; made lots of errors, but ended with the deepest understanding


See the difference?

What the high-scoring patterns have in common: they kept thinking.

They don't avoid using AI; they use AI to help understand, not to replace thinking.

What does this mean?

The study suggests: Aggressively using AI at work may hinder skill development.


This is especially dangerous for junior engineers. Under time pressure, novice programmers are likely to heavily rely on AI to complete tasks quickly, at the cost of possibly never learning the skills they truly need.

When companies increasingly use AI to write code, but the people responsible for reviewing that code are engineers "raised by AI," the risks are self-evident.

Researchers suggest:

  • When managers promote AI tools, they should consciously protect employees' learning opportunities

  • Junior engineers should understand that the pain of getting stuck, encountering errors, and debugging is actually a necessary path to learning

  • AI product designers should also think about how to make tools that both improve efficiency and do not harm learning

In fact, mainstream LLM services have already started offering "learning modes."

For example, Claude Code's Learning/Explanatory mode and ChatGPT's Study Mode. The design intention of these features is to mitigate the side effects of "cognitive offloading."

Limitations of the study

Researchers also acknowledged the limitations of this study:

  • Small sample size (52 people)

  • Only measured immediate understanding; long-term impact is unknown

  • Tasks only involved one Python library; may not be generalizable to all scenarios

  • Used a chat-based AI assistant; if it were a more automated agent like Claude Code, the effects might be more extreme

But regardless, the takeaway from this study is:

AI can make you faster, but not necessarily stronger.

If you are learning programming or managing new team members, it's worth asking: how do you strike a balance between efficiency and learning?

After all, the best tool is one that makes people stronger, not one that turns people into tools.

