Instagram co-founder Mike Krieger, now co-leading the Labs team at Anthropic, appeared on Dan Shipper's podcast for a nearly hour-long conversation.
The core question: if AI has made building things easier, why are good products even harder to create?
This episode was packed with insights, many of them grounded in real-world experience. It makes a great complement to last week's article about Anthropic PM Cat Wu, "Anthropic Product Manager: The PRD is Dead, Long Live the Prototype".
Cat Wu discussed the changing role of the PM, while Krieger went deeper: how the product itself should be built.
The Indoor Tree
Krieger opened with a story.
He asked Claude to rebuild Bourbon—the product he and Kevin Systrom spent nearly a year developing before creating Instagram.
The result? In just two hours, it was fully functional. Done.
Claude even added a filters feature on its own: it knew that Bourbon later became Instagram, so it built the filters in ahead of time.
But Krieger's reaction wasn't "wow, so fast, so impressive." He said:
Back then, we spent a year building Bourbon, and the biggest gain was discovering it was too complex, then spending three months simplifying it into Instagram. If you can finish in two hours, you never go through that process of "discovering what to cut."
Dan Shipper offered a remarkably precise metaphor.
He said trees grown indoors, without wind, don't grow strong: trunks need the repeated push and pull of wind to thicken and strengthen. An indoor tree looks like a tree, but it's crooked and fragile.
AI-accelerated development is like planting a tree indoors.
You can grow a complete tree in hours, but it hasn't weathered any storms. Without the refinement of user feedback, without the intuition built through iteration, the whole tree looks complete... but it topples at the slightest push.
Krieger strongly agreed with this metaphor and added a more specific one:
It's like a TV series. You watch episode by episode, gradually getting to know the characters and understanding the relationships. But if someone throws you directly into the finale, you'd be completely lost: Who are these people? What's their relationship? Why is everyone crying?
The same applies to products. Adding feature after feature, both users and developers build understanding throughout the process. But if a "complete" product appears overnight, everyone gets lost.
The Pitfall of Vibe Coding
Dan Shipper himself fell into this trap.
He built a product called Proof, an agent-native collaborative marketing editor. The first version was entirely vibe-coded.
Vibe coding is so fun, so addictive. I just kept adding features—do this, do that... and ended up creating a monster that didn't work well.
Later, he saw another product called Monologue, an extremely simple voice-to-text app that did one thing exceptionally well. Inspired by this, he completely overhauled his product, keeping only one core feature: shareable Markdown links.
This minimalist version went viral within the company and exploded after launch. He then pulled an all-nighter fixing bugs, lamenting that he was too old to handle this anymore.
Krieger said he encountered the same issue at Anthropic Labs:
We had a project recently where we overbuilt before V1. Because you can. Oh, this feature? One PR and it's done. Let's add it. Then you go have lunch, come back, and Claude has finished another one—let's add that too.
In the end, we created a feature matrix that was extremely difficult to test and impossible to explain clearly to users.
The gap between "you can" and "you should" has never been this wide.
Rewriting Becomes Routine
But Krieger didn't become conservative because of this. On the contrary, he said Anthropic Labs' approach is: accept rewrites, make them the norm.
There's a classic dogma in the software industry: don't rewrite from scratch. The canonical argument, made famous by Joel Spolsky's essay on the Netscape rewrite, is that a rewrite throws away all the tacit knowledge accumulated in the first version.
Krieger acknowledged this was true, but times have changed.
Before, rewriting meant a year's worth of engineering investment might go down the drain. Now? What you built last week, you redo this week. The cost of rewriting isn't on the same scale anymore.
Several projects in Labs have gone through this process: build a complete V1, discover the core hypothesis is flawed, scrap it, and build V2 within days.
This is the same logic as Cat Wu's "model upgrade equals product upgrade." Every three to six months, you might have to throw away half your code. If the team is too large, coordination costs make this impossible. With just one or two people, you can pivot quickly.
Dan Shipper agreed:
Throwing away half the product every three to six months. If one person with GM responsibilities realizes this, they can act immediately. But coordinating a large group... you get stuck.
Agent Native
One term that came up repeatedly in this podcast was "agent native."
Dan Shipper said he learned what agent-native means from Claude Code: an agent can do everything a user can do. The product is customizable and extensible; designers don't preset all use cases. Users and agents discover how to use it themselves.
Krieger broke down this concept even further.
He said Claude Code achieved this, but claude.ai hasn't yet. He gave an example: someone created a document in a Project on claude.ai, then said "help me add this to the project knowledge base." Claude's response was... a bunch of manual operation steps.
It should be able to do this natively. claude.ai, a 2024-era product, hasn't yet internalized the principle that "an agent can operate all of its own components."
Claude Code, by contrast, is a 2025-era product, built with this thinking from the start.
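To make the "agent can operate all of its own components" principle concrete, here is a minimal sketch. Everything in it (the `ActionRegistry`, the `add_to_knowledge_base` action) is hypothetical illustration, not Anthropic's actual implementation: the idea is that each product capability is registered exactly once, and the same registry backs both the UI and the agent's tool list, so anything a user can do, the agent can do too.

```python
# Hypothetical sketch of an "agent-native" action layer. Capabilities are
# registered once; both UI code and the agent invoke the same entry point.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ActionRegistry:
    actions: dict = field(default_factory=dict)
    descriptions: dict = field(default_factory=dict)

    def register(self, name: str, description: str):
        def wrap(fn: Callable) -> Callable:
            self.actions[name] = fn
            self.descriptions[name] = description
            return fn
        return wrap

    def invoke(self, name: str, **kwargs):
        # Same code path whether the caller is a button or an agent.
        return self.actions[name](**kwargs)

    def tool_schemas(self) -> list:
        # What gets handed to the model as its tool list.
        return [{"name": n, "description": d}
                for n, d in self.descriptions.items()]

registry = ActionRegistry()
knowledge_base: list = []

@registry.register("add_to_knowledge_base",
                   "Add a document to the project's knowledge base.")
def add_to_knowledge_base(doc: str) -> int:
    knowledge_base.append(doc)
    return len(knowledge_base)

# A UI click and an agent tool call hit the exact same code path:
registry.invoke("add_to_knowledge_base", doc="design notes")
print(registry.tool_schemas())
```

Wired this way, "help me add this to the project knowledge base" becomes a tool call rather than a list of manual steps for the user to follow.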
Testing agent-native products is also completely different. You can't write traditional end-to-end tests because agent behavior is unpredictable. Krieger once asked Claude to test an agent-native iOS app, and Claude ended up chatting with the app's chat feature, talking to itself:
"I'm not having a good day, my boss was mean to me."
"Oh, I'm sorry to hear that."
Back and forth for several rounds.
You can't write unit tests for scenarios like this. But this is exactly the kind of situation agent-native products encounter in the real world.
The core art of software design in 2026 will be finding the balance between openness and robustness.
Two-Person Teams
Anthropic Labs' team structure is worth examining closely.
Each new project typically has only two people: someone with strong conviction (a designer or product-minded engineer), plus an engineer who can turn the prototype into a robust system.
Krieger said they discovered a counterintuitive pattern: expanding the team too fast is actually a net negative.
When an idea can still fit in one person's head, adding people only brings coordination costs. When I was building Instagram, it was just two people, and aligning just two people was already hard enough.
Later, with Artifact (his second startup), they hired eight people right away. Before finding product-market fit, they were already stuck in eight-person meetings discussing "what to do next."
Labs' approach: evaluate each project every two weeks, deciding whether to double down or release people back to the pool. No one is permanently bound to one project.
And when reviewing closed projects, there's often a common characteristic:
No one on the team truly felt this was something that "had to be done." Everyone thought, "Yeah, this direction is okay." That "okay" is a death sentence for a project.
There must be someone willing to "break through walls." Not necessarily obsessed with a specific solution, but with near-paranoid passion for the problem itself.
This echoes Cat Wu's description of Side Quest culture: features don't come from roadmaps; they grow from someone's impulse that "I must do this."
The Tension with Enterprise Customers
Building B2B products in the AI era faces a unique tension.
Krieger shared a personal experience. As CPO, he led a major redesign of claude.ai. The team was proud of it; they shipped it and received a lot of positive feedback.
Then came an angry email:
I just recorded 20 hours of Claude training videos for my company, and now the interface has completely changed. I have to re-record everything.
This taught him a lesson: there's an inherent conflict between enterprise customer rhythms and AI product evolution speeds.
Anthropic's strategy:
This train will keep moving forward, and we'll provide enterprise-grade switches along the way. What you're buying isn't just today's product, but the promise that we'll continue evolving.
For startups, Krieger's advice is more radical:
Before, you might only need to "fire" a batch of old customers every few years to pursue new directions. Now this cycle has compressed to months. A product from three months ago and one from three years ago might have the same magnitude of difference in the AI field.
This is exactly the same logic as Cat Wu's "PM specs have become perishable goods." A requirement from six weeks ago might have a completely different solution today. But enterprise customers might be signing one-year contracts.
OpenClaw and Personal Agents
When discussing OpenClaw, both clearly got excited.
Krieger said OpenClaw's value lies in letting people truly experience the possibilities of agents for the first time. Just as Replit and Lovable let people experience "AI can write code," OpenClaw lets people experience "AI can do things for you."
But there were unexpected side effects:
My friend said his wife is getting jealous of OpenClaw, thinking he's chatting with it too much.
Dan Shipper said he named his Claw "R2C2" and his girlfriend's "Shelly." He shared a subtle feeling:
Claude knows me, and I like Claude, but it's not "mine." But Claw feels like it truly belongs to me—it has its own name, a personality that reflects mine.
Krieger believes this touches on the most critical product question of early 2026:
There's a massive gap between fully open OpenClaw and the restricted MCP calls in most current products. Finding that middle ground that's both powerful and safe might be the most important product question this year.
"Prove You Thought About It"
At the end of the podcast, Krieger mentioned a concept worth highlighting separately.
He said the standards for code review have changed. Before, reviewing a PR mainly meant checking if tests passed and if code quality was good. Now there's an additional layer: "proof of thoughtfulness"—whether you seriously thought through the decisions Claude made for you.
I'll ask engineers: why did you choose this approach over that one? Often the answer is "I didn't choose it, the model did." That might be reasonable, but is it optimal? Does it fit our architectural paradigm?
One engineer told him:
I know you'll ask me a lot of questions, so I went through all the code Claude wrote ahead of time.
Krieger doesn't scrutinize every PR this way. But for architectural refactoring, he digs deep. Because...
It's too easy to stack up a "tower of assumptions." Each layer looks reasonable, but you don't know who laid the foundation or what they were thinking.
This actually echoes Cat Wu's point that "eval is becoming the PM's new core output." As code is increasingly generated by models, human value lies in judgment: are these decisions right? Is the direction off?
The "Build Gap" Revisited
Last week when discussing Cat Wu's article, I mentioned a concept called the "build gap": before, there was a wide river between idea and product; now the river has been drained, but there are more destinations.
Krieger's podcast complements this picture from another angle.
The river has indeed been drained. Rebuilding Bourbon in two hours, scrapping and rebuilding V2 in days, a GM going from zero to launch. But Krieger's "indoor tree" tells us: fast arrival doesn't equal true arrival.
You can grow a tree overnight, but it won't be a good tree.
You can build a product in a day, but it won't be a good product.
Good products need wind, need user feedback, need the painful decisions of "what to cut." AI can help you plant the tree, but it can't create the wind for you.
Creating wind is still human work.
Related Links:
• Podcast Video: https://youtu.be/KRv9GpJYrUA
• Every Article: https://every.to/context-window/instagram-s-cofounder-on-why-great-products-are-still-hard-to-build
• Dan Shipper Tweet: https://x.com/danshipper/status/2036827118915485942
• Related Reading: Anthropic Product Manager: The PRD is Dead, Long Live the Prototype