IBM's Neel Sundaresan Says Most AI Coding Is 'Like Driving a Ferrari to Get Milk'

Neel Sundaresan refused to answer three questions. One of them, he said with a touch of playful teasing, was: Why is IBM's Bob named Bob?

The evasion is telling. Sundaresan — IBM's General Manager of Software Automation and AI, founding engineer of Microsoft's GitHub Copilot, and previously a researcher at IBM — is not a product marketer. He is a researcher turned developer turned executive, and the thread connecting all three roles is the same obsession: What makes software developers more productive? And what gets in their way?

He has been studying this question since 2000, well before the Transformer model emerged, before large language models existed, and before anyone outside of niche research circles would have mentioned AI and developer tools in the same sentence. The journey from then to this week's release of IBM Bob, which now has 80,000 users within IBM, was far longer than the press release alone would suggest.

Started Long Before Anyone Was Watching

The first system Sundaresan built to improve developer productivity looks nothing like what we now understand as an AI coding tool. It was an API call recommendation system.

"Thirty percent of developer code is API calls," he said in an exclusive interview with The New Stack. "If you have a class calling a certain function, you would get a long list of functions to call, and you have to select from them. That itself is a pain point."

The goal was not to generate code, but to surface the correct function call at the right moment — essentially, a search ranking problem applied to the developer autocomplete experience.
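To make the ranking idea concrete, here is a minimal sketch in Python. It is not Sundaresan's system, whose details the interview does not describe. It assumes a simple frequency model: candidates are ordered by how often each call followed the previous call on the same type in observed code. The class name `ApiCallRanker` and the sample call sequences are hypothetical.

```python
from collections import Counter, defaultdict


class ApiCallRanker:
    """Ranks candidate method names by how often they followed the previous call."""

    def __init__(self):
        # (receiver_type, previous_call) -> counts of the call that came next
        self._following = defaultdict(Counter)
        # receiver_type -> counts of every call seen, used as a fallback signal
        self._overall = defaultdict(Counter)

    def observe(self, receiver_type, calls):
        """Record one observed sequence of method calls on a receiver type."""
        previous = None
        for call in calls:
            self._overall[receiver_type][call] += 1
            if previous is not None:
                self._following[(receiver_type, previous)][call] += 1
            previous = call

    def rank(self, receiver_type, previous_call, candidates):
        """Order candidates by context count, then overall count, then name."""
        context = self._following.get((receiver_type, previous_call), Counter())
        overall = self._overall.get(receiver_type, Counter())
        return sorted(
            candidates,
            key=lambda name: (-context[name], -overall[name], name),
        )


# Hypothetical training data: call sequences seen on a "File" type.
ranker = ApiCallRanker()
ranker.observe("File", ["open", "read", "close"])
ranker.observe("File", ["open", "read", "read", "close"])
ranker.observe("File", ["open", "write", "close"])

# After "open", which of these should the autocomplete list show first?
suggestions = ranker.rank("File", "open", ["close", "flush", "read", "write"])
print(suggestions)
```

Instead of an alphabetical list, the developer sees the calls most likely to come next at the top. A production system would need far richer context than a single preceding call, but the framing as a ranking problem is the same.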

The model was not a Transformer, nor even a deep learning model in the modern sense. But developers loved it, he said. And that early signal — that reducing friction in one small, specific slice of the development flow could create enormous satisfaction — still shapes how Sundaresan thinks about the problem today.

"Programming is an analytical task, not like shopping online," he said. "If the system gives wrong recommendations, or gives recommendations that will interfere with my thought process — that matters."

He argues that user experience is a separate matter from what is happening under the hood in the AI. A better model underneath can produce a worse product if the surface design is wrong.

He watched the model landscape unfold: long short-term memory models, early encoder-decoder architectures, Google's Transformer paper, and the first GPT model. At each stage, his team could already see the problem they were trying to solve. The models just weren't yet strong enough. "If you look at the papers we published, we had a paper in all of these areas," Sundaresan said, "and each paper would say, this is the model to solve this problem, and this is the model to solve that problem."

When cutting-edge models finally became powerful enough that bolder bets could pay off, Copilot was born, he said. But until that point, Sundaresan also spent years watching models make mistakes — and product designs around those models fall short. Training thresholds led to false confidence signals. People gravitated toward the most powerful (and expensive) model for every task regardless of actual need. Running performant models in the constrained environments where enterprises actually operate was nontrivial.

"Even our customers would not let us send data to our own cloud," he recalled of the early Microsoft days. "They wanted it on-prem. So we would actually run a model on a laptop – and a lot of engineering was done to make sure it runs on a laptop."

Why IBM?

When Sundaresan tells this story, the obvious question is why he joined IBM rather than chasing a flashier name. His answer is straightforward: after a decade at Microsoft, he wanted a change, and the terms IBM offered were compelling.

But the less obvious answer is that IBM's liabilities were actually assets for the specific problems he was running toward.

"In software, we have almost 20,000 people. We have infrastructure. We have consulting services. The internal IBM user base is huge," he said. "If I can build something great for them, that itself is a mega product." This internal deployment — what IBM calls "client zero" — gave him something that no external product launch could provide: a vast, diverse, and captive user base willing to tolerate early friction in exchange for real productivity gains.

The other asset was workload diversity. IBM's internal developers do write Python and Rust, but they also write PL/I, COBOL, mainframe JCL, and what Sundaresan calls "custom languages, like slang." If Bob could handle that breadth, it could handle whatever enterprise clients throw at it.

"Before we knock on the door of a customer, we already have a story to tell," he said.

He is also explicit about what the system he's building is optimized for. It is not a generic tool for any developer doing any task, but a system purpose-built for the enterprise environments that most AI coding tools treat as edge cases: legacy codebases, strict compliance requirements, hybrid environments, and the practical cost of AI-generated code that looks production-ready but isn't.

The Cost No One Talks About

One of the more forthright moments in the conversation with Sundaresan came when he described how casually most developers use AI coding tools.

"People just say, 'Which model do you want to use?' and they'll pick the latest Sonnet 4.7 or whatever. They could just be running a simple prompt, but that's $40 per million tokens," he said. "It's like driving a Ferrari to get milk — completely unnecessary."

Bob does not expose the underlying model to the user. It automatically routes tasks to Anthropic Claude, Mistral open source models, IBM Granite, or one of several proprietary, fine-tuned models built specifically for the Bob environment, depending on what the task actually requires.

Sundaresan argues the real architecture is in the routing intelligence. "It's not just about plugging a model into a system," he said. "It's about bringing a model, bringing an experience, and building an architecture that enables a great experience. All three have to come together. The model is only one part of the equation."
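The routing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bob's actual router, which the interview does not detail. The tier names, the task categories, and the complexity scores are all hypothetical. Only the $40 per million tokens figure comes from Sundaresan's quote, and the other prices are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                       # hypothetical label, not a real model ID
    usd_per_million_tokens: float   # placeholder price, except the $40 tier
    max_complexity: int             # highest task complexity this tier handles


# Ordered from cheapest to most expensive.
TIERS = [
    ModelTier("small-local", 0.50, 1),
    ModelTier("mid-size", 5.00, 2),
    ModelTier("frontier", 40.00, 3),
]

# Hypothetical complexity score per task kind.
TASK_COMPLEXITY = {
    "rename_symbol": 1,
    "write_unit_test": 2,
    "refactor_module": 3,
}


def route(task_kind: str) -> ModelTier:
    """Return the cheapest tier able to handle the task.

    Unknown task kinds are treated as maximally complex, so they
    fall through to the most capable tier.
    """
    complexity = TASK_COMPLEXITY.get(task_kind, TIERS[-1].max_complexity)
    for tier in TIERS:
        if tier.max_complexity >= complexity:
            return tier
    return TIERS[-1]


def estimated_cost(tier: ModelTier, tokens: int) -> float:
    """Cost in US dollars of sending `tokens` tokens to the given tier."""
    return tier.usd_per_million_tokens * tokens / 1_000_000


for kind in ("rename_symbol", "refactor_module"):
    tier = route(kind)
    print(kind, "->", tier.name, estimated_cost(tier, 500_000))
```

With these placeholder numbers, half a million tokens of simple renames costs a fraction of what the same volume would cost on the top tier. That gap is the "Ferrari to get milk" problem expressed as arithmetic.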

He described running A/B tests across IBM's internal user base — pitting different variants of frontier models against one another, monitoring usage patterns, and identifying where expensive models were being used for tasks that cheaper models could handle equally well. This internal deployment made possible the kind of large-scale experimentation that early product releases could never afford.
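One way such a comparison could be checked is a two-proportion z-test. The sketch below assumes the metric is the share of suggestions that developers accept, which the interview does not specify, and the counts are invented for illustration. It asks whether the difference between a cheap and an expensive model is larger than chance alone would explain.

```python
import math


def two_proportion_z(accepted_a, shown_a, accepted_b, shown_b):
    """Two-sided z-test for the difference between two acceptance rates.

    Returns (z, p_value). A large p-value means the data show no
    detectable difference between variant A and variant B.
    """
    rate_a = accepted_a / shown_a
    rate_b = accepted_b / shown_b
    # Pooled rate under the null hypothesis that both variants are equal.
    pooled = (accepted_a + accepted_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    if std_err == 0:
        raise ValueError("acceptance rates are all 0 or all 1; test undefined")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: suggestions accepted out of suggestions shown.
z, p = two_proportion_z(
    accepted_a=412, shown_a=1000,   # cheaper model
    accepted_b=420, shown_b=1000,   # more expensive model
)
print(f"z = {z:.3f}, p = {p:.3f}")
```

If the p-value is well above a conventional threshold such as 0.05, the test gives no evidence that the expensive model performs better on that task, which is an argument for routing it to the cheaper one.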

Where Is the Agent Marketplace Actually Going?

If you ask Sundaresan about the agentic AI hype cycle, he gives you the researcher's answer, not the general manager's.

"There is no smoke without fire," he told The New Stack. "If hype is the smoke, there is a fire burning somewhere. It may not be as big as the smoke, but the fire is real."

He sees agent-based development as real, but not new. Service-based development, API-based development, agent-based development — all of these have existed before. What changed is that the interface is now probabilistic and conversational rather than deterministic and programmatic. That shift creates genuinely new capabilities, but also genuinely new risks.

"You can also distract it," he said of agentic systems, "ask it questions you shouldn't be asking, or reveal things you shouldn't be revealing." He argues that the 91% AI project failure rate he observes boils down to discipline, or the lack of it. Enterprises tend to think signing a deal with a frontier model provider is enough. It isn't. "You need to make sure you follow certain disciplines before you integrate them into your software products," Sundaresan said.

The direction he is watching, and where he believes more attention should be directed, is toward agents that talk to other agents and eventually develop machine-native languages that humans cannot directly read. "If bugs creep into these derived languages, it can be deadly," he said. "There's a lot of work ahead. We can do nothing because we are afraid, or we can be bold and methodical."
