In April this year, Anthropic made a high-profile release of its new AI model, Mythos, touting its code vulnerability detection capabilities as "dangerously powerful." Due to safety concerns, the company decided not to make it publicly available, granting access only to a few large institutions so they could prioritize patching critical flaws.
The industry responded strongly, with rumors circulating that the model could find thousands of zero-day vulnerabilities within weeks, raising questions about the state of software security. The question quickly became a hot topic in the tech community.
Daniel Stenberg, the founder of curl, indirectly obtained an analysis report on the curl codebase generated by Mythos through the Linux Foundation's Alpha Omega project. The results showed that after scanning 176,000 lines of C code, the model confidently claimed to have found five "confirmed security vulnerabilities."
Stenberg noted, "The curl codebase is 1.12 times the size of the English version of 'War and Peace'. It's amusing that an AI would unilaterally declare vulnerabilities as 'confirmed'. After several hours of review by the team, only one of the five issues was an actual vulnerability. The other three were false positives already noted in the API documentation, and the fourth was just a regular bug."
Gap Between Hype and Reality
The confirmed low-severity vulnerability is planned to be fixed in curl version 8.21.0. Stenberg believes the hype surrounding Mythos leans more towards a marketing tactic: "Based on the available evidence, this model has not demonstrated any special advantage over existing tools in finding issues."
As the world's most widely used data transfer library, with over 20 billion deployments, curl's codebase has undergone multiple rounds of auditing by tools like OSS-Fuzz and Coverity. Before Mythos, other AI tools like Zeropath had already uncovered 200-300 defects, including over a dozen CVEs. Mythos arrived at the detection scene quite late and with limited coverage.
Technical Value Lies in Detection Speed
A case study from Mozilla showed that Mythos found over 270 vulnerabilities in the Firefox browser. Its value lies primarily in detection speed—shortening the time window between discovery and remediation. However, Mozilla emphasized that these vulnerabilities could equally have been found by top-tier human auditors.
Stenberg acknowledged the progress of AI code analysis tools: "The new generation of AI tools is significantly better than traditional analyzers at detecting security flaws in source code." But he pointed out that, at least for curl, Mythos has not yet demonstrated a substantial breakthrough.
Since Stenberg could only evaluate Mythos through a third-party report, his conclusions have limitations. While the AI found only one low-severity vulnerability in the heavily audited curl code, this neither confirms the industry hype nor should it lead to a wholesale dismissal of the technology's potential. Current tests suggest that AI-driven vulnerability research holds practical value, but the claims of "revolutionary capability" still seem exaggerated.
Stenberg concluded, "Any project that hasn't yet scanned its source code with AI tools could likely discover a large number of defects through these new-generation tools."
Reference Source:
The world's most “Dangerous” AI, Anthropic's Mythos, found only one flaw in curl