Edited by Tu Min
Produced by CSDN (ID: CSDNnews)
After spending five days using Claude Code to rewrite a code library that had been maintained for over a decade, the project maintainer directly changed the open-source license from LGPL to the more permissive MIT.
Recently, chardet, a classic Python encoding detection tool, has been thrust into the center of public debate because of this.
Adding to the drama, after the library's new version was released, the original author, who had retreated from public view since 2011, suddenly reappeared, demanding that the maintainer immediately revert the license to its original form.
The maintainer, however, insists that the new version was written from scratch using AI and has no relation to the old version.
Thus, a dispute over the ownership and licensing rules of AI-rewritten code has begun.
Original author retreats, maintainer takes over
Simply put, chardet is a widely used text encoding detection library in the Python ecosystem; its core function is to automatically identify the encoding of a byte stream, such as UTF-8, GBK, ISO-8859-1, etc.
It may seem niche, yet it is a fundamental component of many programs. If you have installed Python's requests library, it is likely already running silently on your computer. Earlier statistics show that chardet alone has been downloaded 854 million times in a single year.
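To make "identifying the encoding of a byte stream" concrete, here is a minimal sketch using only the Python standard library. It is not chardet's algorithm: chardet exposes detection through `chardet.detect()` and relies on much more than trial decoding, whereas this sketch simply tries a short, hypothetical list of candidate encodings in order and returns the first one that decodes cleanly. The function name `guess_encoding` and the candidate list are assumptions made for illustration.

```python
from typing import Optional, Sequence

# Hypothetical candidate list for illustration. Order matters:
# ISO-8859-1 accepts every byte sequence, so it must come last.
CANDIDATES = ("utf-8", "gbk", "iso-8859-1")


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without error.

    This is naive trial decoding, not statistical detection.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes
        return name
    return None  # no candidate fit


if __name__ == "__main__":
    samples = [("你好，世界", "utf-8"), ("你好，世界", "gbk"), ("café", "iso-8859-1")]
    for text, enc in samples:
        print(enc, "->", guess_encoding(text.encode(enc)))
```

Trial decoding only shows that a byte stream is valid in some encoding, not that the encoding is the intended one, which is why a dedicated detector is useful in practice.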
The library was first created by developer Mark Pilgrim in 2006 and released under the LGPL license.
Developers familiar with open-source licenses will recognize that the LGPL permits use, modification, and distribution, including in commercial products, but attaches copyleft conditions to redistribution: modified versions of the library itself generally must remain under the same license.
After maintaining it for several years, the original author completely withdrew from public view in 2011, and the maintenance of chardet was taken over by others.
Among them, Dan Blanchard is one of the most important maintainers; he has been responsible for every version of chardet since version 1.1 released in July 2012, contributing nearly 700 commits, while the second‑ranked maintainer has only 48 commits.
With the help of Claude, the maintainer completed a full rewrite of the chardet library in five days.
Last week, Dan Blanchard released chardet version 7.0 and announced on the project's GitHub page that this is a 'complete rewrite' released under the MIT license.
He also noted that the library's package name and public API remain unchanged, so it can serve as a drop-in replacement for chardet 5.x/6.x while being faster and more accurate. It supports Python 3.10 and above, has no runtime dependencies, and runs on PyPy.
The issue is that the MIT license is far more permissive than the LGPL: you are free to use, modify, copy, and distribute the software, including in commercial products, as long as you retain the original copyright and license notice.
As for why he changed the license, Blanchard told reporters that he had long wanted chardet to be included in the Python standard library, but the old license, the library's performance and accuracy, and his own limited time had stood in the way.
"Now Claude enables me to accomplish what I wanted to do in about five days," Blanchard said.
So he used Claude Code to rewrite chardet and released the result as version 7.0.
The original author resurfaces in protest, rejecting what he calls an unlawful relicensing of the original code
Just two days after the new version was released, a user going by the name Mark Pilgrim posted on GitHub, identifying himself as the original author of chardet. He thanked the long-term maintainers and contributors but said that Blanchard's release of version 7.0 under the MIT license is an unlawful relicensing of LGPL code and directly violates the license.
He made clear that he opposes the license change.
Below is the full text of the issue he filed on GitHub:
Hello, I am Mark Pilgrim. You may recall some of my classic works such as *Dive Into Python* and the "Universal Character Encoding Detector". I am also the original author of chardet.
First, I want to thank the current maintainer and everyone who has contributed to this project over the years and continually improved it. This is indeed a typical case of successful free‑software development.
However, someone recently reminded me that in the release of version 7.0.0, the maintainer claimed they had the right to "relicense" the project. In fact, they have no such right; doing so is a clear violation of the GNU Lesser General Public License (LGPL).
According to the LGPL, when modified code based on licensed work is distributed, it must continue to be released under the same LGPL license. The maintainer claims this is a "complete rewrite," which is not valid because they have had extensive exposure to the original licensed code (i.e., it is not a "clean‑room" implementation where the developers are completely isolated from the source). Even if a sophisticated code generator were used during development, it would not automatically grant additional licensing rights.
Therefore, I hereby formally demand that they restore the project’s license to its original version.
Whose code is it, really? Who gets to decide?
First, let’s briefly explain what Mark Pilgrim meant by "clean room" in his statement.
Computer engineers and programmers have long relied on reverse engineering to implement program functionality without directly copying copyrighted source code. Simply put, it is a way to imitate software behavior and functionality without infringing copyright. In the past, this practice usually followed the so‑called "clean‑room" principle: people who have had no contact with the original source code re‑implement the functionality, ensuring that the newly generated code does not constitute a derivative work of the original.
Blanchard admitted in his response that he has maintained chardet for over ten years, and indeed has had long‑term exposure to the original codebase.
The traditional clean‑room approach requires a strict separation of two groups: one group that understands the original implementation, and another group that writes the new implementation, with complete isolation between them.
Objectively speaking, in this project Blanchard does not satisfy the clean‑room isolation requirement.
However, he argues that the clean‑room method is merely a means to an end; its purpose is to ensure that the final code is not a derivative work of the original. In other words, clean‑room is a way to achieve the goal, but it is not the goal itself.
In this case, he can demonstrate through direct technical measurement that the result meets the same objective—the new code is structurally independent of the old code, rather than merely relying on assurances from the development process.
Based on this, he used the JPlag code‑similarity tool to provide data: the files of chardet 7.0 show a maximum similarity of only 1.29% compared to the corresponding files in version 6.0, whereas some files between versions 5.2 and 6.0 have similarity as high as 80%.
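To illustrate what a structural similarity score measures, the following is a rough sketch built on the Python standard library. It is not JPlag and will not reproduce the figures above; it only shows the general idea of comparing token streams rather than raw text, so that renamed identifiers and comments do not hide copied structure. The function names `token_stream` and `similarity`, and the choice of `difflib.SequenceMatcher` as the comparison method, are assumptions made for this example.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry no structural information for this comparison.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_stream(source: str) -> list:
    """Turn Python source into a list of abstracted tokens.

    Identifiers, strings, and numbers are replaced by placeholders so that
    renaming variables or editing literals does not change the stream.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        else:
            out.append(tok.string)  # keywords, operators, punctuation
    return out


def similarity(source_a: str, source_b: str) -> float:
    """Return a 0.0 to 1.0 similarity ratio between two token streams."""
    matcher = difflib.SequenceMatcher(
        None, token_stream(source_a), token_stream(source_b), autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    old = "def add(a, b):\n    return a + b\n"
    renamed = "def plus(x, y):\n    # sum\n    return x + y\n"
    print(similarity(old, renamed))
```

A low score from a tool of this kind shows that two files do not share token-level structure; whether that settles the legal question of derivation is exactly what the two sides dispute.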
Blanchard stresses that he created the new codebase from scratch, without directly copying any old files.
"If merely having once seen the original code were enough to invalidate a rewrite, then for any LGPL-licensed project maintainer, attempting to re-implement the same functionality under a different license in the future would become practically impossible, regardless of how different the new code might be from the original.
"I do not believe that is what the LGPL requires, but I am open to other interpretations. In my view, the core question is whether the new code derives from the old code (i.e., is a derivative work). Based on the evidence presented so far, it does not."
How AI Was Involved
To maintain full transparency, Blanchard further shared the detailed process of this rewrite:
I used Claude’s ‘superpowers brainstorming’ capability to generate a design document that thoroughly outlines the architecture and implementation approach I wanted to adopt.
This design is based on a set of requirements I established for this rewrite (these were originally written in my phone’s Notes app and never committed to the repository, but I list them here as background):
Maintain compatibility with the external API
The project should still be called chardet, because the plan is to replace the original chardet with the new implementation
Do not base the work on any GPL or LGPL‑licensed code
Maintain chardet‑level encoding detection accuracy on the test data
Language detection is not a strict requirement, but if it is easy to implement or a side effect of other design choices, it may be included
High performance and memory efficiency: able to effectively utilize multi‑core CPUs
No runtime dependencies
Must support both PyPy and CPython
The design should be clean and modern
If a trained statistical model is used, the data source should be obtained via Hugging Face’s load_dataset API
Any training code should cache data locally so that it can be frequently retrained during development
Perform performance benchmarks frequently
Avoid using large dictionaries as literals, because importing such structures in CPython 3.12 is very slow
Afterward, Blanchard said he began development in a completely empty repository, with no access to the old codebase, and explicitly instructed Claude: do not base the implementation on any LGPL or GPL‑licensed code.
Next, he personally used Claude to review, test, and iteratively improve each part of the generated code.
Blanchard also admits that he did not write every line of code by hand, but throughout the process he was deeply involved in architectural design, code review, and each step of iterative improvement.
"I understand that this is a new and somewhat unfamiliar territory: using AI tools in the rewrite of a long‑standing open‑source project does raise legitimate questions. However, based on the evidence available, it is clear that version 7.0 is an independent work, not a derivative of the LGPL codebase, so applying the MIT license is justified."
Controversy point: the boundary of AI‑generated code is hard to define
Despite Blanchard’s effort to independently generate the code, several complicating factors remain.
First, commenters found that while rewriting chardet for version 7.0, Claude explicitly used some metadata files from earlier versions of chardet, which led developers to question whether the new version is in fact a derivative work.
Second, the Claude model absorbed a large amount of public web data during training, which may have included the open-source code of earlier chardet versions. Whether that makes AI-generated code a derivative of the original remains contested.
Moreover, there are human factors. Although the new version’s code was generated by Claude, Blanchard said he ‘used Claude to review, test, and iterate on each part of the result… I did not write the code by hand, but I was deeply involved in the design, review, and iteration of every step.’ Having someone very familiar with the early chardet code so deeply involved in reviewing the new code could also affect whether this version can be considered a completely new project.
Furthermore, Blanchard did all of this work under the same package name, in the same repository, and under the same PyPI listing as the existing chardet library; more importantly, the new version is still named chardet.
Community reactions
This incident has sparked extensive discussion in the open‑source community, pointing to a gap in the fundamental rules of the AI era.
Some have defended Blanchard against the accusations he faces:
"Blanchard maintains this library alone, with no funding, no collaborators, and no support. The other two members of the chardet team had both stopped contributing by 2017, and one of them has made no commits since 2012. The original author wiped his Internet presence clean in 2011. This is one of the packages the Python ecosystem relies on most, sustained solely by one person's spare-time efforts. Now that this person has done something unpopular, everyone suddenly weighs in on governance, hosting, and the spirit of free software."
In addition, developer Armin Ronacher wrote an article titled *AI and the Ship of Theseus*. He sees AI rewriting as a way to finally escape the GPL, which he believes restricts sharing:
"If you discard all the code and start from zero, even if the end behavior is identical, it is still a new ship."
However, many commenters disagree:
"Feeding copyleft‑licensed code into a model trained on it, letting the model produce functionally equivalent output, then pointing to the output and saying ‘look, no similarity.’ Plagiarism checkers finding no matching tokens does not prove independence; it only shows that laundering works. If this tactic were legal, every existing copyleft project could be turned into MIT‑licensed (or even closed‑source) simply by running Claude once — the approach works both ways."
In the GitHub discussion, someone even more sharply commented: taking leaked Windows source code, feeding it to a large model for rewriting, and then releasing it as open source — would that be acceptable? If not, explain why chardet is different. The mechanism is exactly the same; the only variable is whether you sympathize with the rights holder.
Zoë Kooyman, executive director of the Free Software Foundation (FSF), stated bluntly: "AI models absorb the code they are supposed to reimplement, so there is truly no such thing as a ‘clean’ implementation."
On one side are the ground rules of classic open-source licenses; on the other is the new reality of AI-assisted development. After the original author disappeared and a single person maintained the project for more than ten years, who does the project belong to, and who ultimately gets to decide the license of the new chardet version? What do you think?
References:
https://github.com/chardet/chardet
https://github.com/chardet/chardet/issues/327#issuecomment-4005195078
https://shiftmag.dev/license-laundering-and-the-death-of-clean-room-8528/
https://www.theregister.com/2026/03/06/ai_kills_software_licensing/