Anthropic's Latest Paper: Internet Anonymity Ends in the AI Era | Hao's Paper Chat

Tencent's cutting-edge technology paper interpretation column, finding AI certainty at the intersection of code and commerce.

By Boyang

Edited by Xu Qingyang


Recently, a confrontation between Silicon Valley's top AI company Anthropic and the U.S. Department of Defense (DoD) has become the focus of the AI community. When announcing cooperation with the Pentagon, Anthropic emphasized two red lines: AI must not be used for offensive weapons, nor for mass domestic surveillance.

However, the Pentagon hopes Anthropic will lift all usage restrictions on its AI model Claude, which conflicts with Anthropic's two red lines.

The core of the Pentagon and Anthropic's conflict is actually a dispute over the boundaries and ethical bottom lines of artificial intelligence technology—a fundamental conflict between "unrestricted use" and "safety guardrails."

Throughout history, institutions and commercial entities alike, in building closed loops of control, have followed a clear path: knowing who you are, understanding what you want, and ultimately steering your behavior and choices.

The first step of this path is confirming who you are. Since the birth of the internet, anonymity has been regarded as its core characteristic. "On the Internet, nobody knows you're a dog." People don different personas in various forums and communities, enjoying freedom of speech unbound by real-world identities.

Of course, we are also accustomed to seeing various doxxing attempts. When an influencer says something wrong, angry netizens mobilize collective wisdom, piecing together the individual's real identity from fragments of information.

However, such traditional de-anonymization methods are time-consuming and labor-intensive, requiring manual comparison of large amounts of information, typically targeting only high-value targets that attract public attention.

Meanwhile, enterprises seeking to understand user preferences for precision advertising rely more on IP addresses, device fingerprints, and cross-application tracking—relatively inexpensive technical methods. But under increasingly strict privacy protection policies, these methods are becoming more and more difficult.

However, the arrival of AI may completely change the rules of this game. Anthropic itself, along with institutions such as ETH Zurich, recently released a heavyweight paper titled "Large-scale Online De-anonymization Using Large Language Models," conducting an extreme test of AI's doxxing capabilities.


Their conclusion is that AI, with its precision and low cost, has effectively pronounced a death sentence on the era of internet anonymity.

Anthropic's concerns are therefore completely justified: at any moment, anyone can cheaply and easily pull you out of your anonymous shell.

01

History of De-anonymization: From Handicraft Workshop to Automated Assembly Line

De-anonymization, or doxxing, is essentially a process of peeling an onion. The doxxer first sketches a digital outline from an anonymous person's scattered remarks, then matches this outline against a database of known identities.

Before the AI era, the most famous de-anonymization event in the privacy security field was the 2008 Netflix Prize attack. At that time, Netflix publicly released a batch of anonymized user movie rating data, attempting to offer a reward for optimizing its recommendation algorithm.

Narayanan and Shmatikov discovered that by comparing this anonymized rating data against public IMDb (Internet Movie Database) accounts, using microdata such as the ratings and timestamps of a few obscure movies, they could easily recover the real identities of the anonymous users.

So-called microdata refers to individual-level information fragments, such as "gave Twilight 5 stars," "lives in Texas," "never capitalizes the first letter of sentences." A single fragment may not be enough to identify you, but when multiple fragments combine, they form a unique fingerprint. Just as 87% of Americans can be uniquely identified by just three pieces of information—zip code, birth date, and gender—seemingly insignificant details accumulate to become the key to identity.

However, the Netflix attack strategy had a limitation: it required structured data. Movie ratings are neat numerical matrices that can be directly compared for similarity using algorithms.

But the real internet is composed of countless unstructured idle chats, complaints, subjective comments, and colloquial expressions. For example, "Took the dog to Dolores Park yesterday, really miss the rain in Portland" provides abundant information, but traditional algorithms cannot understand it at all. Faced with this unstructured text, the only effective method in the past was to deploy professional investigators who, like detectives, read, analyze, and deduce word by word.

This huge manpower cost constitutes a solid cost moat protecting ordinary people's privacy. This sense of security arising from high investigation costs is called practical obscurity.

The arrival of large language models instantly drained this moat. The core superpower of large models is precisely the deep understanding of human natural language and its underlying complex semantics. It no longer needs uniform tables; it can directly read your remarks on any platform and instantly unravel them.

What used to take human experts hours to complete in logical reasoning and information extraction, large models can now complete in seconds at extremely low computing costs.

Anthropic, the company that proposed the safety red lines, publicly released transcripts in December 2025 of interviews with 125 scientists about how they use AI tools in their work. The transcripts were partially redacted (sensitive information removed) to protect privacy.

But just weeks after the data release, researchers matched the research topics mentioned in the interviews with published papers using LLMs, finding the real identities of the interviewees. Among 33 scientists who discussed past research, AI successfully identified 9, 50% more than previous methods.


(Anthropic interview de-anonymization method)

And this only takes a few minutes and a few dollars.

AI did not invent new attack logic; it simply popularized this attack to every corner of the internet at an unprecedented scale and extremely low cost.

02

AI De-anonymization in Four Steps

To verify AI's true capabilities, the research team designed a scalable automated attack pipeline called ESRC. This pipeline breaks de-anonymization into four highly automated steps.


The first step is Extract. The LLM reads all of the target user's posts and comments and extracts a structured personal profile. Through semantic-level understanding, the model can infer from a Reddit user saying "This year's CS224N course tortured me, senior year is really tough" that the author is a Stanford CS major (because CS224N is Stanford's natural language processing course), approximately 22 years old, and lives in San Francisco. Even if the user never directly says "I am a Stanford student," the AI can infer it from these clues.
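As a rough illustration of the Extract step, a minimal sketch might look like the following. This is not the paper's actual prompt or schema: `PROFILE_FIELDS`, `build_extract_prompt`, and the canned response in `call_llm` are all hypothetical stand-ins for a real API call.

```python
import json

# Hypothetical field list; the paper's exact profile schema is not reproduced here.
PROFILE_FIELDS = ["location", "occupation", "education", "age_estimate", "interests"]

def build_extract_prompt(posts):
    """Ask an LLM to distill free-text posts into a structured profile."""
    joined = "\n---\n".join(posts)
    return (
        "Read the posts below and infer a profile of the author. "
        f"Return JSON with keys {PROFILE_FIELDS}; use null when unsure.\n\n" + joined
    )

def call_llm(prompt):
    # Stand-in for a real hosted-model API call; returns a canned JSON
    # answer here purely so the sketch is self-contained and runnable.
    return json.dumps({
        "location": "San Francisco",
        "occupation": None,
        "education": "Stanford CS (CS224N is Stanford's NLP course)",
        "age_estimate": 22,
        "interests": ["NLP"],
    })

def extract_profile(posts):
    profile = json.loads(call_llm(build_extract_prompt(posts)))
    # Normalize to a fixed schema so the downstream search step sees uniform records.
    return {k: profile.get(k) for k in PROFILE_FIELDS}

profile = extract_profile(
    ["This year's CS224N course tortured me, senior year is really tough"]
)
```

The key design point is that the output is structured and uniform: every user, no matter how messy their posts, is reduced to the same fixed set of fields that later steps can embed and compare.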

The second step is Search. Faced with massive internet databases with millions of candidates, AI transforms the extracted personal profiles into dense semantic embeddings. The traditional Netflix attack had similar steps, but it could only compare structured features (movie ratings, programming languages, work locations), while LLM embedding vectors can capture richer semantic information, including implicit interests and values. By calculating cosine similarity in multi-dimensional space, the system can identify the top 100 most similar candidates from 89,000 Hacker News users in a very short time.
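A minimal sketch of the Search step, using toy 2-dimensional vectors in place of real high-dimensional embeddings from an embedding API; `cosine` and `top_k` are illustrative helpers, not the paper's code.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_k(query_vec, candidates, k=100):
    """candidates: {user_id: embedding}. Return the k ids most similar to the query."""
    ranked = sorted(
        candidates, key=lambda uid: cosine(query_vec, candidates[uid]), reverse=True
    )
    return ranked[:k]

# Toy example: "a" points almost the same way as the query, "c" is diagonal,
# "b" is orthogonal, so the top-2 shortlist is ["a", "c"].
shortlist = top_k([1.0, 0.0], {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.7]}, k=2)
```

At real scale, this brute-force loop would be replaced by an approximate nearest-neighbor index, but the principle is the same: geometric similarity in embedding space turns tens of thousands of candidates into a shortlist of dozens.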

The third step is Reason. This is the stage where AI shows its truly terrifying strength. Traditional algorithms are helpless once similarity has been computed, whereas the AI pipeline invokes large models with strong logical reasoning capabilities (such as GPT-5.2) to deeply cross-verify the dozens of suspects from the initial screening. The model acts like a judge, examining whether timelines conflict and whether life details are consistent, greatly reducing the chance of false positives.
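A toy stand-in for the Reason step: here the LLM judge is replaced by a simple exact-fact contradiction check, whereas the real pipeline prompts a reasoning model to weigh timelines and soft evidence. `consistent` and `cross_verify` are hypothetical names.

```python
def consistent(profile, facts):
    """Stand-in for the LLM judge: reject a candidate whose known facts
    flatly contradict the extracted profile. The real system reasons over
    fuzzy evidence and timelines, not exact equality."""
    for key in profile.keys() & facts.keys():
        a, b = profile[key], facts[key]
        if a is not None and b is not None and a != b:
            return False  # e.g. profile says San Francisco, candidate lives in Berlin
    return True

def cross_verify(profile, shortlist):
    # Keep only shortlisted candidates that survive the contradiction check.
    return [c for c in shortlist if consistent(profile, c["facts"])]

profile = {"location": "San Francisco", "field": "NLP"}
shortlist = [
    {"id": "u1", "facts": {"location": "San Francisco", "field": "NLP"}},
    {"id": "u2", "facts": {"location": "Berlin", "field": "NLP"}},
]
survivors = cross_verify(profile, shortlist)
```

The point of this stage is asymmetry: similarity search is cheap but noisy, so an expensive reasoning pass is spent only on the few dozen survivors, pruning candidates that merely look similar in embedding space.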

The fourth step is Calibrate. To ensure the effectiveness of the attack, AI must learn to judge its "confidence." The system prompts the large model to output an absolute confidence score. If higher confidence is desired, researchers also have AI perform pairwise comparisons of all candidate matches, giving final rankings through a method similar to a sorting competition. This method is particularly effective in large-scale attack scenarios.
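The Calibrate step's tournament-style ranking and abstention can be sketched as follows, with a plain confidence dictionary standing in for the LLM's pairwise judgments; all names here are illustrative, not the paper's implementation.

```python
def pairwise_rank(candidates, prefer):
    """Tournament-style ranking: prefer(a, b) is True when a is the better match.
    Each candidate is scored by pairwise wins; in the real pipeline, `prefer`
    would be an LLM comparing two candidate matches side by side."""
    wins = {c: 0 for c in candidates}
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            wins[a if prefer(a, b) else b] += 1
    return sorted(candidates, key=lambda c: wins[c], reverse=True)

def accept(confidence, threshold=0.9):
    # Abstain below the threshold: for an attacker, a wrong match is worse
    # than no match, so high-precision operation means refusing to guess.
    return confidence >= threshold

# Illustrative self-reported confidences from the model.
conf = {"u1": 0.95, "u2": 0.60, "u3": 0.30}
ranked = pairwise_rank(list(conf), lambda a, b: conf[a] > conf[b])
```

Raising the threshold in `accept` trades recall for precision, which is exactly the 90%-vs-99% precision dial discussed in the experiments below.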

In this experiment, designers emphasized mining factual semantic clues while downplaying stylometry—methods that confirm identity by analyzing a person's function word frequency, grammatical structure, and tone characteristics. Because those objective facts rooted in life experience (such as your city, a chronic disease you have, specific niche things you like) are difficult to conceal long-term. Facts are more fatal than writing style and easier for large models to cross-verify.

Of course, this research itself faces a contradiction: to verify attack effectiveness, the correct answer must be known, but knowing the answer means the user was not anonymous enough in the first place.

Researchers adopted two strategies. One is synthetic anonymization: finding users who actively disclose their identities (such as Hacker News accounts with LinkedIn links in their profiles), then using AI to remove all direct identifiers (names, links, GitHub accounts), keeping only semantic information, to see if AI can re-identify them.


The other is time-splitting: cutting a Reddit user's remarks from different periods into two parts to see if AI can match them back together.
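A minimal sketch of how such a time-split benchmark could be constructed; the cutoff logic and function name `time_split` are assumptions for illustration, not the paper's code.

```python
from datetime import datetime

def time_split(posts, cutoff_iso):
    """posts: list of (iso_timestamp, text) pairs for one user.
    Returns (earlier, later) halves; the benchmark then asks whether the
    attack can match the earlier half back to the later half."""
    cutoff = datetime.fromisoformat(cutoff_iso)
    earlier = [text for ts, text in posts if datetime.fromisoformat(ts) < cutoff]
    later = [text for ts, text in posts if datetime.fromisoformat(ts) >= cutoff]
    return earlier, later

posts = [
    ("2023-03-01", "miss the rain in Portland"),
    ("2024-06-15", "new job, still into NLP"),
]
earlier, later = time_split(posts, "2024-01-01")
```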

Both methods have biases. For example, people who actively disclose their identities may not care about privacy in the first place, and two histories from the same person naturally share similarities. But they at least provide quantifiable benchmarks allowing us to compare the merits of different methods.

More importantly, the entire process uses publicly available models and standard APIs, including Gemini's embedding models, GPT-5.2's reasoning capabilities, and ordinary vector retrieval libraries.

No custom models, no proprietary data, no hacker techniques. Anyone with certain technical capabilities can replicate this attack.

03

Three Scenarios: Layered and Progressive Threats

To explore the boundaries of AI doxxing, the research team tested three increasingly complex scenarios, each one verifying AI's generality in turn.

To measure the power of this pipeline, we first need to understand the definitions of two core metrics: Precision and Recall.

Precision refers to the proportion of correct matches among all matches where AI gives a definitive answer. Recall refers to the proportion of targets that AI successfully matches among all targets that actually have corresponding real identities.

For monitors or attackers, failing to find someone is acceptable, but identifying the wrong person is catastrophic. Therefore, all subsequent tests were conducted under the harsh condition that AI maintain extremely high precision of 90% or even 99%.
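These two metrics, and the effect of a confidence threshold, can be made concrete with a small sketch; the data and the `precision_recall` function are illustrative, not the paper's evaluation code.

```python
def precision_recall(predictions, truth, threshold):
    """predictions: {query_id: (candidate_id, confidence)} from the pipeline.
    truth: {query_id: true_id} for queries with a real match in the pool.
    Precision counts only queries the system chose to answer; recall is
    measured over all matchable queries, so abstaining lowers recall, not precision."""
    answered = {q: cand for q, (cand, conf) in predictions.items() if conf >= threshold}
    correct = sum(1 for q, cand in answered.items() if truth.get(q) == cand)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

# Three queries: q1 answered correctly, q2 answered wrongly, q3 abstained
# (its confidence 0.50 falls below the 0.9 threshold).
preds = {"q1": ("a", 0.95), "q2": ("b", 0.95), "q3": ("c", 0.50)}
truth = {"q1": "a", "q2": "x", "q3": "c"}
p, r = precision_recall(preds, truth, threshold=0.9)
```

Here precision is 1/2 (one of two answered queries is right) while recall is 1/3 (one of three matchable targets found), which is why a system can be forced to abstain aggressively to hit a 99% precision target.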

Cross-Platform Matching

The first scenario is cross-platform matching, the most common doxxing mode. This attempts to bind an anonymous forum persona to real-world professional identities.

The research team collected 987 users who posted LinkedIn links in their Hacker News profiles (providing correct answers for verification), then used AI to remove all direct identifiers, attempting to re-match from 89,000 candidate users.

Despite the apparent gulf between the two platforms, large models demonstrated an overwhelming advantage on this task. At the 90% precision setting, AI correctly identified 67% of users. Even when raised to 99% precision (essentially error-free), it still identified 45%. By comparison, traditional Netflix-style attack methods almost completely failed here, with only 0.1% recall.


Moreover, AI's advantage grows as the candidate pool expands. When candidates increase from 1,000 to 89,000, traditional methods collapse entirely, while the AI method's recall drops only 13 percentage points. The researchers also extrapolated boldly: if the candidate pool grew to 1 million (close to the scale of a real platform), AI might still maintain 35% recall; even with 100 million candidates, 27% could theoretically still be identified.

Cross-Community Matching

The second scenario is cross-community matching. Testing whether the same person's remarks in different Reddit forums (mainstream movie forum vs. niche movie community) can be associated by AI.

This is a more difficult task because users discuss limited movies in different communities, and overlapping topics may be few.

Researchers had AI look only at movie reviews discussed in both communities—48% of users shared only one movie—then attempted to match 9,781 users.


This time AI's performance was relatively modest: 8.5% recall at 90% precision, dropping to 2.8% at 99% precision.

But this scenario raises the question of information richness: exactly how much must a person reveal before AI can lock them down? The study found no absolutely safe threshold, even for minimal disclosures. Users who mentioned only 1 shared movie had 3.1% recall; for those who mentioned 5 to 9, recall jumped to 23%; for more than 10, it reached 48%. Every subjective expression you post online sharpens your digital puzzle, and exposure accelerates non-linearly.


More notably, GPT-5.2 using "high reasoning effort" mode performed twice as well as low reasoning effort (2.8% vs 1.4% at 99% precision), indicating AI's reasoning capabilities are still rapidly evolving, and future models will be stronger.


Cross-Time Matching

The third scenario is even more extreme cross-time matching. People change; a person's focus, residence, and mindset all change over time. Therefore, time is often seen as a natural barrier protecting privacy.

To challenge this barrier, the research team split Reddit users' posting histories at a one-year boundary, using posts from one year to match accounts a year later. This tests AI's core understanding capability: it must grasp deep, persistent identity characteristics beneath superficial shifts in topic.

In this extremely difficult scenario, to ensure extremely low false positive rates, AI used the calibration step mentioned earlier.


In settings of 5,000 test users and 10,000 candidate profiles, calibrated AI achieved 67.3% recall (90% precision) and 38.4% (99% precision).

This is much higher than cross-community matching, indicating that even as topics change, a person's deep characteristics—such as geographic location, professional field, core values, hobbies—remain relatively stable.

The results prove that even spanning a one-year chasm of time, AI can still accurately extract those extremely stable personality traits and microdata, binding the past and present you tightly together.

Researchers also conducted an extreme test: if only one in a thousand queried users actually has a real match in the candidate pool (99.9% are not findable), can AI still work? The answer is yes. Even when the match probability drops to one in ten thousand, AI can still identify approximately 9% of matchable users at 90% precision.


This means that even if the vast majority of people are not in the searched database at all, those who can indeed be identified will still be singled out by AI.

04

The End of Anonymity

These experimental results have essentially declared the end of the internet anonymity era.

The researchers tested several defense methods, and the results were not optimistic. API access can be rate-limited, but attackers can crawl in a distributed manner; automated behavior can be detected, but AI usage patterns resemble those of highly engaged normal users; differential privacy or k-anonymity techniques can be applied, but they were designed for structured data and have limited effect on free text.

Even LLM providers' safety protections are difficult to implement effectively, because each step of this attack pipeline (summarization, embedding, comparison) looks like normal use and is difficult to detect as abuse.

The only effective defense is for platforms to not publicly display users' historical remarks from the beginning.

Then what use is the internet?

You seek medical advice on forums, inevitably describing symptoms and medical history; you ask questions in technical communities, needing to explain what tools you use and what errors you encounter; you share impressions on movie review sites, naturally exposing tastes and viewing habits. This information itself is harmless, yet it is precisely the microdata AI uses to identify you.

And all it takes is a few servers and an LLM pipeline. No need to invade any private devices or communications—just analyzing public information is sufficient to locate specific you from the network. Privacy policies and user agreements are completely ineffective here, because the data is inherently public.

To defend yourself thoroughly, you can only choose silence, which amounts to withdrawing from the collaborative network of modern society; if you choose to speak, you are handing your ID card to the abyss.

And Anthropic's promise not to use AI for mass domestic surveillance is a commendable stance.

But as this research demonstrates, the capabilities required for surveillance no longer need proprietary models—using publicly available LLMs, standard APIs, and ordinary datasets can achieve capabilities once possessed only by intelligence agencies.

In terms of technical conditions, safety protections, and every other conceivable angle, there is no way to prevent you from being doxxed at extremely low cost.

In the AI era, anonymity is dead.

Perhaps in AI's future, the art of escaping control will no longer be hiding in the mountains, but permanently going offline.

