Who would have thought that this round of massive market turmoil would unearth a major academic scandal?
This past Friday evening, Google's academic misconduct incident became the focal point of the AI community.
Jianyang Gao, a postdoctoral researcher at ETH Zurich, published an article stating that Google Research's paper, "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate," contains serious issues regarding its description of the existing RaBitQ vector quantization algorithm, theoretical result comparisons, and experimental comparisons. Furthermore, these issues were explicitly pointed out before the paper's submission but were deliberately ignored by the authors.
As an AI result with the potential to upend the business logic of numerous memory-chip companies, TurboQuant's value to the industry seemed beyond question. Yet who could have imagined that this ICLR paper, promoted by Google to tens of millions of views, would have its core technical foundation mired in allegations of plagiarism.
TurboQuant: The Catalyst for Memory Stock Turbulence
Google's TurboQuant paper has recently exploded beyond AI research circles. Accepted by ICLR 2026, a top global AI research conference, the paper introduces a compression algorithm claiming to reduce the KV cache memory footprint of large language models by at least 6 times and increase speed by up to 8 times, with zero loss in accuracy.
TurboQuant was made public on the preprint platform arXiv in April 2025, accepted by ICLR 2026 in January 2026, and sparked massive attention following its introduction on the Google Research blog on March 24.
Google's promotional post on X garnered over ten million views.
During inference, every time a large AI model generates a new token, it needs to "review" the conversation history (the context), which is stored in the KV cache. As a result, the memory occupied by the KV cache often becomes the biggest bottleneck limiting the speed and cost of serving large models. The extreme lossless compression method proposed by TurboQuant produced shocking results: by drastically cutting the hardware resources required to run large models, it directly shook the market's expectation of explosive growth in memory-chip demand.
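To see the scale of the problem, here is a back-of-envelope sketch of KV cache memory for a decoder-only transformer. The model dimensions below are illustrative assumptions for a hypothetical model, not figures from the paper or any specific system:

```python
# Back-of-envelope KV cache size for a decoder-only transformer.
# Dimensions are illustrative assumptions, not any particular model.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # K and V each store one head_dim-sized vector per head, per layer, per token
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Example: a hypothetical 32-layer model with 8 KV heads of dimension 128,
# a 128,000-token context, batch size 1, fp16 (2 bytes per element)
size = kv_cache_bytes(32, 8, 128, 128_000, 1)
print(f"{size / 2**30:.1f} GiB")  # → 15.6 GiB
```

At long contexts this easily dwarfs the memory needed for the model weights themselves, which is why a 6x compression of the cache matters so much for serving cost.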
On the day the Google blog was published, US memory stocks collectively plummeted. SanDisk dropped as much as 6.5%, Seagate Technology fell over 5%, Western Digital dropped more than 4%, and Micron Technology fell 4%. The market capitalization evaporated in a single day exceeded $90 billion.
How exactly did this technology, heavily promoted by Google, achieve this? Simply put, it used a sophisticated method to break the memory-consumption bottleneck.
TurboQuant achieves this goal through a two-stage compression process: The first stage uses "random rotation" and the PolarQuant mechanism to map high-dimensional vectors to polar coordinates, achieving extreme compression. The second stage utilizes the Quantized Johnson-Lindenstrauss (QJL) transform, using only 1 bit of space to correct biases in inner product calculations.
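The rotate-then-quantize idea at the heart of the dispute can be illustrated with a minimal NumPy sketch: apply a random orthogonal rotation, keep only the sign of each coordinate (1 bit per dimension) plus the vector's norm, and recover inner products with a SimHash-style estimator. This is a generic illustration of the technique under those assumptions, not Google's actual TurboQuant implementation or its PolarQuant/QJL mechanisms:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Random orthogonal matrix via QR of a Gaussian matrix (a JL-style rotation)
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

def quantize_1bit(x, R):
    # Rotate, then keep only the sign bit of each coordinate plus the norm
    return np.signbit(R @ x), np.linalg.norm(x)

def inner_product_estimate(bits_a, norm_a, bits_b, norm_b):
    # SimHash-style estimator: the fraction of disagreeing sign bits estimates
    # the angle between the rotated vectors, hence the original inner product
    angle = np.pi * np.mean(bits_a != bits_b)
    return norm_a * norm_b * np.cos(angle)

d = 1024
R = random_rotation(d)
a, b = rng.normal(size=d), rng.normal(size=d)
est = inner_product_estimate(*quantize_1bit(a, R), *quantize_1bit(b, R))
print(est, a @ b)  # rough agreement; the estimator's variance shrinks as d grows
```

The point of contention is precisely this rotate-before-quantize step: both RaBitQ and TurboQuant rely on the rotation to make the coordinate distribution well-behaved before low-bit quantization.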
However, it is precisely this part of the technology that became the fuse igniting the academic scandal.
Dr. Jianyang Gao from ETH Zurich cited evidence stating that this "revolutionary" core mechanism promoted by Google did not originate with Google; his team had fully proposed it two years earlier.
Even more infuriating is that Google deliberately "avoided" and "downplayed" prior art in its paper.
RaBitQ Authors Publicly Question: TurboQuant's Core Method Existed Two Years Ago
The RaBitQ series of papers, published in 2024, proposed a high-dimensional vector quantization method and theoretically proved that it attains the asymptotically optimal error bound established in the theoretical computer science literature.
RaBitQ and its extended version were published at the top-tier conferences SIGMOD 2024 and SIGMOD 2025, respectively.
One of the core ideas of RaBitQ is to apply random rotation (random rotation / Johnson-Lindenstrauss transform) to input vectors before quantization, utilizing the properties of the coordinate distribution after rotation to perform vector quantization, theoretically achieving the optimal error bound.
The core of TurboQuant's method is similarly applying random rotation (Johnson-Lindenstrauss transform) to input vectors before quantization. This point was even explicitly described by the TurboQuant authors themselves in their ICLR review response.
However, the TurboQuant paper consistently avoided any direct methodological connection with RaBitQ. Instead, it described RaBitQ in the main text as "grid-based PQ," omitting RaBitQ's core random-rotation step and blurring the lineage between the two methods.
Majid Daliri, the second author of TurboQuant, had actively contacted Jianyang Gao as early as January 2025, requesting assistance in debugging his self-translated Python reproduction of RaBitQ code. This indicates that the TurboQuant team was well-versed in the technical details of RaBitQ.
Since they already knew and had consulted the original authors, why was there no proper citation or objective comparison in the final paper?
After discovering these issues, Jianyang Gao's team, upholding academic rigor, engaged in multiple private communications with the TurboQuant team via email starting from May 2025, explicitly pointing out the factual errors.
However, the TurboQuant team refused to make corrections, citing that "random rotation has become a standard technique in the field, and it is impossible to cite every method that uses it." Subsequently, this paper was not only pushed to ICLR 2026 but also became a global focus.
If such an academic narrative is not corrected, it will gradually become the consensus. Ultimately, Jianyang Gao's team stepped forward to list several accusations.
Three Specific Accusations
Jianyang Gao listed three specific issues in his article.
First, systematic avoidance of technical similarities.
TurboQuant not only failed to directly discuss the structural connections between the two methods but also moved the originally incomplete description of RaBitQ from the main text to the appendix. This move occurred even after reviewers explicitly pointed out that "RaBitQ and variants are similar to TurboQuant in that they all use random projection" and requested a full discussion.
The TurboQuant authors responded that "the use of random rotation and Johnson-Lindenstrauss transforms is already a standard technique in the field, and it is impossible for us to cite every paper that uses these methods."
Jianyang Gao's team believes this response deflects the issue: as the specific pioneering work that first combined random rotation (the Johnson-Lindenstrauss transform) with vector quantization under the same problem setting and established optimal theoretical guarantees, RaBitQ should be accurately described in the text, and its connection to the TurboQuant method should be fully discussed.
Second, incorrect description of RaBitQ's theoretical results.
The TurboQuant paper characterized RaBitQ's theoretical guarantees as "suboptimal," attributing this to "loose analysis," without providing any derivation, comparison, or evidence.
In fact, Theorem 3.2 in the extended RaBitQ paper (arXiv:2409.09913) has rigorously proved that RaBitQ's error bound achieves the asymptotically optimal error bound given by top-tier theoretical computer science conference papers (Alon-Klartag, FOCS 2017). Due to this result, Jianyang Gao's team was invited to present at a Workshop at FOCS, a top-tier theoretical computer science conference.
In May 2025, Jianyang Gao's team conducted multiple rounds of detailed technical discussions via email with Majid Daliri, the second author of TurboQuant, clarifying this misinterpretation point by point. Majid Daliri also explicitly stated that he had informed all co-authors. However, this erroneous characterization remained uncorrected throughout the entire process of the paper undergoing full review, being accepted, and being massively promoted.
Third, deliberately creating unfair experimental conditions.
When testing RaBitQ's speed, the TurboQuant paper did not use the official open-source C++ implementation. Instead, it used a Python version translated by Majid Daliri himself and restricted RaBitQ to run on a single CPU core with multi-threading disabled, while TurboQuant itself was tested on an NVIDIA A100 GPU. Neither of these systematically unfair conditions was explicitly disclosed in the paper.
Majid Daliri himself admitted to the single-core limitation in an email in May 2025, yet the paper still presented the conclusion derived from this—that "RaBitQ is several orders of magnitude slower than TurboQuant"—to readers without any explanation.
Choosing to Speak Out Publicly
Jianyang Gao stated that they discovered TurboQuant's submission to ICLR 2026 as early as November 2025 and immediately contacted the ICLR Program Committee Chairs, but received no response.
After the paper was officially accepted in January 2026, Google began promoting it at scale through official channels, and related content quickly reached tens of millions of views on social media.
In March 2026, Jianyang Gao's team formally wrote again to all authors of TurboQuant, requesting an explanation and correction. So far, the first author, Amir Zandieh, has responded by promising to correct issues two and three after the ICLR conference officially ends, while refusing to discuss the issue of technical similarity.
Jianyang Gao has published public comments on the ICLR OpenReview platform and submitted a formal complaint containing complete evidence to the ICLR General Chairs, PC Chairs, and Code and Ethics Chairs. He also stated that he would release a detailed technical report on TurboQuant and RaBitQ on arXiv and reserves the option to further report to relevant institutions.
He wrote at the end of his article: "When a paper is pushed to the public by Google with tens of millions of exposures, at this magnitude, the erroneous narrative in the paper does not need active propagation; it only needs to remain uncorrected to automatically become consensus."
Jianyang Gao's claims have since drawn support from many in the community, with many commenters saying this is not the first time Google has adopted such practices in AI research.
Perhaps Google and the ICLR official organization need to provide an explanation.
Reference Content:
https://zhuanlan.zhihu.com/p/2020969476166808284
https://x.com/gaoj0017/status/2037532673812443214
https://openreview.net/forum?id=tO3ASKZlok
https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/