Adoption Rate from 7.9% to 54%: The Three-Stage Evolution of Kuaishou's Intelligent Code Review

Produced by | Kuaishou R&D Efficiency Committee x R&D Efficiency Center

Overview

As Kuaishou advances its company-wide "AI R&D paradigm upgrade," a core challenge is converting the efficiency gains of individual AI use into organization-wide delivery efficiency. The intelligent code review system is the key link in solving this problem: it acts after individual development is complete and before team delivery and merging. Through quality filtering and knowledge consolidation, it elevates AI from a "personal assistant" to a "team collaborator."

In traditional R&D processes, Code Review (CR) faces structural pain points: high manual review costs, inconsistent standards, and limited review depth. This article traces the evolution of Kuaishou's intelligent code review system across three generations of architecture: from "Pure LLM Heuristic," to "Knowledge Engine + Rule Determinism," and finally to "Agentic Autonomous Decision-Making."

To tackle the stubborn problem of AI "hallucinations," the Kuaishou technical team built a context engine, distilled 1,100+ deterministic rules, and backed them with three-layer filtering and a BadCase negative-example library. These measures raised the review adoption rate from 7.9% to 54% and cut the 80th percentile of MR review duration by 9.9%. By grounding large models in "knowledge engineering," intelligent review became a genuinely trustworthy productivity tool. The experience also provides a reusable, practical template for transmitting individual efficiency gains into organizational gains in the AI era.


Background

1.1 The Trade-off Between Efficiency and Quality: Inherent Pain Points of Traditional CR Modes

Under Kuaishou's high-speed iteration business scenarios, the scale of code changes is growing daily. The traditional manual code review mode is gradually becoming overwhelmed, and its inherent three major pain points are becoming increasingly prominent:

Low efficiency, response depends on "luck"

When the volume of code changes is large, the cost of manual CR increases significantly. The review cycle is prolonged, and response timeliness entirely depends on how busy the reviewer is. It has become normal for an MR in a core repository to queue for 48 hours, seriously affecting the overall speed of business delivery.

Low-level errors, hard to guard against

Human cognitive attention is limited. When facing complex logic, reviewers often subconsciously ignore seemingly simple low-level errors. For example, if((a || b) && c) is mistakenly changed to if(a || c) when deleting condition b, or obvious assignment errors like a.setId(a.getId()) appear. However, data shows that these "low-level mistakes" account for a fairly high proportion of online failures caused by code changes.

Unstable effects, difficult knowledge inheritance

The quality of reviews relies heavily on the reviewer's experience and cognitive load. The valuable experience and tacit knowledge of senior engineers are hard to distill systematically into team consensus, while junior developers, lacking that experience, are prone to missing deep-seated defects. The resulting instability causes code quality to fluctuate, raises production risk, and hinders the transfer of knowledge within the team.

1.2 The Introduction of AI and New Challenges: How to Break the "Trust Crisis"

Facing the predicament of traditional CR, intelligence is seen as the inevitable way out. However, when we first introduced large models for code review, we encountered a typical AI trust crisis, mainly manifested in the following aspects:

"Hallucination" problem

AI makes inferences from isolated code fragments, easily producing large numbers of false positives and reducing the reliability of review results.

Low-value suggestions

The comments generated by AI contain many generic, insubstantial suggestions, such as "suggest optimization" or "consider," causing developer fatigue and reducing the willingness to adopt.

Missing context

Only analyzing Git Diff fragments, unable to understand the complete role of the code in the system.

Uncontrollable knowledge

The "black box" characteristic of large models makes the precipitation and inheritance of review standards difficult, making it hard to form unified team norms.

These issues led to low quality of AI review comments. Developers generally had resistance, and the adoption rate lingered below 10% for a long time, seriously affecting the promotion and application of the intelligent CR tool.

Results

The results achieved by Kuaishou Intelligent CR are as follows:

Steady improvement in comment quality

The comment adoption rate rose steadily, from under 10% in the Intelligent CR1.0 stage to a stable 50%+ today.


Helping improve review efficiency

Comparing full-year 2025 with 2024, the 80th percentile of Kuaishou MR review duration dropped 9.9%, from 7.07 hours to 6.37 hours. The decline became especially pronounced from June 2025 onward, a timeline that closely coincides with the company-wide rollout of Intelligent CR.


Coverage scale and depth

Intelligent CR has been deployed company-wide at Kuaishou, with MR coverage reaching 74%. It supports mainstream languages such as Java, C/C++, JS/TS, and Go, and has accumulated 1,100+ CR rules.


Practice 1: Core Technical Implementation Evolution from Heuristic to a Fusion of Determinism and Autonomy

In building the intelligent CR system, Kuaishou has held to one core principle: the tool must not merely be "usable" but must become a "trustworthy" intelligent partner for developers. The system has gone through three key evolutionary stages: from the initial "Pure LLM Heuristic Review," to "Knowledge Engine Driven Deterministic Review," and then to the latest "Agentic Autonomous Decision-Making Review." Each evolution systematically solved the core problems of the previous stage, steadily driving the system toward higher credibility and practicality and ultimately producing a high-trust intelligent CR framework.


3.1 Intelligent CR1.0: Pure LLM Heuristic Review Exploration Period

To solve the pain points of manual CR, Kuaishou began introducing LLMs to explore intelligent CR in Q2 2024. The overall design adopted a three-step serial pipeline of "Coarse Screening + Fine Review + Filtering."


3.1.1 Core Process

Step 1: CR Necessity Judgment

Input: MR Diff fragment

Output: Boolean value (true/false)

Purpose: Filter out MRs that do not need review to reduce costs

Step 2: Review Generation

LLM performs heuristic review based on fuzzy task descriptions

Check items: Syntax/logic errors, performance issues, security vulnerabilities, readability, naming conventions, etc.

Keyword detection: XSS-related APIs (dangerouslySetInnerHTML, v-html, innerHTML, etc.), base64 encoded strings

Step 3: Comment Filtering

Delete low-value comments based on rules: Comments/copyright suggestions, undefined object problems, CSS problems, invalid line numbers, etc.
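As an illustration, the three-step serial pipeline can be sketched as follows. All names here are hypothetical, and `llm()` is a stub standing in for a real model call:

```python
# Minimal sketch of the CR1.0 pipeline (hypothetical names; llm() is a stub).
XSS_APIS = ("dangerouslySetInnerHTML", "v-html", "innerHTML")
LOW_VALUE_MARKERS = ("suggest optimization", "consider")

def llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a model endpoint.
    return "true" if "necessity" in prompt else "review: consider a null check"

def needs_review(diff: str) -> bool:
    # Step 1: coarse screening, a boolean necessity judgment.
    return llm(f"necessity judgment:\n{diff}").strip().lower() == "true"

def generate_comments(diff: str) -> list[str]:
    # Step 2: heuristic review plus simple keyword detection (e.g. XSS sinks).
    comments = [llm(f"review this diff:\n{diff}")]
    comments += [f"possible XSS sink: {api}" for api in XSS_APIS if api in diff]
    return comments

def filter_comments(comments: list[str]) -> list[str]:
    # Step 3: delete low-value comments by rule.
    return [c for c in comments
            if not any(m in c.lower() for m in LOW_VALUE_MARKERS)]

def review(diff: str) -> list[str]:
    if not needs_review(diff):
        return []
    return filter_comments(generate_comments(diff))
```

Note how the serial design compounds its own weakness: a generic comment generated in step 2 can only be discarded, never improved, in step 3.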

3.1.2 Core Problems

Severely insufficient context

Only provides MR Diff fragments, lacking complete method bodies and cross-file dependencies

The model cannot understand the complete role of the code in the system, leading to a high misjudgment rate

Lack of knowledge system, poor recall stability

No rule library support, entirely relying on the model's "black box" capability

Team knowledge cannot be captured and passed on

Inconsistent review standards and large quality fluctuations

Task definition is too broad

Instructions given to the model lack domain know-how and over-rely on the model's own understanding

Check item descriptions are vague (e.g., "performance issues," "security issues")

Lack of specific scenarios and examples, making it difficult for the model to accurately grasp the focus of review

Limited optimization means

Lacks actionable optimization paths for BadCases

Unable to systematically improve review quality

3.1.3 Stage Summary

Data performance: The comment adoption rate was 7.9%; comment quality was poor, with many false positives and a flood of worthless suggestions

Technical validation: Proved the technical feasibility of AI code review but exposed the fundamental flaws of the pure LLM-driven mode

Evolution direction: Build an engineered context construction mechanism, establish a domain knowledge system (rule library, best practices), and design a multi-level quality assurance mechanism

3.2 Intelligent CR2.0: Knowledge Engine Driven Three-Stage Deterministic Review

To address the CR1.0 pain points of insufficient context, missing knowledge, unstable quality, and high tuning costs, we analyzed the problems in depth and abandoned the "Pure LLM Driven" approach. Combining domain knowledge with engineering pipelines, we built a deterministic review framework of "Context Engineering + Rule-Driven Review + Comment Value Assessment with BadCase Interception." Rules supply high-determinism knowledge that delimits the scope and focus of review for the LLM; the LLM contributes its powerful code understanding and logical reasoning. The complementarity and fusion of the two ensure accurate, interpretable review results, achieving the leap from basic functionality to high-trust review capabilities.


3.2.1 Context Intelligent Construction Engine

To solve the pain point of "misjudgment caused by missing context," we built multi-level context understanding capabilities aimed at comprehensively and accurately understanding the "full picture" of code changes. From raw Git Diff to rich context Prompt generation, it includes the following key capabilities:


Language intelligent identification: Automatically identifies 10+ programming languages including Java, JS/TS, Go, etc., adapting differentiated analysis strategies

Method body extension: Breaks through Git Diff fragment limitations, expanding from diff lines to complete method bodies to avoid taking things out of context

AST deep parsing: Understands code structure semantics and control flow through Abstract Syntax Trees

Cross-file dependency inference: Identifies calling relationships between interfaces and methods to assess the potential impact scope of changes

Diff intelligent splitting: Splits large change sets into logically independent review units, breaking through token window limitations and improving review focus

PRD requirement association: Further enriches context information by associating requirement documents (PRD), ensuring review suggestions align with business goals
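To illustrate the idea behind method body extension, here is a minimal Python-only sketch: given the line number of a diff hunk, it walks the AST to return the full enclosing function. The production engine does this per-language; this code is illustrative only.

```python
import ast

def expand_to_method_body(source: str, changed_line: int) -> str:
    # Given the line number of a changed hunk, return the full enclosing
    # function body instead of just the diff lines (illustrative sketch).
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.lineno <= changed_line <= node.end_lineno:
                return ast.get_source_segment(source, node)
    return ""  # fall back: no enclosing function found

# Sample source with a change on line 2; the whole function is returned.
code = '''def pay(order):
    total = order.amount
    return charge(total)
'''
```

With the full method body in the prompt, the model sees that `total` feeds a downstream call rather than judging the changed line out of context.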

3.2.2 Map-Reduce Long Context Processing

As Intelligent CR rolled out, large MRs and large rule sets caused the model to lose focus, leading to missed recalls and, in some cases, inputs exceeding the model's context window. We adopted distributed processing logic similar to Map-Reduce, decomposing the long-context problem into three stages: Split → Process → Merge.


Stage 1: Map Stage (Splitting)

Intelligently splits large Diffs according to logical independence

File-level grouping: Preliminary grouping by modified files

Functional block identification: Uses AST analysis to identify related logical units such as functions, classes, and modules

Dependency relationship analysis: Analyzes dependencies between different blocks to ensure splitting does not destroy logical integrity

Size balancing: Ensures the size of each split Diff is within the LLM's processable range

Stage 2: Process Stage (Processing)

Performs context fusion and comment generation for each split unit, with the rule engine and LLM working together

Stage 3: Reduce Stage (Merging)

Integrates comments from multiple units into high-quality output through classification, similarity calculation, clustering merging, and priority sorting
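The Split → Process → Merge flow can be sketched as follows. The real splitter is AST- and dependency-aware; this sketch groups only by file size, and all names are hypothetical:

```python
# Sketch of Map-Reduce-style diff processing (hypothetical shapes).
def split_diff(files: dict[str, str], max_lines: int = 200) -> list[dict[str, str]]:
    # Map: group changed files into units that fit the model window.
    units, current, size = [], {}, 0
    for path, diff in files.items():
        n = diff.count("\n") + 1
        if current and size + n > max_lines:
            units.append(current)
            current, size = {}, 0
        current[path] = diff
        size += n
    if current:
        units.append(current)
    return units

def process_unit(unit: dict[str, str]) -> list[str]:
    # Process: rule matching + LLM review per unit (stubbed here).
    return [f"{path}: looks reviewable" for path in unit]

def merge(comment_lists: list[list[str]]) -> list[str]:
    # Reduce: deduplicate while keeping a stable order.
    seen, merged = set(), []
    for comments in comment_lists:
        for c in comments:
            if c not in seen:
                seen.add(c)
                merged.append(c)
    return merged

def review_large_mr(files: dict[str, str]) -> list[str]:
    return merge([process_unit(u) for u in split_diff(files)])
```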

3.2.3 Value Assessment and Filtering System: From "Nonsense" to "Golden Sentences"

To screen out truly valuable "golden sentences" from massive comments, Kuaishou established a three-layer value assessment and filtering system.


First Layer: Basic Noise Filtering

Generic nonsense pattern recognition: Automatically filters comments without substantive content such as "suggest optimization" and "consider"

Duplicate suggestion merging: Intelligently aggregates similar comments to avoid information overload

Obvious misjudgment interception: e.g., modification suggestions identical to the existing source code

Second Layer: Accuracy Verification

Ensure every comment possesses sufficient evidence and rationality

Evidence sufficiency assessment: Comments must have specific and reasonable code locations and problem descriptions

Rule matching verification: Check if comments are triggered based on valid rules

Context consistency: Ensure suggestions are reasonable in the current code context

Third Layer: Value Density Assessment

Focus on truly important issues

Problem severity grading: Based on rule levels and context evidence, classifies problems into different levels such as P0 (Blocking), P1 (Severe), P2 (Major)

Impact scope and developer friendliness assessment: Prioritize presenting high-priority suggestions; dynamically adjust the limit on the number of comments output in a single review based on change size to avoid review fatigue; provide specific modification suggestions or sample code for every comment as much as possible
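A minimal sketch of the three-layer filter, with illustrative thresholds and comment fields (not the production schema):

```python
# Three-layer value filter sketch (hypothetical fields and thresholds).
GENERIC = ("suggest optimization", "consider")
SEVERITY_ORDER = {"P0": 0, "P1": 1, "P2": 2}

def filter_comments(comments: list[dict], max_out: int = 5) -> list[dict]:
    # Layer 1: basic noise — drop generic wording and suggestions that
    # merely echo the existing source line.
    kept = []
    for c in comments:
        noisy = any(g in c["text"].lower() for g in GENERIC)
        echoes_source = (c.get("suggestion") is not None
                         and c.get("suggestion") == c.get("source_line"))
        if not noisy and not echoes_source:
            kept.append(c)
    # Layer 2: accuracy — require a concrete location and a triggering rule.
    kept = [c for c in kept if c.get("line", 0) > 0 and c.get("rule_id")]
    # Layer 3: value density — rank by severity (P0 > P1 > P2) and cap
    # the number of comments shown to avoid review fatigue.
    kept.sort(key=lambda c: SEVERITY_ORDER.get(c.get("severity", "P2"), 3))
    return kept[:max_out]
```

The cap in layer 3 would in practice scale with change size, as described above.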

3.2.4 RAG Enhanced BadCase Intelligent Interception

After introducing the value assessment and filtering system, the quality of outgoing comments improved significantly. However, daily BadCase analysis showed that some misjudged comments were still slipping through. To combat the AI "hallucination" problem, we established an active defense mechanism: the LLM evaluates whether a new comment belongs to the same type of misjudgment as historical BadCases, analyzes the root cause (context misunderstanding, overly strict rules, unsuitable scenarios, etc.), and dynamically adjusts the review strategy so that the same type of hallucination does not recur.


Historical BadCase Vector Library Construction

Construct a BadCase vector database of verified misjudgment cases

Collect all misjudgment cases from user feedback

Manually verify and annotate misjudgment types and causes

Extract semantic features of BadCases

Use multi-dimensional Embedding to build semantic indexes

Vectorize the problem description, code context, and comment content of BadCases

Build high-dimensional semantic space indexes

Support fast similarity retrieval

Real-time BadCase Detection Process

Real-time similarity retrieval

Every generated comment undergoes a BadCase matching check

Calculate the vector similarity between the comment and historical BadCases

Identify potential similar misjudgments

LLM self-check

When similarity exceeds a threshold, trigger LLM self-check

LLM evaluates whether the current comment belongs to the same type of misjudgment as historical BadCases

Determine if the same error pattern exists

Root cause analysis

Analyze the root cause of the misjudgment

Context understanding error: Misjudgment caused by missing key context

Rule application too strict: Rules are not applicable in specific scenarios

Scenario recognition failure: Failed to correctly identify the code usage scenario

Record analysis results for subsequent optimization

Strategy dynamic adjustment

Adjust review strategies based on root cause analysis results

Apply corrected review logic to similar scenarios

Avoid repeated occurrences of the same type of hallucination misjudgment
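The interception flow can be sketched as follows. Here `embed()` is a toy stand-in for a real embedding model, and the similarity threshold is illustrative:

```python
import math

# BadCase interception sketch: vector similarity against known misjudgments.
def embed(text: str) -> list[float]:
    # Toy bag-of-letters embedding, just to make the flow runnable;
    # the real system uses a multi-dimensional embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def badcase_check(comment: str, badcases: list[str], threshold: float = 0.95) -> bool:
    # True means the comment resembles a known misjudgment closely enough
    # that the LLM self-check should be triggered.
    v = embed(comment)
    return any(cosine(v, embed(b)) >= threshold for b in badcases)
```

Comments that trip this check go to the LLM self-check and root-cause analysis described above rather than being published directly.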

3.2.5 Stage Summary

By systematically introducing context engineering, a rule knowledge system, and a multi-layer quality assurance mechanism, the CR2.0 architecture fundamentally solved the core problems of CR1.0, elevating intelligent CR from "usable" to "useful." It achieved a leap in comment adoption rate from 7.9% to 54% and rebuilt developer trust.


3.3 Intelligent CR3.0: Agentic Autonomous Decision-Making Review

Although Intelligent CR2.0 significantly improved review results (comprehensive adoption rate reached 54%), practical application exposed three limitations that constrained the system's evolution to a higher level.

(1) Limited flexibility in context acquisition

Context acquisition uses engineered predefined strategies and lacks runtime adaptive capabilities. This leads to two extreme problems: on one hand, "over-acquisition" easily occurs, bringing a large amount of irrelevant code into the context, causing token waste and interfering with model attention; on the other hand, "insufficient acquisition" may occur, where predefined strategies cannot dynamically expand the context scope for complex scenarios requiring deep analysis, limiting analysis depth.

(2) Boundaries of deep defect identification capabilities

For complex scenarios such as cross-service dependency analysis, long call chain tracking, and architecture-level risk identification, predefined processes are difficult to fully cover. The system lacks autonomous exploration capabilities and cannot dynamically adjust analysis strategies based on problem characteristics, resulting in obvious shortcomings when dealing with large-scale refactoring, architecture changes, and other complex scenarios.

(3) Lack of creative review

The rule-driven mode guarantees determinism and stability but lacks creativity. The system struggles to produce insightful suggestions that genuinely impress developers. Review depth stays mainly at the code level, lacking a deep understanding of architectural design and business intent.

To this end, we continue to advance the system's evolution towards the Agentic mode. The core idea is "fast where it should be fast, deep where it should be deep," building a fusion framework of determinism and creativity featuring "Autonomous Planning + Dual Path Parallel + Agentic Intelligent Agents."

For standard scenarios: Maintain the efficient and stable deterministic process of CR2.0

For complex scenarios: Introduce Agentic autonomous decision-making capabilities to achieve a fundamental transformation from passive checking to active collaboration


3.3.1 MetaData Driven Agentic Base

To reduce the cost of building Agentic applications for team AI scenarios, we built a Metadata-driven declaratively extensible AI Agent base. It aims to support the rapid construction of domain-specific AI Agent applications through unified underlying capabilities and a declarative extension mechanism.


Application layer: By injecting business rules, declarative configuration tools, and prompts, complete Agent functionality can be achieved.

Agentic base layer (Core): Provides five core capabilities: AgentExecutor (declarative execution engine), Tool Orchestration (tool call orchestration), Skills Manager (dynamic skill management), History Manager (history management), and Context Manager (context management). All capabilities are implemented through a metadata-driven engine that supports runtime dynamic loading and assembly.

Declarative extension layer: Supports hot-pluggable capability registration through four extension mechanisms: MCP Protocol, Skills, Tools, and System Prompt. The declarative design based on FunctionDeclaration and AgentDefinition makes extension development and integration simple and efficient.

Infrastructure layer: Provides basic capabilities such as sandbox isolation, local operations, and statelessness, ensuring the security, reliability, and scalability of Agents.
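The declarative idea can be sketched as follows. `AgentDefinition` and `FunctionDeclaration` follow the concepts named above, but the shapes here are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable

# Sketch of a metadata-driven, declaratively extensible agent base
# (hypothetical shapes, not Kuaishou's actual implementation).
@dataclass
class FunctionDeclaration:
    name: str
    description: str
    fn: Callable[..., str]

@dataclass
class AgentDefinition:
    name: str
    system_prompt: str
    tools: list[FunctionDeclaration] = field(default_factory=list)

class AgentExecutor:
    # Loads a definition at runtime and dispatches tool calls by name,
    # so new capabilities plug in without touching the core engine.
    def __init__(self, definition: AgentDefinition):
        self.definition = definition
        self.registry = {t.name: t.fn for t in definition.tools}

    def call_tool(self, name: str, **kwargs) -> str:
        return self.registry[name](**kwargs)

# Declarative registration of a review agent with one tool.
cr_agent = AgentDefinition(
    name="code-review",
    system_prompt="You review merge requests against team rules.",
    tools=[FunctionDeclaration("read_file", "Read a source file",
                               lambda path: f"<contents of {path}>")],
)
```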

3.3.2 Planning Intelligent Planning Layer

As a global routing decision-maker at the MR level, the Planning layer is responsible for analyzing change characteristics and intelligently deciding on review paths.


Analysis dimensions

Change complexity quantification

Code change scale: number of modified files, lines of code, change density

Impact scope identification: cross-file/module dependencies, core interface changes

Logical complexity: cyclomatic complexity, nesting levels, number of branches

Rule feature identification

Rule type classification: standard general rules vs. special rules requiring deep understanding

Context requirement assessment: whether predefined context is sufficient, whether dynamic expansion is needed

Execution complexity estimation: computational cost, multi-round reasoning requirements, external tool dependencies

Routing strategies

CR2.0 version three-stage architecture: Simple MR + Standard Rules → Efficient, stable, low cost

Agentic path: Complex MR + Special Rules → Deep analysis, flexible response
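A sketch of the routing decision, with illustrative thresholds; the real Planning layer weighs many more signals, including cyclomatic complexity and context requirements:

```python
# Planning-layer routing sketch (thresholds are illustrative).
def route(mr: dict, rules: list[dict]) -> str:
    # Return "deterministic" (CR2.0 path) or "agentic" (CR3.0 path).
    complex_change = (
        mr.get("files_changed", 0) > 20
        or mr.get("lines_changed", 0) > 800
        or mr.get("crosses_modules", False)
    )
    special_rules = any(r.get("needs_deep_context") for r in rules)
    return "agentic" if complex_change or special_rules else "deterministic"
```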

3.3.3 Dual-Path Parallel Architecture


Path 1: CR2.0 Deterministic Path

Adopts the mature CR2.0 architecture, applicable to standard review scenarios. The processing flow is: Predefined context construction → Rule matching and LLM reasoning → Three-layer value filtering → BadCase interception.

Core advantages

High throughput: Predefined process, average processing time < 1 minute

High stability: Based on mature architecture, quality baseline is guaranteed

Low cost: Token consumption is controllable

Path 2: Agentic Autonomous Decision-Making Path

Core innovation of CR3.0, applicable to complex/deep analysis scenarios.

Core advantages

Autonomous context acquisition: Adopts the ReAct thinking-action loop mechanism to achieve autonomous planning and acquisition of context, ensuring sufficient basis to support comment output.


Agentic comment generation: Based on sufficient context information, Agentic comment generation possesses three major deep capabilities:

Deep logical reasoning: Complete system understanding, potential impact analysis, architectural level insights

Cross-boundary analysis: Call chain tracking, cross-service consistency checks, system-level risk identification

Business intent understanding: Requirement alignment checks, business logic verification, user experience assessment
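The ReAct thinking-action loop behind autonomous context acquisition can be sketched as follows. `think()` stands in for the LLM, and the tool and stopping policy are hypothetical:

```python
# Minimal ReAct-style loop for autonomous context acquisition.
def think(context: list[str]) -> dict:
    # Stub policy: keep fetching callers until two context pieces exist;
    # a real agent would let the LLM decide the next action.
    if len(context) < 2:
        return {"action": "fetch_callers", "arg": f"fn{len(context)}"}
    return {"action": "finish"}

def fetch_callers(symbol: str) -> str:
    return f"callers of {symbol}"  # placeholder for a real code-search tool

def acquire_context(max_steps: int = 5) -> list[str]:
    context: list[str] = []
    for _ in range(max_steps):          # think → act → observe loop
        step = think(context)
        if step["action"] == "finish":  # model judges context is sufficient
            break
        observation = fetch_callers(step["arg"])
        context.append(observation)     # feed observation back into thinking
    return context
```

The loop stops either when the model judges its basis sufficient or when the step budget runs out, which bounds token cost on the agentic path.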

3.3.4 Agentic Comment Evaluation Agent

CR3.0 upgrades the comment evaluation link to an Agentic intelligent agent, achieving a leap from rule evaluation to intelligent evaluation.


Technical implementation:

Deep semantic understanding: Deep understanding of comment value, developer need insights, context-sensitive judgment

Dynamic quality standards: Scene adaptation, team specification fitting, value density optimization

BadCase interception: Retains CR2.0's RAG vector library mechanism, providing double assurance (Agentic intelligent evaluation + BadCase vector library interception)

3.3.5 Stage Summary

In the Intelligent CR3.0 stage, a dual-path mode is adopted. For standard scenarios, it maintains the high efficiency and stability of CR2.0 with predefined processes and controllable costs. For complex scenarios, it provides deep analysis capabilities, autonomously acquires context, and discovers system-level and architecture-level issues. It possesses both determinism assurance and creative breakthroughs, achieving "fast where it should be fast, deep where it should be deep."


Practice 2: Knowledge Self-Evolution - A Knowledge Base That Gets Smarter with Use

4.1 R&D Knowledge Self-Evolution: Converting Tacit Experience into Executable Review Rules

Traditional code review relies heavily on engineers' personal experience, and this "tacit knowledge" is difficult to scale and pass on. Through a systematic approach, we built a rule system that converts team wisdom into reusable review capabilities: business-scenario-driven review makes tacit knowledge explicit, drawing rules from four sources:

Failure retrospection rules: Analyze the root causes of company online issues in the past two years to extract preventive check rules (e.g., null pointer protection, resource leak checks).

Pitfall prevention guide and best practice standardization: Solidify the typical "pitfalls," coding standards, and performance-optimization experience the team has accumulated in practice into rules, converting "post-incident firefighting" experience into "pre-incident prevention" checks. Covers key dimensions such as performance optimization (e.g., database queries, cache usage, concurrency control) and security protection (e.g., input validation, permission checks).

Historical review knowledge mining: Use AI to assist in analyzing historical manual review comments, identifying the focus points and judgment logic of senior engineers, extracting high-frequency problem patterns, and mining "hidden rules": quality standards that are conventionally agreed upon by the team but not documented.

Business customized rules: Support various business teams in configuring exclusive check items based on their own characteristics to achieve fine-grained management of vertical scenarios.


Through this system, Kuaishou has systematized fragmented review experience into a rule library covering code quality, performance, security, maintainability, best practices, and more, with over 1,200 rules accumulated to date. This has significantly improved the team's overall code quality and stability.
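As an illustration, one entry in such a rule library might look like this. The field names are hypothetical, not Kuaishou's actual schema:

```python
# Hypothetical shape of a single rule-library entry derived from a
# failure retrospection (illustrative field names only).
rule = {
    "id": "JAVA-NPE-017",
    "source": "failure_retrospection",   # one of the four rule sources
    "language": "java",
    "severity": "P1",
    "description": "Return values of Map.get must be null-checked before use",
    "bad_example": "user.getRoles().get(key).size()",
    "good_example": "List<Role> r = user.getRoles().get(key); if (r != null) ...",
    "scope": {"repo_paths": ["services/"], "teams": ["*"]},
}
```

Encoding each rule with a severity, a scope, and paired good/bad examples is what lets the LLM apply it deterministically rather than relying on its black-box judgment.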

4.2 Implementing Self-Evolution: Building a Data-Driven Continuous Improvement Loop

The key reason why Kuaishou's intelligent CR system can continuously improve and maintain a high adoption rate in the long term lies in its data-driven self-evolution loop. This mechanism ensures that the system can continuously learn and optimize based on actual feedback, achieving "getting more accurate and smarter with use."


Intelligent feedback collection: Developer behaviors such as acceptance, rejection, and modification of each comment are automatically recorded as valuable data for system learning.

Deep root cause analysis: Analyze BadCases weekly to identify weak links in the system and iteratively improve them.

Progressive optimization: Small steps, fast iterations. Each optimization undergoes strict A/B testing to ensure that every iteration is robust and effective.

Continuous knowledge consolidation: Solidify verified patterns into new rules, constantly enriching and updating the team's knowledge base and enabling the rules themselves to self-evolve.
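The feedback side of this loop can be sketched as follows: record accept/reject events per rule, then flag rules whose adoption rate falls below a threshold for root-cause review. Shapes and the threshold are illustrative:

```python
from collections import Counter

# Feedback-loop sketch: aggregate per-rule adoption and surface weak rules.
def adoption_by_rule(events: list[dict]) -> dict[str, float]:
    total, accepted = Counter(), Counter()
    for e in events:
        total[e["rule_id"]] += 1
        if e["action"] == "accept":
            accepted[e["rule_id"]] += 1
    return {r: accepted[r] / total[r] for r in total}

def rules_needing_review(events: list[dict], threshold: float = 0.3) -> list[str]:
    # Rules with low adoption become candidates for the weekly
    # root-cause analysis described above.
    rates = adoption_by_rule(events)
    return sorted(r for r, rate in rates.items() if rate < threshold)
```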


Practice 3: Product and Operation Strategy - From Precise Pilot to Large-Scale Implementation

Problems solved:

How to enable developers to access it seamlessly and lower the barrier to use

How to adapt to the differentiated needs of different teams and repositories

How to go from pilot to large-scale promotion and establish user trust

How to extend from "incremental gatekeeping" to "full-link quality safeguarding"

5.1 Seamless R&D Experience: "Silky" Integration

The value of a tool lies in its use. We are committed to seamlessly integrating intelligent CR into developers' daily work, achieving "silent improvement."

Scenario integration:

Automatically trigger full or incremental reviews when developers create or update MRs.

Through IDE plugins, shift review capabilities to the coding stage to achieve immediate feedback.

Support custom trigger strategies based on information such as branch merge direction and MR tags.

Review rule adaptation:

Support rule customization at team and repository dimensions to meet the differentiated needs of different businesses.

To meet the demand for fine-grained review specifications in large repositories, multi-team collaborative repositories, and multi-tech-stack repositories, we support rules scoped by repository path, source-file language, MR initiator, and more.
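A custom trigger strategy of this kind can be sketched as follows. All field names are hypothetical:

```python
# Trigger-strategy evaluation sketch: decide whether an MR should be
# reviewed based on branch direction, tags, and file paths.
def should_trigger(mr: dict, strategy: dict) -> bool:
    # Branch direction: only review MRs into configured target branches.
    branches = strategy.get("target_branches")
    if branches and mr.get("target_branch") not in branches:
        return False
    # Tags: skip MRs carrying any opt-out tag (e.g. work-in-progress).
    skip = set(strategy.get("skip_tags", []))
    if skip & set(mr.get("tags", [])):
        return False
    # Paths: require at least one changed file under the configured scopes.
    paths = strategy.get("include_paths")
    if paths and not any(f.startswith(tuple(paths)) for f in mr.get("files", [])):
        return False
    return True
```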

5.2 Progressive Promotion Strategy

In the implementation process of Kuaishou Intelligent CR, a progressive promotion method was adopted. Starting from precise pilots, trust was established through iterative optimization, influence was expanded through word-of-mouth effects, and finally comprehensive large-scale coverage was achieved.


Precise pilot: Select teams with high acceptance of new technologies as pilots, concentrating resources to create success stories and laying the foundation for subsequent promotion.

Iterative optimization: Continuously optimize product functions and experience based on feedback from pilot teams, establishing user trust and dependence on the product.

Word-of-mouth promotion: Utilize the demonstration effect of successful benchmarks to attract other teams to actively integrate, forming a spontaneous promotion momentum.

Comprehensive scaling: Formulate differentiated strategies based on the characteristics of different teams, ultimately achieving full coverage within the company.

Through these strategies, Intelligent CR coverage reached 74% of MRs within Kuaishou.

5.3 Open Co-Creation Ecosystem: Capability Boundary Expansion

To expand the boundary of intelligent review capabilities, we collaborated deeply with business teams to jointly create three co-creation capabilities, covering legacy-code governance, business logic, and security specialization, driving intelligent CR from "incremental gatekeeping" to "full-link quality safeguarding."


5.3.1 Co-Creation Capability 1: Repository-Level Intelligent Scanning — A "Deep Health Check" for Legacy Code

Background:

High proportion of online issues caused by legacy code: Of the online issues caused by code problems in the business department since 2024, 32% originated in legacy (stock) code.

High cost of discovering legacy issues: Finding defects in legacy code is expensive, and driving their remediation is difficult, leaving hidden risks to business stability.

Therefore, we co-built repository-level intelligent scanning capabilities with the Life Service department, aiming to achieve systematic hazard mining and closed-loop repair at a lower cost.

Construction content:

Full code analysis: Use AI to perform deep scans on specified code repositories to quickly identify coding defects, performance bottlenecks, and online hazards.

Problem grading and distribution: Grade the severity of discovered problems, generate scan reports, and assign them to corresponding owners to promote problem tracking and repair closed-loops.


Results and challenges:

In pilot business repositories, the scanning accuracy rate was 75%. An example of a scanned problem:

Scan result: (screenshot)

Business screenshot: (screenshot)

Explanation: The UI copy tells the user they can upload a 10M file, but only at upload time does the user discover the actual limit is 2M, which seriously affects the user experience.

5.3.2 Co-Creation Capability 2: Business Logic Review — From "Code Correctness" to "Business Correctness"

Background: General-purpose intelligent CR struggles to judge whether code aligns with business requirements and cannot identify business-logic defects. We therefore co-built business-logic intelligent review with the commercialization team, aiming to make the leap from "code review" to "business logic proofreading."

Construction content:

Requirement document parsing: Automatically associate code changes with corresponding requirement documents (PRD), use AI to extract key function points, and convert them into structured context.

Requirement-code comparison: Using standardized requirement descriptions as a benchmark, intelligent review tasks compare code implementations to identify inconsistencies, omissions, or errors at the business logic level.

Scenario-based hints: For identified deviations, generate specific, actionable business logic suggestions to assist developers in confirming that implementation aligns with requirements.

Results and challenges: This capability has been piloted in some businesses, preliminarily verifying the feasibility of the technical path. The current core challenge lies in the deep understanding and function decomposition of non-standardized requirement documents. Future focus will be on optimizing requirement understanding capabilities and strengthening integration with business scenarios.

Comment example:


5.3.3 Co-Building Security CR: Shifting Security Left to Early Development Stage

Background: To implement the "Shift Security Left" philosophy, Kuaishou collaborated with the security team to integrate professional security detection capabilities into the Code Review stage, striving to discover and fix security vulnerabilities early before code is committed.

Construction content:

Security detection Agent integration: Access the professional detection capabilities provided by the security team, integrating them into the intelligent review process in the form of Agents to identify common security issues such as injection, leakage, and privilege escalation.

Review suggestion fusion: Present detected security vulnerabilities directly to developers in the form of code review comments, providing repair suggestions and basis.

Results and challenges: During the pilot period, the system helped developers identify and confirm about 200 valid security vulnerabilities. The current adoption rate is about 25%, indicating room to improve vulnerability detection accuracy. We will continue collaborating with the security team to improve precision and developer acceptance.

Comment example:


Outlook

Kuaishou Intelligent CR's goals are not limited to discovering code-level issues. Going forward, it will also invest in the following directions:

Capability expansion: Achieve long-chain defect identification across code repositories, understand all code changes across services and repositories triggered by requirement changes, and identify potential chain-level risks.

Automatic repair: For high-confidence issues, provide "one-click repair" capabilities, freeing developers from repetitive repair work.

Ultimately, Kuaishou hopes to build Intelligent CR into an "Architecture Consultant" that can provide forward-looking architecture suggestions and risk warnings during the requirement design and coding stages, further improving R&D efficiency and system quality.

