NUS, Fudan, and Tsinghua: The First Systematic Survey on Large Model Latent Spaces

Large model design is undergoing a fundamental restructuring as it moves from human-readable discrete symbol spaces to machine-native continuous latent spaces.

Recently, researchers from institutions including the National University of Singapore, Fudan University, Tsinghua University, and Zhejiang University jointly released the first systematic survey of large model latent spaces. The survey examines the underlying logic, technical pathways, and future prospects of the latent space paradigm (the "true brain" of LLMs) from five progressive perspectives: Foundation, Evolution, Mechanism, Ability, and Outlook. In doing so, it consolidates research in this area that has so far been fragmented.

Overview of the systematic survey on large model latent spaces

Paper Title: The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
Paper Address: https://arxiv.org/pdf/2604.02029
GitHub Repository: https://github.com/YU-deep/Awesome-Latent-Space

Diagram illustrating the shift from explicit to latent space

Comparison framework of latent and explicit spaces

1. Foundation: What is the "Latent Space" of Large Models?

The latent space of a large model is a continuous representation space that the model forms internally through learning. It encodes the implicit semantics, grammar, and contextual associations behind text and multimodal information that tokens do not express explicitly, which makes it a machine-native computational space. Today's mainstream large models still rely heavily on operations in explicit space (the space of linguistic symbols), which suffers from structural defects such as linguistic redundancy, discretization bottlenecks, sequential inefficiency, and semantic loss.

1.1 Latent Space vs. Explicit Space: Core Differences

Visual comparison between latent and explicit space characteristics

Four Representation Attributes:

Readability: Explicit space consists of human-readable discrete symbols; latent space comprises model-native high-dimensional vectors that are not directly interpretable by humans but offer richer representations.

Form of Existence: Explicit space is discrete, fixed, and contains much redundant information; latent space is continuous, flexible, and retains only core semantics.

Computational Efficiency: Explicit space generates output token by token and re-encodes it at every step, which wastes significant compute; latent space operates directly on vectors without that conversion overhead.

Semantic Retention: Explicit space recoding easily loses fine-grained semantics; latent space can preserve complete information with high fidelity.
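
The semantic-retention contrast above can be illustrated with a toy sketch. It assumes a hypothetical three-word vocabulary with made-up 2-D embeddings (`VOCAB`) and models the explicit path as snapping a continuous vector to its nearest token and re-embedding that token. None of these names or values come from the survey.

```python
import math

# Hypothetical 2-D token embeddings, invented for illustration. A real
# vocabulary has tens of thousands of entries in far more dimensions.
VOCAB = {
    "cold": (0.0, 0.0),
    "warm": (1.0, 0.0),
    "hot": (2.0, 0.0),
}


def nearest_token(vec):
    """Explicit step: snap a continuous vector to the closest token."""
    return min(VOCAB, key=lambda tok: math.dist(VOCAB[tok], vec))


def explicit_round_trip(vec):
    """Decode to a token, then re-encode that token as its embedding."""
    token = nearest_token(vec)
    return token, VOCAB[token]


# A latent state lying between "warm" and "hot", slightly off-axis.
latent = (1.4, 0.3)

token, recoded = explicit_round_trip(latent)
loss_explicit = math.dist(latent, recoded)  # distance moved by discretizing
loss_latent = math.dist(latent, latent)     # vector passed on unchanged

print(token, loss_explicit, loss_latent)
```

In this toy setup the explicit round trip discards the offset that placed the state between two words, while the latent path carries the vector forward unchanged.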

Four Functional Capabilities:

Operability: Explicit space is non-continuous and non-differentiable; latent space is continuous and differentiable, supporting precise semantic manipulation.

Expressive Power: Explicit space is limited to linguistically describable content; latent space breaks vocabulary and grammar constraints, capable of handling high-dimensional non-linguistic information.

Scalability: Explicit space is strictly limited by sequence length; latent space easily adapts to long reasoning and multi-interaction scenarios.

Generalization: Explicit space is bound by linguistic forms; latent space captures abstract laws, significantly improving cross-domain generalization.
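
The operability point can be made concrete with a small numerical check. The sketch below is an illustration under assumed values, not a method from the survey: `score` stands in for any continuous readout of a latent vector, `argmax_token` stands in for a discrete token choice, and the vectors `z` and `w` are hypothetical.

```python
def score(z, w):
    """A continuous readout: dot product of latent z with weights w."""
    return sum(zi * wi for zi, wi in zip(z, w))


def argmax_token(z):
    """A discrete readout: index of the largest coordinate (a 'token id')."""
    return max(range(len(z)), key=lambda i: z[i])


def finite_difference(f, z, i, eps=1e-4):
    """Approximate the derivative of f with respect to z[i]."""
    bumped = list(z)
    bumped[i] += eps
    return (f(bumped) - f(z)) / eps


z = [0.2, 0.9, 0.1]   # hypothetical latent state
w = [1.0, -2.0, 0.5]  # hypothetical readout weights

# The continuous readout responds to every small change in z.
grad_continuous = [
    finite_difference(lambda v: score(v, w), z, i) for i in range(len(z))
]
# The discrete readout does not move under a small change, so it gives
# no gradient signal to follow.
grad_discrete = [finite_difference(argmax_token, z, i) for i in range(len(z))]

print(grad_continuous)
print(grad_discrete)
```

The continuous readout yields a usable gradient (approximately `w`), whereas the discrete choice is flat almost everywhere, which is why small, targeted adjustments are possible in latent space but not directly on tokens.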

2. Evolution: How is the "Latent Space" of Large Models Evolving?

The research and development of large model latent spaces has progressed through four iterative stages alongside technological advancements, moving from theoretical concepts to full-scenario implementation: the Prototype Stage, Formation Stage, Expansion Stage, and Explosion Stage.

Timeline of latent space evolution stages

2.1 Prototype Stage

This stage first verified that reasoning can be detached from natural language and carried out with continuous vectors. The first generation of latent reasoning frameworks appeared, but the work remained at the proof-of-concept level.

2.2 Formation Stage

This stage established the theoretical foundations, using mathematical proofs to demonstrate the computational advantages of latent spaces. Multimodal applications were explored for the first time, although the focus remained on text reasoning.

2.3 Expansion Stage

Latent space methods expanded from pure text to vision, multi-agent systems, and embodied robotics, and the technology began to mature.

2.4 Explosion Stage

Latent space became an independent computational space and paradigm for large models. Dedicated architectures and optimization strategies emerged in large numbers, and applications grew rapidly across text, vision, embodiment, and multi-agent domains.

3. Mechanism: How Does the "Latent Space" of Large Models Work?

Latent space operates through four complementary dimensions: architecture, representation, computation, and optimization. These address, respectively, how latent space is embedded in the model, how it carries information, how that information is processed, and how the results are tuned.

Mechanism diagram showing architecture, representation, computation, and optimization

3.1 Architecture: Model Integration Methods for Latent Space

Built-in Backbone: Directly modifies the model backbone to natively support latent computation.

Plugin Components: Adds plugins for projection, alignment, and storage without altering the model backbone to extend latent functions.

Auxiliary Models: External independent models provide supervisory signals to assist the main model in generating latent spaces.

Architectural approaches to latent space integration

3.2 Representation: Information Carriers of Latent Space

Internal Representation: Reuses internal activations like model hidden states and KV caches with no extra parameters.

External Representation: Freezes external pre-trained models to generate latent information injected into the main model.

Learnable Representation: Trainable modules generate latent information, optimizing end-to-end with the main model.

Hybrid Representation: Combines learnable and external injection, balancing flexibility and stability.

Detailed view of representation mechanisms

3.3 Computation: Information Processing Patterns in Latent Space

Compressed Computation: Compresses reasoning trajectories and caches to reduce compute consumption.

Expanded Computation: Expands compute power through recurrence and parallelism to enhance expressive ability.

Adaptive Computation: Dynamically allocates compute power based on input difficulty, balancing efficiency and performance.

Interleaved Computation: Alternates operations between explicit tokens and latent information or multimodal data, fusing the advantages of both.
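
As a rough illustration of adaptive computation, the sketch below repeats a latent update until the state stops changing and reports how many steps were spent. It is a toy under stated assumptions: `refine` is a hypothetical update rule that moves the state halfway toward the input, and the tolerance and step budget are arbitrary. Real systems use learned updates and learned halting criteria.

```python
def refine(h, x, rate=0.5):
    """One latent update: move the state part of the way toward the input."""
    return [hi + rate * (xi - hi) for hi, xi in zip(h, x)]


def adaptive_compute(x, tol=1e-3, max_steps=50):
    """Repeat the latent update until the state changes by less than tol.

    Returns the final state and the number of update steps used.
    """
    h = [0.0] * len(x)
    for step in range(1, max_steps + 1):
        new_h = refine(h, x)
        change = max(abs(a - b) for a, b in zip(new_h, h))
        h = new_h
        if change < tol:  # halting rule: the state has settled
            return h, step
    return h, max_steps


# Hypothetical inputs: one close to the starting state, one far from it.
easy_state, easy_steps = adaptive_compute([0.01, 0.0])
hard_state, hard_steps = adaptive_compute([4.0, -2.0])

print(easy_steps, hard_steps)
```

The input that starts close to its settled state halts after fewer updates than the distant one, so compute is spent in proportion to how much refinement the input needs.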

Visualizing interleaved and adaptive computation

3.4 Optimization: Full Lifecycle Tuning

Pre-training: Equips the model with latent computation capabilities from the early training stages.

Post-training: Fine-tunes the latent space on pre-trained models to adapt to downstream tasks.

Inference: Corrects latent states in real time to directly optimize the output.
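
Inference-time correction can be sketched as gradient steps applied to the latent state while the model weights stay fixed. The example below assumes a hypothetical linear readout `w` and a squared-error objective against a made-up `target`. It illustrates the general idea only and is not a specific method from the survey.

```python
def readout(z, w):
    """Frozen linear readout of a latent state."""
    return sum(zi * wi for zi, wi in zip(z, w))


def correct_latent(z, w, target, lr=0.1, steps=20):
    """Nudge the latent state so the frozen readout approaches the target.

    Minimizes (readout(z) - target)**2 by gradient descent on z only.
    The weights w are never modified.
    """
    z = list(z)
    for _ in range(steps):
        err = readout(z, w) - target
        # Gradient of the squared error with respect to z is 2 * err * w.
        z = [zi - lr * 2.0 * err * wi for zi, wi in zip(z, w)]
    return z


w = (1.0, -1.0)   # hypothetical frozen weights
z0 = [0.5, 0.5]   # hypothetical initial latent state
target = 1.0      # hypothetical desired readout value

corrected = correct_latent(z0, w, target)
print(readout(z0, w), readout(corrected, w))
```

Only the latent state changes here. The weights are untouched, which is what separates this kind of inference-time adjustment from pre-training or post-training.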

Optimization strategies across the model lifecycle

4. Capabilities: What Abilities Does the "Latent Space" of Large Models Enable?

Overview of seven core capabilities enabled by latent space

According to the survey, latent space overcomes the expression and computation bottlenecks of discrete tokens and unlocks seven core capabilities: reasoning, planning, modeling, perception, memory, collaboration, and embodiment.

4.1 Reasoning Capability

Enables implicit reasoning, compact trajectories, continuous iterative correction, branch path exploration, and stronger cross-modal generalization.

4.2 Planning Capability

Supports controllable path exploration, efficient solution space search, adaptive compute allocation, and optimized decision trajectories.

4.3 Modeling Capability

Expresses complex computations richly, supports self-inspection of internal states, achieves robust control of model behavior, and enhances expansion capabilities.

4.4 Perception Capability

Retains fine structural information in vision, achieves heuristic imagination, and ensures faithful localization.

4.5 Memory Capability

Creates working memory storage, persistent memory, and multimodal memory recall.
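
A latent memory can be pictured as a store of vectors recalled by similarity instead of by exact text match. The sketch below is a minimal illustration under assumptions: the `LatentMemory` class, its `write` and `recall` methods, and the example vectors and payload strings are all hypothetical, and cosine similarity is just one common choice of retrieval rule.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LatentMemory:
    """Toy key-value store: keys are latent vectors, values are payloads."""

    def __init__(self):
        self._keys = []
        self._values = []

    def write(self, key, value):
        """Store a payload under a latent key."""
        self._keys.append(list(key))
        self._values.append(value)

    def recall(self, query):
        """Return the payload whose key is most similar to the query."""
        if not self._keys:
            return None
        best = max(
            range(len(self._keys)),
            key=lambda i: cosine(self._keys[i], query),
        )
        return self._values[best]


memory = LatentMemory()
memory.write([1.0, 0.0], "route to kitchen")
memory.write([0.0, 1.0], "route to garage")

recalled = memory.recall([0.9, 0.1])  # query near the first key
print(recalled)
```

In practice the stored payloads would typically be latent states themselves, and the store could be kept for a single task (working memory) or across sessions (persistent memory).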

4.6 Collaboration Capability

Enables lossless semantic transmission and shared cognition among agents, and supports cross-modal interoperability between heterogeneous models.

4.7 Embodiment Capability

Supports unsupervised action execution, implicit thinking and planning, scene prediction, and spatial cognition, and gives robots cross-hardware generalization and transfer.

5. Outlook

5.1 Existing Challenges

Difficult to Evaluate: Intermediate computation is invisible, so the soundness of the reasoning cannot be verified directly.

Difficult to Control: Internal continuous representations are hard to manipulate precisely.

Difficult to Explain: High-dimensional vectors lack intuitive semantics, which makes model behavior hard to trace.

5.2 Future Development Directions

Build Unified Theories: Clarify latent space computation principles, collaboration rules with explicit spaces, and establish standard evaluation systems.

Deepen Multimodal Integration: Create unified native latent computation spaces for text, vision, and action.

Implement Downstream Tasks: Use latent spaces to support practical scenarios like reasoning and robot control.

Achieve Controllable Governance: Make latent spaces observable and manageable to solve trustworthiness and safety issues.