In the wave of generative AI, we have witnessed a leap in image quality from Stable Diffusion to large-scale diffusion models such as FLUX and Qwen-Image. But this leap comes at a cost: to 'sculpt' a clear image from pure noise, these models typically need 40 to 100 iterative denoising steps (40-100 NFE, network function evaluations). That latency makes them hard to deploy in real-time generation or large-scale services.
'Few-step generation' has therefore become a critical battleground. Faced with the teacher model's winding generation trajectory, current few-step acceleration schemes (Progressive Distillation, Distribution Matching, and the like) all try to do the same thing: straighten the curve and reach the destination in one step.
However, the original high-dimensional generation trajectory is extremely complex. Forcibly 'straightening' it creates a geometric mismatch with the true trajectory, which directly causes structural collapse and detail loss in few-step generation.
Is there a method that is fast yet still follows the original winding trajectory?
Fudan University and Microsoft Research Asia answer with ArcFlow: if the road is winding, learn to 'drift' instead of straightening it.
Paper URL: https://arxiv.org/abs/2602.09014
Project Code URL: https://github.com/pnotp/ArcFlow
1. The Dilemma: Why is 'Walking Straight' Hard to Learn?
In diffusion models, the generation process of the pre-trained teacher model essentially amounts to solving a differential equation: multi-step numerical integration in a high-dimensional space. Because the image manifold is complex, the teacher's sampling trajectory is usually a winding curve whose tangent direction (i.e., the velocity field) keeps changing across time steps.
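This multi-step integration can be sketched in a few lines. Below is a minimal first-order Euler sampler for a flow-matching-style ODE; `velocity_fn` stands in for the teacher network, and the function name and time convention (t from 1 down to 0) are illustrative assumptions, not the paper's exact code.

```python
def euler_sample(velocity_fn, x_T, num_steps=50):
    """Integrate the probability-flow ODE dx/dt = v(x, t) from t=1 (pure
    noise) down to t=0 (image) with first-order Euler steps. Each step
    costs one network forward pass (one NFE), which is why 50-100 steps
    make sampling slow."""
    x = x_T
    dt = -1.0 / num_steps          # moving backward in time, t: 1 -> 0
    t = 1.0
    for _ in range(num_steps):
        v = velocity_fn(x, t)      # teacher predicts the local velocity
        x = x + dt * v             # straight-line step along the tangent
        t = t + dt
    return x
```

Each Euler update walks a short straight segment along the tangent; the more the true trajectory bends, the more steps are needed to stay close to it.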
To accelerate this, existing distillation methods (Progressive Distillation, InstaFlow, and the like) try to compress the trajectory into a one-step straight line. The logic: since walking a curve is slow, train the student model to connect the start (noise) and the end (image) with a straight line. If the student learns that line, inference takes only one step.
This strategy leads to two fatal problems:
1. Geometric Mismatch: the teacher's weights were trained on a curved trajectory. Forcing the student to fit a straight line makes it 'betray' the teacher's generative prior. This mismatch makes the objective hard to learn and can lead to structural collapse.
2. High Learning Cost: to twist the trajectory by force, the student model usually needs full fine-tuning. That trains slowly and carries high memory overhead, and it easily causes catastrophic forgetting, damaging the large model's original generalization ability.
Hence we often see distilled models that, although faster, generate with unstable quality and even lose some understanding of complex prompts.
If we don't force straightening, how else can we speed up?
2. Insight: The Velocity Field is Not Random, It's Continuous
The ArcFlow team re-examined the teacher's trajectory. ODE theory implies that the denoising velocity does not jump between adjacent time steps; it is strongly correlated. Like a race car cornering: its direction and speed a second from now largely depend on its current state and inertia. Since the teacher's trajectory changes continuously, why not model that law of change directly instead of forcing a straight line?
If we can find a parameterization method that describes this 'bending' trend, then the student model doesn't need to struggle to straighten the road, but can 'drift along' the teacher's potential, reaching the endpoint in very few steps.
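The claimed correlation between adjacent velocities is easy to probe empirically: record the teacher's predicted velocity at consecutive steps and compare directions. A minimal helper (ours, for illustration):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two velocity vectors; values near 1.0
    mean the denoising direction barely changed between adjacent steps."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

If consecutive teacher velocities score consistently near 1.0, the trajectory's 'change law' is smooth enough to be worth parameterizing directly.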
Based on this core insight, ArcFlow was born.
3. ArcFlow's Three Killer Features
1. Momentum Parameterization: Adding 'Inertia' to the Generation Process
To capture the 'velocity continuity' mentioned above, ArcFlow introduces the classic 'Momentum' concept from physics.
In traditional methods, the model predicts the velocity independently at each time step. ArcFlow instead models the velocity field as a mixture of continuous momentum processes. Put simply, the model predicts not only the current 'velocity' but also a 'momentum factor' that describes how the velocity decays or grows over time. It is like knowing an object's initial velocity and the forces on it: without observing the intermediate process, physics formulas alone tell us whether its future trajectory is curved or straight.
This design allows ArcFlow to explicitly construct nonlinear trajectories. With as few as 2-4 steps, this nonlinear trajectory can more accurately fit the teacher's original path than a rigid straight line.
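As a toy illustration of such a parameterization (our simplification, not the paper's exact form), the velocity along the trajectory can be written as a mixture of exponentially decaying or growing momentum components:

```python
import math

def mixture_velocity(weights, rates, t):
    """Illustrative momentum parameterization: the velocity along the
    trajectory is a mixture of exponential momentum processes,
        v(t) = sum_i w_i * exp(-r_i * t).
    The student predicts the coefficients w_i and momentum factors r_i
    once; the bend of the trajectory over time then follows analytically.
    r_i > 0 means the component decays, r_i < 0 means it grows, and
    r_i = 0 recovers a plain constant-velocity (straight-line) component."""
    return sum(w * math.exp(-r * t) for w, r in zip(weights, rates))
```

With all rates at zero this degenerates to the straight line of linear distillation; nonzero rates are what let the trajectory bend.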
2. Analytic Solver: 'Zero Error' at the Mathematical Level
Because the 'momentum formulas' fully define how the velocity evolves over time, the trajectory integral is analytically solvable: we can derive a closed-form solution.
This means ArcFlow does not have to approximate the trajectory with many discrete steps like traditional numerical solvers. A single forward pass, plugged into the formula, gives the exact terminal state after any time interval.
This 'zero error' integration at the mathematical level is key to ArcFlow's high-precision flow matching: it removes the discretization error of traditional distillation pipelines, keeping generated details sharp.
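To make 'analytically solvable' concrete, here is the closed-form update under an exponential-mixture velocity v(t) = sum_i w_i * exp(-r_i * t), which is our illustrative assumption rather than ArcFlow's exact parameterization:

```python
import math

def analytic_step(x0, weights, rates, t0, t1):
    """Closed-form trajectory update. Integrating the assumed velocity
    v(t) = sum_i w_i * exp(-r_i * t) exactly from t0 to t1 gives
        x(t1) = x0 + sum_i w_i * (exp(-r_i*t0) - exp(-r_i*t1)) / r_i,
    where the r_i -> 0 limit reduces to an ordinary Euler step. One
    formula replaces many small solver steps, with no discretization
    error."""
    dx = 0.0
    for w, r in zip(weights, rates):
        if abs(r) < 1e-12:                       # straight-line limit
            dx += w * (t1 - t0)
        else:
            dx += w * (math.exp(-r * t0) - math.exp(-r * t1)) / r
    return x0 + dx
```

The remaining error is then only the network's prediction error, not the solver's step-size error.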
3. Minimal Training Strategy: LoRA Fine-tuning with Less Than 5% Parameters
This is the most exciting part for developers.
As noted above, traditional methods have to rewrite the entire model's parameters because they force-straighten the trajectory. ArcFlow instead 'goes with the flow': its nonlinear trajectory naturally fits the teacher's pre-trained distribution.
ArcFlow therefore does not need to overwrite the teacher model's original parameters. Experiments show that fine-tuning fewer than 5% of the parameters via LoRA (mainly to adapt the new momentum prediction head) is enough to achieve accurate trajectory alignment.
This strategy brings two major benefits:
- Fast Training Convergence: Compared to full fine-tuning methods like TwinFlow, ArcFlow's convergence speed is over 4 times faster.
- Retains Teacher Prior: Maximally inherits the vast knowledge base of FLUX/Qwen, avoiding collapse or quality degradation seen in other distilled models.
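A back-of-the-envelope count shows why LoRA stays well inside the 5% budget. The dimensions and layer count below are illustrative placeholders, not ArcFlow's actual configuration:

```python
def lora_param_fraction(d_model, rank, num_layers):
    """Rough trainable-parameter fraction: wrapping each
    d_model x d_model linear layer with a rank-r LoRA adapter adds
    rank * 2 * d_model trainable parameters per layer, while the
    d_model**2 base weights stay frozen."""
    base = num_layers * d_model * d_model
    lora = num_layers * rank * 2 * d_model
    return lora / (base + lora)
```

For example, with d_model=4096 and rank=16, the trainable fraction is roughly 2*16/4096, i.e., under 1% per adapted layer, leaving ample headroom for extra adapters such as a momentum prediction head.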
4. Experimental Data
The team validated ArcFlow on Qwen-Image-20B and FLUX.1-dev, two of the strongest open-source models available. The results show that ArcFlow balances speed, quality, and efficiency.
1. Inference Speed
Inference is compressed from the original 50-100 iterations down to 2 steps (2 NFE), delivering over 40x acceleration on the same hardware.
2. Quality Performance
On benchmarks such as GenEval and DPG-Bench, ArcFlow's FID and semantic-consistency scores in the 2-step setting mostly match or outperform current SOTA methods.
Visual Comparison:
Judging from the paper's qualitative comparisons, under the same 2-step inference other linear distillation methods often suffer from blurred backgrounds and distorted object structure (broken or ghosted swords, smeared backgrounds). Worse, across different initial noises they tend to produce similar outputs, i.e., diversity collapse. ArcFlow's images, in contrast, are not only sharp but also retain the teacher model's rich detail and visual diversity.
3. Training Efficiency
Thanks to the more accurate trajectory fitting and the LoRA strategy, ArcFlow's training curve converges quickly: at the same number of iterations, its FID and image quality clearly lead. For labs or individual developers without massive compute, this greatly lowers the barrier to reproduction and customization.
5. Summary
ArcFlow offers a new answer for few-step distillation: rather than 'straightening curves' by brute force, conform to the model's original feature space and describe its complexity with parameters. Through momentum parameterization and an analytic solver, ArcFlow avoids unstable adversarial objectives and full-parameter training, achieving faster convergence and more efficient distillation. It points to a promising direction for future research on efficient generative models.