WWW'26 | A New Paradigm for Cross-Task Adaptive Multi-Agent Collaboration

Hello everyone, I'm PaperAgent, not an Agent!

Large Language Model-driven Multi-Agent Systems (MAS) are becoming a crucial paradigm for solving complex tasks: different agents assume different roles and collaborate to complete tasks such as mathematical reasoning, code generation, knowledge-based Q&A, and even complex user requests in web services.

Different agent architectures for code generation in Claude Code

But a critical question persists: How should agents collaborate? Who speaks first? Who communicates information to whom? Which experts need to join? These questions collectively define the collaboration topology of the MAS, directly impacting system performance, efficiency, and robustness.

Although existing automatic topology design methods can learn collaboration structures for specific tasks, most still follow the one-model-for-one-dataset (one-for-one) paradigm: a separate topology design model is trained for each task domain. In real-world scenarios with cross-domain and unpredictable user requests, this approach not only incurs high maintenance costs but also struggles to reuse collaborative knowledge shared across tasks.

Diagram comparing one-for-one vs one-for-all topology design paradigms

Recently, a team from Griffith University and Northwest A&F University proposed OFA-MAS, pushing multi-agent topology design from one-for-one to one-for-all: training a single, universal topology design model that can automatically generate suitable multi-agent collaboration graphs for natural language tasks across different domains.

OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models
Authors: Shiyuan Li, Yixin Liu, Yu Zheng, Mei Li, Quoc Viet Hung Nguyen, Shirui Pan
https://dl.acm.org/doi/abs/10.1145/3774904.3792537
https://github.com/Shiy-Li/OFA-MAS

A New Paradigm for Multi-Agent Topology Design: From One-for-One to One-for-All

Flowchart of the OFA-MAS model generating a topology from a user query

The capability of a multi-agent system depends not only on the abilities of individual agents but also on the communication structure between them. A good topology allows the right experts to participate in reasoning at the right moments, whereas a poor topology can lead to information redundancy, error propagation, or inefficient collaboration.

Early methods typically relied on manually designed, fixed structures such as Chain, Tree, and Debate. These structures are simple and intuitive but struggle to adapt to diverse tasks. More recent graph learning methods attempt automatic topology design: AgentDropout optimizes a predefined graph through dynamic pruning, G-Designer learns task-relevant interaction structures, and ARG-Designer generates multi-agent collaboration topologies autoregressively.

However, these methods are still one-for-one: each trains a dedicated model for a single task domain such as MMLU, GSM8K, or HumanEval. This paradigm faces three problems in real-world deployment:

Domain assumptions are detached from reality: The one-for-one paradigm assumes task domains are singular and known, but real-world requests are often cross-domain and unpredictable. A system cannot require users to pre-classify domains, greatly limiting practical application.
High expansion and maintenance costs: Adding each new domain may require re-collecting data, retraining models, and tuning parameters.
Ignoring cross-domain shared knowledge: Similar collaboration patterns, such as "Analyst → Checker → Solver," might be shared across mathematical reasoning, code debugging, and knowledge Q&A.

The goal of OFA-MAS is precisely to train a universal topology designer that can learn reusable collaboration rules from multi-domain tasks and generate a suitable MAS topology for any input query during inference.

How Does OFA-MAS Generate Cross-Domain Collaboration Topologies?

The autoregressive node and edge generation process of OFA-MAS

OFA-MAS models MAS topology design as a conditional graph generation problem: given a user query and a universal role pool, the model directly generates a collaboration graph where nodes are agent roles and edges represent information flow.

OFA-MAS uses autoregressive graph generation as its basic framework:

Selecting the next agent role: Based on the current task and the partially generated graph, decide which type of expert needs to be added next.
Predicting communication connections: Determine which existing agents' information the newly added agent should receive.
Progressively expanding the topology: Continuously repeat the "select role—connect edges" process until a complete collaboration graph is generated.
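The three steps above can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `generate_topology`, `toy_pick_role`, and `toy_pick_sources` are hypothetical names, the two callbacks stand in for the learned node and edge predictors, and the stop signal (`None`) and `max_agents` cap are assumptions.

```python
from typing import Callable, List, Optional, Tuple

# (source index, target index): information flows from source to target.
Edge = Tuple[int, int]


def generate_topology(
    query: str,
    pick_role: Callable[[str, List[str], List[Edge]], Optional[str]],
    pick_sources: Callable[[str, List[str], List[Edge], str], List[int]],
    max_agents: int = 6,  # assumed safety cap, not taken from the paper
) -> Tuple[List[str], List[Edge]]:
    """Grow a collaboration graph one agent at a time."""
    roles: List[str] = []
    edges: List[Edge] = []
    while len(roles) < max_agents:
        # Step 1: choose the next role; None means "stop generating".
        role = pick_role(query, roles, edges)
        if role is None:
            break
        # Step 2: choose which existing agents feed the new one.
        sources = pick_sources(query, roles, edges, role)
        new_index = len(roles)
        roles.append(role)
        # Edges only run from earlier agents to the new one, so the graph stays acyclic.
        edges.extend((src, new_index) for src in sources if 0 <= src < new_index)
    return roles, edges


# Hypothetical stand-ins for the learned predictors.
def toy_pick_role(query, roles, edges):
    plan = ["Analyst", "Solver", "Checker"]
    return plan[len(roles)] if len(roles) < len(plan) else None


def toy_pick_sources(query, roles, edges, role):
    # Every new agent listens to the agent added just before it.
    return [len(roles) - 1] if roles else []


roles, edges = generate_topology("What is 17 * 24?", toy_pick_role, toy_pick_sources)
print(roles)  # ['Analyst', 'Solver', 'Checker']
print(edges)  # [(0, 1), (1, 2)]
```

Because the loop decides both when to stop and which edges to add, the same code can yield a two-agent chain for one query and a denser graph for another, which is the property the one-for-all setting relies on.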

This autoregressive approach is naturally suited for a one-for-all scenario: different tasks can generate topologies of varying scales, role combinations, and communication methods, without relying on fixed templates.

Task-Aware Graph State Encoding: Enabling Topology Generation to Truly "Understand the Task"

Autoregressive generation alone is insufficient. For a one-for-all model, the same partial graph can imply completely different next-step decisions under different tasks. For example, a code generation task might require a Reviewer and a Debugger, whereas a math problem is more likely to need a Solver and a Verifier.

To address this, OFA-MAS introduces the Task-Aware Graph State Encoder (TAGSE). Its core idea is to inject task semantics continuously while encoding the current partial graph, so that each node's representation is modulated by the query.

Specifically, TAGSE uses a pre-trained sentence embedding model to encode the task query and role descriptions, and a contextual gating mechanism filters out information flows irrelevant to the current task. This way, during message passing, the model does not mechanically aggregate all neighbor information but selects genuinely useful structural information based on task requirements.
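To make the gating idea concrete, here is a rough sketch of query-conditioned message passing for a single node. It is an assumption-laden toy, not TAGSE itself: the gate here is a plain sigmoid of the dot product between a neighbor's feature vector and the query vector, whereas the real encoder uses learned parameters and sentence embeddings. `gated_aggregate` and all vectors are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_aggregate(
    node: int,
    features: Dict[int, Vector],
    in_neighbors: Dict[int, List[int]],
    query_vec: Vector,
) -> Vector:
    """Update one node by adding neighbor messages scaled by a query-dependent gate."""
    updated = list(features[node])
    for nb in in_neighbors.get(node, []):
        # Gate lies in (0, 1): near 1 when the neighbor aligns with the query,
        # near 0 when it points the opposite way.
        gate = sigmoid(dot(features[nb], query_vec))
        updated = [u + gate * m for u, m in zip(updated, features[nb])]
    return updated


# Toy 2-d "embeddings" (assumed values, chosen only to show the effect).
features = {0: [3.0, 0.0], 1: [0.0, 3.0], 2: [0.0, 0.0]}
in_neighbors = {2: [0, 1]}  # node 2 receives messages from nodes 0 and 1
query_vec = [1.0, -1.0]     # aligned with node 0, opposed to node 1

updated = gated_aggregate(2, features, in_neighbors, query_vec)
print(updated)
```

In this toy setup the message from node 0 passes almost unchanged while the message from node 1 is largely suppressed, which mirrors the idea of not aggregating all neighbor information mechanically.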

This design allows OFA-MAS to handle multiple task types within a unified model while retaining task specificity.

MoE Graph Generation Module: Activating Different "Design Experts" for Different Tasks

Architecture diagram showing the MoE routing for node and edge generation

There is no single optimal design strategy for cross-domain MAS topologies. Mathematical reasoning might favor step-by-step solving and verification, code generation might require implementation, review, and debugging, while knowledge Q&A might rely more on information retrieval and synthesis.

Therefore, OFA-MAS introduces a Mixture-of-Experts (MoE) generation module. The model contains multiple expert networks internally, and a gating network dynamically decides which experts participate in the current topology generation based on task semantics.

At each generation step, the MoE module is used for:

Node generation: Predicting the next agent role to be added.
Edge generation: Predicting the information connections between the new agent and existing ones.

This mechanism enables OFA-MAS to learn multiple collaboration strategies within a single, general model: different tasks can activate different combinations of experts, thereby balancing cross-domain sharing and domain specialization.
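A generic top-k Mixture-of-Experts layer gives a feel for how such routing can work. The sketch below is a standard MoE pattern under stated assumptions, not the paper's architecture: the linear gate, the `top_k` value, the fixed toy experts, and the role names are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_scores(
    state: Vector,
    gate_weights: List[Vector],
    experts: List[Callable[[Vector], Vector]],
    top_k: int = 2,
) -> Vector:
    """Mix the outputs of the top_k experts selected by a linear gate."""
    gate_logits = [sum(w * s for w, s in zip(row, state)) for row in gate_weights]
    chosen = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:top_k]
    # Renormalise over the selected experts only, so their weights sum to 1.
    weights = softmax([gate_logits[i] for i in chosen])
    outputs = [experts[i](state) for i in chosen]
    return [
        sum(w * out[j] for w, out in zip(weights, outputs))
        for j in range(len(outputs[0]))
    ]


# Hypothetical role pool and experts; each toy expert always votes for one role.
role_pool = ["Solver", "Reviewer", "Verifier"]
experts = [
    lambda s: [1.0, 0.0, 0.0],
    lambda s: [0.0, 1.0, 0.0],
    lambda s: [0.0, 0.0, 1.0],
]
gate_weights = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

state = [1.0, 0.0]  # assumed task-aware graph state for a math-like query
scores = moe_scores(state, gate_weights, experts, top_k=2)
best_role = role_pool[scores.index(max(scores))]
print(scores, best_role)
```

Here the gate selects experts 0 and 2 and ignores expert 1, so the "Reviewer" role receives no weight for this input. The same routing can be applied to edge prediction by having the experts score candidate source agents instead of roles.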

Three-Stage Training: From Structural Grammar to Task Alignment, to Real-World Validation

Overview of the three-stage training curriculum for OFA-MAS

Training a one-for-all topology design model is not easy because high-quality "task—optimal topology" supervision data is very expensive. OFA-MAS solves this problem through a progressively difficult three-stage training strategy.

Unconditional Graph Pre-training: The model is first trained on classic topologies such as Chain, Star, and fully connected graphs so that it learns the basic "grammar" of collaboration graphs.
LLM-Driven Conditional Pre-training: A large language model acts as a "proxy system designer" that cheaply generates large-scale pairs of task queries and MAS configurations, from which the model learns the correspondence between task semantics and topological structure.
Supervised Generative Fine-tuning: Finally, the model is fine-tuned on a small amount of high-quality topology data from real benchmarks, verified via MAS execution, which makes it more sensitive to actual task performance.

Through this curriculum learning approach, OFA-MAS first masters general graph structures, then learns cross-domain task alignment, and finally calibrates generation quality using real execution results.
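As an illustration of the first stage, the classic topologies can be enumerated as small edge lists. This is a sketch under assumptions: the function names, the choice of agent 0 as the Star hub, and the low-to-high edge direction in the fully connected graph are not specified in the source.

```python
from typing import Dict, List, Tuple

# (source index, target index): information flows from source to target.
Edge = Tuple[int, int]


def chain(n: int) -> List[Edge]:
    # Each agent passes its output to the next one.
    return [(i, i + 1) for i in range(n - 1)]


def star(n: int) -> List[Edge]:
    # Assumption: agent 0 is the hub and broadcasts to every other agent.
    return [(0, i) for i in range(1, n)]


def fully_connected(n: int) -> List[Edge]:
    # Assumption: edges run from lower to higher index so the graph stays acyclic.
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def pretraining_graphs(sizes: List[int]) -> List[Dict[str, object]]:
    """Enumerate every classic topology at every requested size."""
    builders = {"chain": chain, "star": star, "fully_connected": fully_connected}
    return [
        {"kind": kind, "num_agents": n, "edges": build(n)}
        for n in sizes
        for kind, build in builders.items()
    ]


graphs = pretraining_graphs([3, 4])
print(len(graphs))  # 6
print(graphs[0])    # {'kind': 'chain', 'num_agents': 3, 'edges': [(0, 1), (1, 2)]}
```

Graphs like these carry no task information, which is why this stage is unconditional: it only teaches the generator what well-formed collaboration structures look like.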

Experimental Results: One Model, Consistently Leading Across Six Benchmarks

Experiments covered six representative benchmarks, including MMLU, GSM8K, AQuA, MultiArith, SVAMP, and HumanEval, and further tested Out-Of-Distribution (OOD) generalization ability on the unseen GAIA benchmark.

Comparison methods include single-agent CoT and Self-Consistency, fixed MAS topologies, Debate systems, as well as one-for-one graph learning topology design methods like AgentPrune, AgentDropout, G-Designer, and EIB-LEARNER.

1) Overall Performance: The One-for-All Model Surpasses Dedicated One-for-One Methods

As shown in the table below, OFA-MAS achieved the best average performance across the six benchmarks, reaching an average success rate of 93.02% and surpassing all comparison methods.

More notably, with only the first two pre-training stages and no fine-tuning on real benchmarks, OFA-MAS still achieved an average performance of 92.15%, exceeding the strongest baseline, EIB-LEARNER. This suggests that LLM-driven synthetic data and universal structural pre-training alone can provide strong cross-domain topology design capabilities.

Table comparing accuracy of OFA-MAS against baselines on six benchmarks

2) OOD Generalization: Maintaining Advantage on Unseen GAIA Tasks

To verify whether the model truly possesses cross-domain generalization capability, the paper tested on the GAIA benchmark, which was unseen during training.

Note that under the current evaluation setting, none of the methods used external tools or tool calling. The comparison focused solely on how different MAS topologies perform with the same base model under tool-free conditions. This matches the standard setup in current MAS topology generation research and allows a more direct measurement of the benefit of the topology design itself.

Results show that OFA-MAS achieved the highest average accuracy on GAIA, with particularly strong performance on Level-1 tasks. In contrast, one-for-one learned methods significantly degraded in the OOD scenario, even performing worse than a simple Chain topology. This suggests that OFA-MAS learns not just local patterns of a specific task domain but more general knowledge about collaborative structures.

Bar chart showing accuracy of OFA-MAS and baselines on GAIA benchmark levels

3) Ablation Study: TAGSE, MoE, and Training Curriculum Are All Indispensable

Ablation experiments showed that replacing TAGSE, removing MoE, or removing any stage of the three-stage training all led to performance degradation. This validates that the key designs of OFA-MAS work in concert to support one-for-all topology generation, rather than being a simple stack of components.

Among these, task-aware encoding allows the model to adjust graph state representations based on the query, the MoE module provides cross-task specialized generation capabilities, and unconditional pre-training, LLM synthetic data pre-training, and real data fine-tuning are respectively responsible for structural priors, task-topology alignment, and empirical performance calibration.

Bar chart of ablation study results showing the impact of removing key components

4) Robustness, Case Studies, and MoE Visualization: Validating OFA-MAS from Results to Mechanisms

(a) In robustness tests simulating malicious agents, OFA-MAS's performance dropped by only about 2.2%, significantly better than other methods. This indicates that the collaboration structures it generates do not overly rely on a single critical node, making it more suitable for real-world deployment environments where agents might be unreliable.

(b)-(c) Case studies also show that OFA-MAS can dynamically select appropriate roles based on the task: generating a sequential code review structure for HumanEval, and combining math solving, coding assistance, and checking roles for GSM8K, demonstrating the ability to flexibly assemble teams from a universal role pool.

(d) MoE visualization further reveals internal expert differentiation: tasks from the same domain stably activate similar expert combinations, while different domains show distinctly different expert preferences. For instance, HumanEval favors experts related to code generation, while GSM8K and MultiArith activate experts more associated with mathematical reasoning.

Heatmaps and case study diagrams showing MoE expert activation patterns and generated topologies

5) Efficiency Analysis: Higher Accuracy with Controllable Token Cost

Beyond performance advantages, OFA-MAS also achieves an excellent balance between accuracy and computational overhead. Token consumption comparisons show that on MMLU and GSM8K, OFA-MAS can achieve higher accuracy at a highly competitive inference cost.

This indicates that OFA-MAS does not simply buy performance by scaling up collaboration; rather, it generates more suitable information flow structures, improving multi-agent collaboration while keeping inference costs under control.

Scatter plot showing the trade-off between token cost and accuracy for various methods