Everyone is writing about self-improvement loops. The real unlock is when one model teaches another -- transferring not its outputs, but its learning process.
The recursive learning conversation has a blind spot. Nearly every paper, blog post, and Twitter thread about recursive language models (RLMs) focuses on the same thing: a single model improving itself through iterative refinement. Feed outputs back as inputs. Let the model critique its own work. Run the loop until convergence.
This is useful. It is also the least interesting thing recursive systems can do.
The frontier that matters -- the one that changes what is possible at scale -- is agent-to-agent recursive knowledge transfer. Not one model talking to itself. Multiple models teaching each other, where the thing being transferred is not an answer but the capacity to arrive at one.
To understand why this matters, consider the three ways one model can currently learn from another.
| Method | What transfers | Requires | Limitation |
|---|---|---|---|
| Distillation | Probability distributions over tokens | Teacher's weights or logits | Cannot transfer reasoning strategies, only surface distributions |
| Behavioral cloning | Input-output pairs | Demonstration data | Mimics outputs without understanding why; brittle on distribution shifts |
| Recursive transfer | The decomposition and reasoning process itself | Structured interaction protocol | Still emerging; no dominant framework yet |
Distillation gives you a smaller model that approximates the teacher's outputs. Behavioral cloning gives you a model that can repeat what the teacher did. Neither transfers how the teacher figured it out.
Recursive transfer is different. When Agent A solves a problem using the decompose-recurse-aggregate pattern, it produces something more valuable than a final answer. It produces a trace of how it broke the problem apart, what sub-problems it identified, how it chose to recurse, and how it reassembled the pieces. That trace is a compressed representation of a problem-solving strategy. And it is transferable.
Standard LLMs do not produce the kind of structured intermediate reasoning that makes transfer meaningful. They produce a sequence of tokens. You can copy the tokens. You cannot copy the implicit process that generated them because that process is entangled in billions of parameters and is not separately addressable.
RLMs change this. The decompose-recurse-aggregate pattern forces the model to externalize its reasoning structure. Each recursive call is an explicit decision: this is the sub-problem I am solving, this is the context I am carrying forward, this is how I will integrate this result with the others. That externalization is what makes the process transferable.
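What an externalized reasoning structure might look like in code: the sketch below models one node of a decompose-recurse-aggregate trace as plain data. The class and field names (`TraceNode`, `sub_problem`, `carried_context`, `aggregation_rule`) are illustrative assumptions, not a published format.

```python
from dataclasses import dataclass, field

@dataclass
class TraceNode:
    """One explicit decision in a decompose-recurse-aggregate trace.

    Hypothetical sketch: field names are illustrative, not from any
    existing RLM framework.
    """
    sub_problem: str               # what this recursive call is solving
    carried_context: str           # context forwarded into the call
    aggregation_rule: str          # how this result rejoins its siblings
    children: list["TraceNode"] = field(default_factory=list)

    def depth(self) -> int:
        """Maximum recursion depth at or below this node."""
        if not self.children:
            return 1
        return 1 + max(child.depth() for child in self.children)

# A two-level trace: split a document, then summarize each half.
root = TraceNode(
    sub_problem="summarize full document",
    carried_context="document metadata",
    aggregation_rule="concatenate section summaries",
    children=[
        TraceNode("summarize section 1", "section 1 text", "return summary"),
        TraceNode("summarize section 2", "section 2 text", "return summary"),
    ],
)
```

The point of the data structure is that every decision the model made is now separately addressable -- which is exactly what the entangled weights of a standard LLM do not give you.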
Consider a concrete example. Agent A is a large model (say, 70B parameters) that has been post-trained on recursive processing of legal contracts. It encounters a new type of contract and develops a decomposition strategy: split by section, cross-reference defined terms, identify obligation chains, flag asymmetric risk clauses, then aggregate into a risk assessment.
In a standard system, Agent B (an 8B model) would need to be trained on the same contract data to learn anything from this. With recursive transfer, Agent B receives the decomposition strategy itself -- the schema of how to break this kind of document apart and what to look for at each level. Agent B may not execute the strategy as well (smaller capacity, less world knowledge), but it has the structure of the approach without needing to discover it independently.
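The contract example can be made concrete by expressing the strategy as data a student can execute with its own model. Everything below is a hypothetical sketch: the schema keys and the `run_step` callback (standing in for a call to the student model) are assumptions, not an existing API.

```python
# Hypothetical sketch: a decomposition strategy expressed as data, so a
# smaller student model can reuse it without rediscovering it. The step
# names mirror the contract example; the schema is illustrative only.
contract_strategy = {
    "problem_class": "commercial contract risk review",
    "decompose": [
        "split by section",
        "cross-reference defined terms",
        "identify obligation chains",
        "flag asymmetric risk clauses",
    ],
    "aggregate": "merge flags into a single risk assessment",
}

def apply_strategy(strategy: dict, run_step) -> list[str]:
    """Execute each decomposition step with the student's own model.

    `run_step` stands in for a call to the student model; here it is a
    plain function so the sketch stays self-contained.
    """
    return [run_step(step) for step in strategy["decompose"]]

# Stub student: echoes what it was asked to do.
results = apply_strategy(contract_strategy, lambda step: f"done: {step}")
```

Note what transfers and what does not: the 8B student supplies all of the execution, but the 70B teacher's hard-won ordering of steps arrives for free.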
Not all agent-to-agent teaching is the same. The emerging research suggests three distinct modes, each useful in different contexts.
Strategy transfer. The teacher communicates a decomposition strategy: how to break a problem class apart, what to recurse on, what to aggregate. This is the highest-level form of transfer. It works best when the student model has sufficient base capability but lacks domain-specific problem-solving patterns. Think of it as transferring a flowchart, not a dataset.
Critique transfer. The teacher does not solve the problem. Instead, it evaluates the student's recursive trace and identifies where the decomposition went wrong, where the recursion was too shallow or too deep, or where the aggregation lost information. This is recursive teaching in the most literal sense -- the teacher recurses over the student's recursive process. Early results from MIT's OASYS lab suggest this mode produces the most durable learning in the student.
Scaffold transfer. The teacher provides a partial recursive structure -- the first level of decomposition, some anchor sub-results -- and the student fills in the rest. This is the most practical mode for production systems today because it does not require the student to accept arbitrary instructions. The student operates within its normal capabilities but with a structural head start.
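The three modes can be sketched as a small selection policy. This is a toy restatement of the prose above, under stated assumptions: critique transfer requires an existing student trace to critique, strategy transfer requires sufficient base capability, and scaffold transfer is the practical fallback. The enum values and policy are illustrative, not a proposed standard.

```python
from enum import Enum

class TransferMode(Enum):
    """The three teaching modes described above (illustrative payloads)."""
    STRATEGY = "full decomposition schema"
    CRITIQUE = "annotations on the student's own recursive trace"
    SCAFFOLD = "partial decomposition plus anchor sub-results"

def choose_mode(student_has_base_skill: bool,
                student_trace_exists: bool) -> TransferMode:
    """Toy policy matching the prose: critique needs a student trace to
    work on, strategy needs a capable student, scaffold is the fallback."""
    if student_trace_exists:
        return TransferMode.CRITIQUE
    if student_has_base_skill:
        return TransferMode.STRATEGY
    return TransferMode.SCAFFOLD
```

A real system would condition on far more than two booleans, but the shape of the decision -- pick the mode that matches what the student already has -- carries over.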
The agent orchestration systems that are beginning to appear in production (multi-agent frameworks from LangChain, CrewAI, AutoGen, and others) almost universally treat agents as black boxes that exchange messages. Agent A asks Agent B a question. Agent B responds. The orchestrator routes messages and manages state.
This is message passing, not knowledge transfer. It is the multi-agent equivalent of behavioral cloning: you see the output, not the process.
A recursive transfer architecture looks different. When Agent A completes a task, it does not just produce a result. It produces a recursive trace -- a structured record of every decomposition decision, every recursive call, every aggregation step. This trace is compact (typically 5-15% of the size of the full context processed) and reusable.
When Agent B encounters a structurally similar task, it does not start from scratch. It loads the relevant trace, adapts the decomposition to its specific input, and executes. If the adaptation fails (the new input has structure the trace did not anticipate), Agent B falls back to independent processing and produces a new trace. Over time, a shared library of traces accumulates -- not a knowledge base of answers, but a knowledge base of problem-solving approaches.
This is fundamentally different from retrieval-augmented generation. RAG retrieves information. Recursive transfer retrieves strategies.
The reason this matters beyond academic interest is scaling. Current multi-agent systems scale linearly: more agents, more compute, proportionally more capability. Recursive transfer introduces a different scaling curve.
When Agent A learns a new decomposition strategy and that strategy transfers successfully to Agents B through Z, the system has gained capability at the cost of one learning event, not twenty-six. As the number of agents grows, the value of each individual learning event grows with it. This is super-linear scaling of learning, and it is the property that makes recursive transfer qualitatively different from existing approaches.
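The scaling claim reduces to simple arithmetic, sketched below under assumed (purely illustrative) costs: independent learning charges every agent the full learning cost, while transfer charges one learning event plus a cheaper adaptation per remaining agent.

```python
# Toy cost model for the scaling argument above. The numbers are
# illustrative assumptions, not measurements.
def total_cost(num_agents: int, learn_cost: float, adapt_cost: float,
               transfer: bool) -> float:
    """Total compute to bring num_agents to competence on a new task class."""
    if not transfer:
        # Every agent learns independently.
        return num_agents * learn_cost
    # One agent learns; the rest adapt its trace.
    return learn_cost + (num_agents - 1) * adapt_cost

independent = total_cost(26, learn_cost=100.0, adapt_cost=10.0, transfer=False)
shared = total_cost(26, learn_cost=100.0, adapt_cost=10.0, transfer=True)
```

With 26 agents, independent learning costs 2600 units against 350 for transfer; and because the per-event saving grows with the number of agents, each learning event becomes more valuable as the population grows.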
There are limits. Strategy transfer degrades when the capability gap between teacher and student is too large. A 70B model's decomposition strategy may reference sub-tasks that an 8B model simply cannot execute. Critique transfer requires the teacher to model the student's capabilities, which is itself a hard problem. And scaffold transfer works only for tasks with transferable structure -- purely novel problems with no structural precedent still require independent processing.
But within those limits, the efficiency gains are significant. Early experiments show 3-7x reduction in the compute required for a student model to reach competence on a new task class when provided with recursive traces from a teacher, compared to independent learning or behavioral cloning from demonstrations.
How much of this is still unsolved? The honest answer is: a lot. There is no standard format for recursive traces. There is no consensus on how to evaluate whether a transfer was successful (beyond task performance, which conflates many factors). There is no theory of which strategies transfer well and which do not.
Most critically, there is no widely adopted framework for recursive trace compression. Raw traces are too large and too specific to transfer directly. The useful signal -- the structural decisions, the decomposition logic, the aggregation rules -- needs to be extracted and generalized. This is itself a recursive problem (use an RLM to recursively compress another RLM's recursive trace), and solving it well is likely the key technical challenge of the next two years.
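One plausible shape for trace compression: recursively keep the structural decisions (how the problem was split and rejoined) and drop the input-specific content. The nested-dict format and field names below are illustrative assumptions, not a standard.

```python
# Hypothetical sketch of recursive trace compression: retain structure,
# discard task-specific content so the trace generalizes across inputs.
def compress(trace: dict) -> dict:
    """Keep decomposition structure; drop raw carried context."""
    return {
        "sub_problem": trace["sub_problem"],
        "aggregation_rule": trace["aggregation_rule"],
        # carried_context is input-specific, so it is dropped here
        "children": [compress(c) for c in trace.get("children", [])],
    }

full_trace = {
    "sub_problem": "summarize document",
    "carried_context": "the entire 400-page document text ...",
    "aggregation_rule": "concatenate",
    "children": [
        {"sub_problem": "summarize section 1",
         "carried_context": "section 1 text",
         "aggregation_rule": "return summary"},
    ],
}
compact = compress(full_trace)
```

A real compressor would also generalize the structure itself (merging near-duplicate branches, abstracting step descriptions), which is where the "use an RLM to compress an RLM's trace" recursion comes in.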
The labs are circling this. Anthropic's constitutional AI work implies structured reasoning that could generate transferable traces. OpenAI's chain-of-thought distillation suggests they are thinking about process transfer, not just output transfer. Google DeepMind's Gemini architecture supports the kind of multi-modal recursive processing that would benefit most from shared strategies. But no one has published a unified framework yet.
Self-improvement loops are a dead end at scale. A single model refining its own outputs converges quickly, hits its capability ceiling, and cannot discover strategies outside its existing distribution. It is optimization, not learning.
Agent-to-agent recursive transfer is learning. It allows a population of models to collectively explore a larger strategy space than any individual model could reach alone. It converts one model's hard-won insight into a reusable asset for every other model in the system. And it does this without requiring shared weights, shared training data, or shared architecture.
The recursive language model paradigm made long-context processing tractable. Agent-to-agent recursive transfer could make collective intelligence tractable -- not as a metaphor, but as a concrete, measurable property of multi-agent systems.
That is a bigger deal than processing longer documents. And it is the paper that nobody has written yet.