A practitioner's guide to the core algorithmic pattern behind RLMs -- and why it mirrors how expert humans solve complex problems.
Every significant advance in computing has a core pattern at its heart. MapReduce had map and reduce. Neural networks have forward and backward passes. Recursive Language Models have decompose-recurse-aggregate -- a three-phase pattern that is deceptively simple to describe and remarkably powerful in practice.
Understanding this pattern is not just academic. It is the key to understanding why RLMs can solve problems that standard LLMs cannot, and it provides a framework for reasoning about which tasks will benefit most from the recursive approach.
Decompose. The model examines the task and the input, then breaks both into smaller pieces. This is not a fixed chunking strategy -- the model decides how to decompose based on the structure of the specific problem. A legal document review might decompose by section. A codebase analysis might decompose by module. A multi-document comparison might decompose by document pairs. The decomposition is semantic, not mechanical.
This is where the REPL environment becomes critical. The input is stored as a variable that the model can inspect programmatically. It can read the first thousand characters to understand the structure, then write code to split the input intelligently. It might parse headers, detect natural boundaries, or use domain-specific heuristics. The decomposition strategy itself is generated by the model, tailored to the task at hand.
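As a concrete illustration, here is the kind of decomposition code a model might generate inside the REPL. This is a sketch under assumptions (markdown-like input, a hypothetical `max_chunk_chars` budget), not the actual strategy any particular RLM produces:

```python
import re

def decompose_by_headers(text: str, max_chunk_chars: int = 4000) -> list[str]:
    """Split a document at markdown-style headers, then fall back to
    fixed-size splitting for any section that is still too large."""
    # Zero-width lookahead keeps each header attached to its body.
    sections = re.split(r"(?m)^(?=#{1,3} )", text)
    chunks = []
    for section in sections:
        if not section.strip():
            continue
        if len(section) <= max_chunk_chars:
            chunks.append(section)
        else:
            # Mechanical fallback when no finer semantic boundary exists.
            chunks.extend(section[i:i + max_chunk_chars]
                          for i in range(0, len(section), max_chunk_chars))
    return chunks
```

The point is not the specific regex but that the split logic is generated per task: the same model might instead parse clause numbers, module boundaries, or page breaks.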
Recurse. The model invokes itself on each sub-problem. This is a genuine recursive call -- the sub-invocation gets its own fresh context window, its own REPL environment, and the relevant slice of input. It does not share state with the parent call except through the explicit return value.
The critical property here is depth. If a sub-problem is still too large or too complex for a single invocation, the recursive call can itself decompose and recurse further. This creates a tree of processing where the depth adapts to the actual difficulty of the problem. A simple extraction task might need only one level of recursion. A task requiring pairwise comparison of every section in a long document might need three or four levels.
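A minimal sketch of this recursive structure, with a hypothetical `llm_solve(task, text)` standing in for a single model invocation and a naive midpoint split standing in for the model's own decomposition logic:

```python
def rlm(task: str, text: str, llm_solve, threshold: int = 2000) -> str:
    """Recursively process `text`: solve directly when it fits the budget,
    otherwise decompose, recurse on each piece, and aggregate."""
    if len(text) <= threshold:          # base case: small enough for one pass
        return llm_solve(task, text)
    mid = len(text) // 2                # naive split; a real RLM picks boundaries
    left = rlm(task, text[:mid], llm_solve, threshold)
    right = rlm(task, text[mid:], llm_solve, threshold)
    # Aggregation here is concatenation; other tasks need richer merging.
    return left + right
```

The depth of the call tree is not fixed in advance; it emerges from how far the input exceeds the per-call threshold, which is exactly the adaptive-depth property described above.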
Aggregate. The model collects the results from its recursive sub-calls and combines them into a final answer. Again, this is not a fixed operation. The aggregation strategy depends on the task: it might be concatenation, majority voting, merging partial results with conflict resolution, or a more complex synthesis that requires the model to reason about the relationships between sub-results.
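Three of those aggregation strategies can be sketched in a few lines. This is an illustrative stand-in, not a fixed API; a real RLM writes whatever combiner the task calls for:

```python
from collections import Counter

def aggregate(results: list, strategy: str = "concat"):
    """Combine sub-call results. The right strategy is task-dependent."""
    if strategy == "concat":                 # e.g. per-section summaries
        return "\n".join(results)
    if strategy == "vote":                   # e.g. per-chunk classifications
        return Counter(results).most_common(1)[0][0]
    if strategy == "merge":                  # e.g. partial key-value findings;
        merged = {}                          # later results win on conflict
        for partial in results:
            merged.update(partial)
        return merged
    raise ValueError(f"unknown strategy: {strategy}")
```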
The decompose-recurse-aggregate pattern is a generalization of divide-and-conquer, one of the most fundamental algorithmic strategies in computer science. Merge sort, quicksort, the Fast Fourier Transform, Strassen's matrix multiplication -- all follow this pattern. The power comes from the ability to reduce a problem of size n to multiple problems of size n/k, solve those independently, and combine the results in less work than solving the original problem directly.
For LLMs, the "work" at each step is bounded by the context window and the model's computational depth (number of layers). A standard Transformer can do O(L) sequential reasoning steps where L is the number of layers. If the task requires O(n) steps where n exceeds L, the model simply cannot solve it in a single pass. But with recursive decomposition, the model can solve tasks requiring O(n log n) or even O(n^2) total computation by distributing the work across multiple invocations, each operating within its computational budget.
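A back-of-envelope way to see the budget argument: count the invocations in a balanced decomposition tree. Each split costs one orchestration call, and only the leaf calls read a base-sized piece of text directly. The function below is an illustrative model, not a formula from the paper:

```python
def total_calls(n: int, k: int, base: int) -> int:
    """Number of model invocations when a problem of size n is split into
    k pieces per level until every piece is <= base."""
    if n <= base:
        return 1                        # a leaf: solved in a single pass
    return 1 + k * total_calls((n + k - 1) // k, k, base)
```

For an input of a million characters, a fan-out of 10, and a 1,000-character base case, this comes to 1,111 calls: 1,000 leaves plus 111 orchestration calls, each individually within budget.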
This is why the OOLONG-Pairs benchmark is so illuminating. The task requires O(n^2) pairwise comparisons. No amount of context window expansion will help a fixed-depth Transformer solve this, because the bottleneck is not how much text the model can see but how many sequential comparisons it can make. An RLM writes a nested loop: the outer loop iterates over chunks, and for each chunk, an inner loop compares it against all other chunks via recursive sub-calls. The total compute scales quadratically, but each individual call stays within bounds.
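The nested-loop structure is easy to make concrete. Here `compare(a, b)` is a hypothetical stand-in for a recursive sub-call; each invocation sees only two chunks, so every call stays bounded while the total number of calls grows as O(n^2):

```python
def pairwise_compare(chunks: list[str], compare) -> list[tuple[int, int, str]]:
    """O(n^2) pairwise comparison, one bounded sub-call per unordered pair."""
    results = []
    for i, a in enumerate(chunks):
        for j, b in enumerate(chunks[i + 1:], start=i + 1):
            results.append((i, j, compare(a, b)))   # recursive sub-call
    return results
```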
There is a reason this pattern feels intuitive: it mirrors how skilled humans handle complex tasks. Consider a senior attorney conducting due diligence on a large contract. They do not read the entire 500-page document from start to finish and then produce their analysis. Instead, they:
Decompose: Scan the table of contents, identify the major sections (representations, covenants, indemnification, schedules), and prioritize by risk.
Recurse: Work through each section individually, and within each section, drill down into sub-clauses that raise flags. Some sub-clauses reference other sections, triggering further drill-downs.
Aggregate: Compile findings from each section into a memo, resolving cross-references, identifying conflicts between sections, and producing a risk assessment that synthesizes everything.
A junior attorney might try to hold the entire document in their head and produce the analysis in one pass. The senior attorney knows that the task exceeds working memory capacity and uses an external structure (notes, outlines, cross-reference tables) to extend their cognitive reach. RLMs formalize this same insight: use external state and recursive processing to extend beyond the limits of what any single pass can accomplish.
The decompose-recurse-aggregate (DRA) pattern sounds straightforward in theory. The engineering challenge lies in several details that separate a naive implementation from an effective one.
Decomposition granularity. Splitting the input into too many small chunks creates overhead and loses cross-chunk context. Splitting into too few large chunks risks exceeding the effective capacity of each sub-call. The MIT paper finds that the model learns reasonable decomposition strategies through post-training on as few as 1,000 examples. The model adapts its chunking to the task: it uses larger chunks for extraction tasks (where context preservation matters) and smaller chunks for comparison tasks (where each chunk needs to be processed against many others).
State passing. Each recursive call needs enough context to perform its sub-task, but not so much context that the benefits of decomposition are lost. The REPL environment handles this cleanly: the parent call can set up variables that child calls inherit, and child calls return structured results that the parent can parse. This is a well-defined interface -- not a vague "shared context" but explicit variable passing, like function arguments and return values in a programming language.
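The function-call analogy can be made literal. In this sketch, `child_call` stands in for a sub-invocation: it receives explicit arguments and returns a JSON string that the parent parses, with no hidden shared context. The keyword-matching body is a placeholder for real model work:

```python
import json

def child_call(task: str, payload: dict) -> str:
    """A sub-call: explicit inputs in, structured result out."""
    # Stand-in for a model invocation over payload["text"].
    found = [w for w in payload["keywords"] if w in payload["text"]]
    return json.dumps({"chunk_id": payload["chunk_id"], "hits": found})

def parent_call(text: str, keywords: list[str]) -> dict:
    chunks = [text[i:i + 20] for i in range(0, len(text), 20)]
    results = {}
    for i, chunk in enumerate(chunks):
        raw = child_call("find keywords",
                         {"chunk_id": i, "text": chunk, "keywords": keywords})
        parsed = json.loads(raw)     # parent parses the structured return
        results[parsed["chunk_id"]] = parsed["hits"]
    return results
```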
Depth control. Unbounded recursion is a risk. In practice, RLM implementations set a maximum recursion depth (typically 3-5 levels) and a maximum number of total sub-calls. The model can also terminate recursion early when it determines that the sub-problem is small enough to solve directly. This mirrors the base case in recursive algorithms: at some point, the problem is small enough to solve without further decomposition.
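One way to enforce these limits is a small budget object threaded through the recursion. The numbers and names here are illustrative, not taken from any particular implementation:

```python
class RecursionBudget:
    """Enforce a maximum depth and a global cap on total sub-calls."""
    def __init__(self, max_depth: int = 4, max_calls: int = 100):
        self.max_depth = max_depth
        self.max_calls = max_calls
        self.calls = 0

    def allow(self, depth: int) -> bool:
        if depth >= self.max_depth or self.calls >= self.max_calls:
            return False                 # force the base case
        self.calls += 1
        return True

def guarded_rlm(text, solve, budget, depth=0, threshold=100):
    # Solve directly when small, at max depth, or out of call budget.
    if len(text) <= threshold or not budget.allow(depth):
        return solve(text)
    mid = len(text) // 2
    return (guarded_rlm(text[:mid], solve, budget, depth + 1, threshold)
            + guarded_rlm(text[mid:], solve, budget, depth + 1, threshold))
```

Hitting either limit does not abort the computation; it simply collapses the remaining sub-problem into a direct solve, exactly like reaching a base case.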
Parallel execution. Independent sub-calls can be executed in parallel, which is important for both latency and throughput. The Google Agent Development Kit implementation, for example, supports parallel sub-calls by default. This means that a task decomposed into ten independent chunks can be processed roughly ten times faster than sequential execution, limited only by available compute.
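Because sub-calls share no mutable state, fanning them out is a one-liner with standard concurrency tooling. A minimal sketch using Python's thread pool (appropriate here because sub-calls are I/O-bound API requests):

```python
from concurrent.futures import ThreadPoolExecutor

def process_parallel(chunks, sub_call, max_workers=8):
    """Run independent sub-calls concurrently; results come back
    in the same order as the input chunks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sub_call, chunks))
```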
Not every task benefits from decompose-recurse-aggregate. The pattern is most valuable when the input is large relative to the model's effective context capacity and the task requires reasoning over distributed information. It is less useful -- and can even be counterproductive -- in scenarios where:
The task requires holistic judgment that cannot be decomposed. Some creative writing tasks, for instance, depend on a unified stylistic vision that emerges from considering the entire context simultaneously. Decomposing the input might destroy the very property the task requires.
The input is short enough that the model can handle it in a single pass. Adding recursion to a task that fits comfortably in the context window introduces unnecessary overhead and potential information loss at chunk boundaries.
The task is purely retrieval-based. If you just need to find one piece of information in a large document, a simple search or RAG pipeline is faster and more reliable than recursive decomposition.
Understanding when to use DRA and when to skip it is part of what makes RLMs effective. The post-trained models in the MIT work learn this distinction: they apply recursion when the task requires it and process directly when it does not. The pattern is a tool, not a mandate.
The decompose-recurse-aggregate pattern is likely to become one of the standard algorithmic patterns in the AI practitioner's toolkit, alongside attention, retrieval, and chain-of-thought. Its power lies in its generality: any task that can be expressed as a divide-and-conquer problem (and most complex tasks can) is a candidate for this approach.
As tooling matures -- DSPy already supports it natively, and several agent frameworks are adding built-in DRA primitives -- the barrier to using this pattern will drop. The question will shift from "how do I implement recursive processing" to "what decomposition strategy works best for this specific task type." That is the kind of question that drives genuine engineering progress.