The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $λ$-Calculus
TL;DR Highlight
Instead of having LLMs write recursive code directly, use deterministic lambda-calculus-based combinators (SPLIT/MAP/FILTER/REDUCE) to process long documents, achieving up to +21.9 points higher accuracy and a 4.1x speedup.
Who Should Read
Backend/ML engineers building long document processing pipelines beyond 128K tokens or large codebase analysis. Developers struggling with non-deterministic execution flow in LLM agents.
Core Mechanics
- Existing RLMs (Recursive Language Models) use open-ended REPL loops where LLMs generate Python code directly, causing infinite loops, code parsing errors, and unpredictable costs.
- Lambda-RLM restricts LLMs to 'oracles for small chunks only' while pre-verified combinators like SPLIT/MAP/FILTER/REDUCE handle all control flow — the LLM doesn't need to write code.
- Mathematically proven termination guarantees, closed-form cost bounds, and accuracy scaling laws — formal guarantees that existing RLMs lack.
- An 8B model + lambda-RLM matches a 70B model + standard RLM in accuracy (35.7% vs 36.1%) while being 3.1x faster — model size can be traded for structure.
- On O(n^2) pair comparison tasks (OOL-Pairs): +28.6pp accuracy improvement, 6.2x speedup — because symbolic CROSS operations compute pairs without neural network calls.
- For strong code generation models (Llama-405B, Codestral-22B) or CodeQA tasks requiring free code exploration, standard RLMs still win — a combinator library limitation.
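The CROSS point above can be made concrete with a minimal sketch: pairs are enumerated purely symbolically, and the LLM is only ever invoked as a bounded oracle on each pair. (`cross`, `map_pairs`, and the prompt template are illustrative names for this post, not the paper's actual API.)

```python
from itertools import combinations
from typing import Callable, List, Tuple

def cross(chunks: List[str]) -> List[Tuple[str, str]]:
    # CROSS: enumerate all unordered chunk pairs symbolically -- no LLM involved
    return list(combinations(chunks, 2))

def map_pairs(pairs: List[Tuple[str, str]], model_fn: Callable[[str], str]) -> List[str]:
    # MAP: the LLM is called only per pair, as a bounded leaf oracle
    return [model_fn(f"Compare:\n[A] {a}\n[B] {b}") for a, b in pairs]

# With n chunks, CROSS yields n*(n-1)/2 pairs deterministically;
# a standard RLM would have to generate this pairing logic itself.
pairs = cross(["chunk0", "chunk1", "chunk2"])
assert len(pairs) == 3
```

Because the pairing is deterministic, the number of LLM calls on an O(n^2) task is known before execution starts, which is where the predictable-cost claim comes from.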
Evidence
- Across 9 models x 4 tasks = 36 comparisons, lambda-RLM won 29/36 (81%), with a 12/12 (100%) sweep in the weak model (8B/7B) tier.
- Average accuracy improvement: Weak models +21.9pp (13.8% → 35.7%), Medium +18.6pp (31.3% → 50.1%), Strong +8.8pp (50.1% → 58.9%).
- Latency reduction: Weak 4.1x (229s → 57s), Medium 4.1x (193s → 47s), Strong 3.3x (129s → 39s) — vs. standard RLM.
- Ablation: Replacing the combinator library with free-form code generation caused -24.2pp accuracy and 3.9x latency increase (Qwen3-8B x OOLONG).
How to Apply
- For long document processing pipelines, instead of letting the LLM decide how to split, create pre-defined execution plans per task type using SPLIT → MAP(M) → REDUCE patterns (search: add FILTER, aggregation: MERGE, pair comparison: use CROSS).
- Under the paper's cost model, the split factor k is mathematically optimal at k* = 2, so adopt binary split as the default strategy rather than splitting into many chunks at once, increasing k only when needed to minimize cost.
- If your agent system delegates control flow to the LLM, separate roles into 'LLM handles semantic inference only at leaf nodes, deterministic code handles all orchestration' to gain termination guarantees and cost predictability.
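The k* = 2 rule also makes the LLM-call budget easy to estimate before running anything. A back-of-envelope sketch under the even-split assumption (`llm_call_bound` is an illustrative helper, not from the paper; input length is treated abstractly, glossing over the character-vs-token distinction):

```python
def llm_call_bound(n: int, tau: int, k: int = 2) -> int:
    """Upper bound on leaf LLM calls for even k-way splitting.

    Splitting halts at the smallest depth d where n / k**d <= tau,
    so the recursion tree has k**d leaves, each a single bounded call.
    """
    if n <= tau:
        return 1
    d, size = 0, float(n)
    while size > tau:
        size /= k  # even split shrinks every chunk by a factor of k
        d += 1
    return k ** d

# A 128K-unit document with a 2K leaf threshold and binary split:
print(llm_call_bound(128_000, 2_000))  # 64 leaves -> 64 LLM calls
```

This is the cost predictability you give up when the LLM improvises its own control flow: a standard RLM offers no such closed-form bound.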
Code Example
# λ-RLM core pattern - direct implementation example
# Reference: github.com/lambda-calculus-LLM/lambda-RLM
from typing import Callable, List

def split(text: str, k: int) -> List[str]:
    """SPLIT combinator: evenly splits text into k chunks."""
    chunk_size = len(text) // k
    chunks = [text[i * chunk_size:(i + 1) * chunk_size] for i in range(k - 1)]
    chunks.append(text[(k - 1) * chunk_size:])  # last chunk absorbs the remainder
    return chunks

def lambda_rlm(
    prompt: str,
    model_fn: Callable[[str], str],  # LLM called only at leaf nodes
    context_window: int = 32000,     # leaf threshold (measured in characters here)
    k_star: int = 2,                 # Theorem 4: optimal partition size = 2
    task_type: str = "aggregate",
) -> str:
    """λ-RLM recursive executor (Y-combinator pattern)."""
    tau_star = context_window  # leaf threshold

    def phi(P: str) -> str:  # recursive executor
        if len(P) <= tau_star:
            # Base case: direct LLM call (the only neural network operation)
            return model_fn(P)
        # Recursive case: pure symbolic operations
        chunks = split(P, k_star)                    # SPLIT - deterministic
        results = [phi(chunk) for chunk in chunks]   # MAP - recursive
        return reduce_by_task(results, task_type)    # REDUCE - deterministic

    return phi(prompt)

def reduce_by_task(results: List[str], task_type: str) -> str:
    """Composition operator per task type."""
    if task_type == "aggregate":
        return "\n".join(results)     # MERGE
    if task_type == "search":
        return max(results, key=len)  # BEST
    if task_type == "summarize":
        return "\n".join(results)     # CONCAT (caller re-summarizes at the next level)
    return "\n".join(results)

# Usage example
import openai

def call_llm(text: str) -> str:
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}],
    )
    return resp.choices[0].message.content

# Process a very long document with a 2K-character leaf budget
# (a token-based length function would be needed for real context windows)
long_doc = "..." * 50000  # very long document
result = lambda_rlm(
    prompt=long_doc,
    model_fn=call_llm,
    context_window=2000,
    k_star=2,  # optimal value
    task_type="aggregate",
)
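The search plan from "How to Apply" adds a FILTER stage before reduction. A minimal sketch (`filter_chunks` and the keyword predicate are illustrative for this post, not the paper's exact API):

```python
from typing import Callable, List

def filter_chunks(chunks: List[str], predicate_fn: Callable[[str], bool]) -> List[str]:
    """FILTER combinator: keep only chunks the predicate accepts.

    The predicate can be cheap and symbolic (keyword match) or a bounded
    LLM relevance check -- either way, control flow stays deterministic."""
    return [c for c in chunks if predicate_fn(c)]

# Symbolic predicate: a keyword prefilter that runs before any LLM call
chunks = ["intro text", "the answer is 42", "appendix"]
hits = filter_chunks(chunks, lambda c: "answer" in c)
assert hits == ["the answer is 42"]
```

Dropping irrelevant chunks before MAP is where search-style plans recover most of their latency savings, since every surviving chunk still costs one LLM call.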
Original Abstract
LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs) address this by externalising the prompt and recursively solving subproblems. Yet existing RLMs depend on an open-ended read-eval-print loop (REPL) in which the model generates arbitrary control code, making execution difficult to verify, predict, and analyse. We introduce $λ$-RLM, a framework for long-context reasoning that replaces free-form recursive code generation with a typed functional runtime grounded in $λ$-calculus. It executes a compact library of pre-verified combinators and uses neural inference only on bounded leaf subproblems, turning recursive reasoning into a structured functional program with explicit control flow. We show that $λ$-RLM admits formal guarantees absent from standard RLMs, including termination, closed-form cost bounds, controlled accuracy scaling with recursion depth, and an optimal partition rule under a simple cost model. Empirically, across four long-context reasoning tasks and nine base models, $λ$-RLM outperforms standard RLM in 29 of 36 model-task comparisons, improves average accuracy by up to +21.9 points across model tiers, and reduces latency by up to 4.1x. These results show that typed symbolic control yields a more reliable and efficient foundation for long-context reasoning than open-ended recursive code generation. The complete implementation of $λ$-RLM is open-sourced for the community at: https://github.com/lambda-calculus-LLM/lambda-RLM.