SLOT: Structuring the Output of Large Language Models
TL;DR Highlight
A lightweight post-processing model that converts any LLM's output to JSON — a fine-tuned Mistral-7B (with constrained decoding) beats Claude-3.5-Sonnet's schema accuracy by 25 percentage points.
Who Should Read
Backend/ML engineers dealing with JSON parsing failures from LLM outputs in function calling, agents, or information extraction pipelines. Especially platform teams serving multiple LLMs simultaneously.
Core Mechanics
- SLOT: a separate lightweight model that post-processes LLM output into valid JSON — doesn't touch the original LLM weights, so it works with any model
- Mistral-7B with LoRA fine-tuning achieves 98.2% schema accuracy and 92.9% content similarity — +23pp and +19pp vs. Claude-3.5-Sonnet respectively
- Even Llama-3.2-1B with SFT alone hits 88.9% schema accuracy (on par with Claude-3.5-Haiku at 89.0%) and 81.7% content similarity (beating Sonnet's 73.9%)
- Adding constrained decoding (XGrammar) on top pushes Mistral-7B to 99.5% schema accuracy
Evidence
- Mistral-7B + SFT + XGrammar: schema accuracy 99.5%, content similarity 94.0% vs. Claude-3.5-Sonnet 74.7% / 73.9%
- Llama-3.2-1B + SFT: schema accuracy 88.9% — matches Claude-3.5-Haiku (89.0%), content similarity 81.7% surpasses Sonnet (73.9%)
- Consistent improvements across diverse JSON schema complexity levels
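The schema-accuracy figures above measure whether an output both parses as JSON and conforms to the target schema. The paper's exact metric implementation isn't reproduced here; a minimal stdlib-only check in that spirit (required keys present, declared primitive types respected) might look like:

```python
import json

def schema_accurate(output_text: str, schema: dict) -> bool:
    """Minimal schema check: the output must parse as JSON, contain every
    required key, and give each declared property its declared type.
    A sketch, not the paper's evaluation code."""
    type_map = {"string": str, "number": (int, float), "boolean": bool,
                "object": dict, "array": list}
    try:
        obj = json.loads(output_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    if any(key not in obj for key in schema.get("required", [])):
        return False
    for key, spec in schema.get("properties", {}).items():
        if key in obj and not isinstance(obj[key], type_map[spec["type"]]):
            return False
    return True
```

Content similarity (the second metric) additionally compares extracted values against the reference, which this sketch does not attempt.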
How to Apply
- Add SLOT as a post-processing layer to your existing LLM API pipeline: feed LLM response text + JSON schema to SLOT and get structured JSON output. No LLM replacement or retraining needed.
- For on-prem or edge environments needing reliable structured output without GPT-4o, Llama-3.2-1B or Mistral-7B with LoRA provides a cost-effective alternative.
- Combine with constrained decoding (XGrammar) for near-perfect schema compliance at 99.5%.
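The integration described above can be sketched as a thin wrapper around your existing pipeline. Note that `slot_model` below is a hypothetical callable standing in for the fine-tuned SLOT model (e.g. Mistral-7B + LoRA behind your serving stack); the paper does not prescribe a specific interface:

```python
import json

def slot_postprocess(raw_llm_text: str, json_schema: dict, slot_model) -> dict:
    """Post-process any LLM's free-text output into schema-conformant JSON.

    `slot_model` is a hypothetical callable wrapping the fine-tuned SLOT
    model; swap in your own inference client. The original LLM is untouched.
    """
    prompt = (
        "Convert the following text into JSON format according to the "
        f"specified schema.\nText: {raw_llm_text}\n"
        f"Schema: {json.dumps(json_schema)}\nOutput:"
    )
    structured_text = slot_model(prompt)
    return json.loads(structured_text)

# Usage with a stub standing in for the real model:
stub = lambda prompt: '{"company": "Apple"}'
result = slot_postprocess("Apple reported...", {"type": "object"}, stub)
```

Because SLOT sits behind the LLM call, the same wrapper serves every upstream model your platform exposes.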
Code Example
# SLOT-style direct prompting (for testing without fine-tuning)
import json

prompt = """
Convert the following text into JSON format according to the specified schema.
Ensure that both keys and values are strings, even for numerical values.
Text: {input_text}
Provide your response in the following JSON format: {json_schema}
Please output ONLY the JSON structure and extract the attributes only present in the schema.
Output:
"""
# Example usage
input_text = "Apple reported Q3 revenue of $89.5B, up 5% YoY. iPhone sales drove growth."
json_schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "quarter": {"type": "string"},
        "revenue_billion": {"type": "string"},
        "yoy_growth": {"type": "string"},
        "growth_driver": {"type": "string"}
    },
    "required": ["company", "quarter", "revenue_billion", "yoy_growth", "growth_driver"]
}
formatted_prompt = prompt.format(
    input_text=input_text,
    json_schema=json.dumps(json_schema)
)
# → Feed this prompt into any LLM to reproduce the basic behavior of SLOT
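When testing the prompt above without SLOT's fine-tuning, base LLMs often wrap the JSON in prose or markdown fences. A defensive parser (a common practical guard, not part of the paper) can salvage such responses:

```python
import json
import re

def extract_json(llm_response: str):
    """Pull the first JSON object out of a raw LLM response.
    Strips markdown code fences and surrounding prose; returns None
    if no parseable object is found."""
    text = re.sub(r"```(?:json)?", "", llm_response)  # drop fences if present
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
```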
Original Abstract
Structured outputs are essential for large language models (LLMs) in critical applications like agents and information extraction. Despite their capabilities, LLMs often generate outputs that deviate from predefined schemas, significantly hampering reliable application development. We present SLOT (Structured LLM Output Transformer), a model-agnostic approach that transforms unstructured LLM outputs into precise structured formats. While existing solutions predominantly rely on constrained decoding techniques or are tightly coupled with specific models, SLOT employs a fine-tuned lightweight language model as a post-processing layer, achieving flexibility across various LLMs and schema specifications. We introduce a systematic pipeline for data curation and synthesis alongside a formal evaluation methodology that quantifies both schema accuracy and content fidelity. Our results demonstrate that fine-tuned Mistral-7B model with constrained decoding achieves near perfect schema accuracy (99.5%) and content similarity (94.0%), outperforming Claude-3.5-Sonnet by substantial margins (+25 and +20 percentage points, respectively). Notably, even compact models like Llama-3.2-1B can match or exceed the structured output capabilities of much larger proprietary models when equipped with SLOT, enabling reliable structured generation in resource-constrained environments.