Type-Constrained Code Generation with Language Models
TL;DR Highlight
Enforcing TypeScript type system rules token-by-token during LLM code generation, cutting compilation errors by more than half.
Who Should Read
Developers building LLM-based code autocomplete, code translation, or code repair tools. Engineers specifically looking to reduce compile error rates in TypeScript code generation pipelines.
Core Mechanics
- 94% of LLM code generation compile errors are type errors, not syntax errors — existing syntax-based constrained decoding only catches 6%
- Combining prefix automata (ensuring the generated prefix always remains completable into a well-typed program) with type inhabitation search (finding which expressions can produce a required type) enables real-time type-aware token filtering
- Compilation errors reduced by 75.3% on HumanEval and 52.1% on MBPP (syntax-only approaches: 9% and 4.8% respectively)
- Works across code synthesis, translation, and repair tasks with consistent improvements
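The type inhabitation search above can be illustrated with a toy sketch. This is an illustrative simplification over a simply-typed toy language, not the paper's algorithm (which handles TypeScript's full type system); the `inhabitants` function and type encoding are invented for the example.

```python
# Toy type-inhabitation search: enumerate expressions of a target type.
# Types are "int", "bool", or ("->", arg_type, ret_type).

def inhabitants(env, target, depth=2):
    """Yield expression strings of type `target` using bindings in `env`."""
    # Variables whose declared type matches the target directly.
    for name, ty in env.items():
        if ty == target:
            yield name
    if depth == 0:
        return
    # Function application: any f of type a -> target applied to an
    # inhabitant of a also inhabits the target type.
    for name, ty in env.items():
        if isinstance(ty, tuple) and ty[0] == "->" and ty[2] == target:
            for arg in inhabitants(env, ty[1], depth - 1):
                yield f"{name}({arg})"

env = {"x": "int", "f": ("->", "int", "bool")}
print(list(inhabitants(env, "bool")))  # ['f(x)']
```

During decoding, a non-empty result for some reachable type means the current prefix can still be extended into a well-typed program, so generation may continue down that path.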
Evidence
- HumanEval code synthesis: average 75.3% compile error reduction; MBPP: 52.1% reduction (syntax-only: 9% and 4.8%)
- Code repair task pass@1 improved 37% on average — Gemma 2 2B improved 79.4%, CodeLlama 34B improved 56.9%
- Code synthesis pass@1 improved 3.5% on average, translation pass@1 improved across all tested models
How to Apply
- Integrate a completion engine into the token sampling step of your LLM inference loop that sets type-violating token probabilities to zero — no additional LLM calls needed.
- Apply to code translation pipelines (e.g., Python→TypeScript) to block type errors like API signature mismatches and missing arguments at generation time.
- Open-source implementation available for integration with existing LLM serving infrastructure.
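The masking step in the first bullet can be sketched as follows. The shape mirrors a LogitsProcessor-style interface (as in Hugging Face transformers), but `is_valid_prefix` is a hypothetical stand-in for the completion engine, and the toy validity rule is invented for the example:

```python
# Sketch: zero out (set to -inf) logits of tokens that would make the
# generated prefix violate the type rules. `is_valid_prefix` stands in
# for the completion engine.
import math

def mask_type_violations(generated_text, vocab, logits, is_valid_prefix):
    """Return logits with type-violating continuations set to -inf."""
    masked = list(logits)
    for token_id, token_text in enumerate(vocab):
        if not is_valid_prefix(generated_text + token_text):
            masked[token_id] = -math.inf  # excluded from softmax/sampling
    return masked

# Toy validity rule: forbid any prefix containing "!!"
vocab = ["a", "!", "!!"]
logits = [0.0, 0.0, 0.0]
ok = lambda s: "!!" not in s
print(mask_type_violations("x!", vocab, logits, ok))  # [0.0, -inf, -inf]
```

Note that scanning the whole vocabulary per step is expensive; the approach described above instead checks only sampled tokens lazily, masking and resampling on rejection, which avoids additional LLM calls.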
Code Example
# Conceptual pseudocode: type-constrained decoding core loop
# (based on the paper's Algorithm 1)
def constrained_generate(llm, prompt, completion_engine):
    s = ""  # code generated so far
    while True:
        logits = llm(prompt + s)  # next-token probability distribution
        while True:
            token = sample(logits)  # token sampling
            if completion_engine(s + token):  # prefix still completable as well-typed
                break
            elif token == EOS and s in target_language:  # s is already a complete program
                break
            else:
                logits[token] = 0  # mask the violating token and resample
                normalize(logits)
        if token == EOS:
            break
        s = s + token
    return s
# completion_engine core: determines whether the current partial code can be completed as well-typed
# → returns True if the prefix automaton yields a non-empty state set
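The "non-empty state set" criterion can be made concrete with a minimal sketch. The transition table below is a made-up toy (a fragment of `let x = 1`); the actual automaton is derived from TypeScript's grammar and typing rules:

```python
# Minimal prefix automaton sketch: a partial program is accepted iff the
# set of states reachable after consuming it is non-empty.
TRANSITIONS = {
    # (state, token) -> set of next states
    ("start", "let"): {"need_name"},
    ("need_name", "x"): {"need_eq"},
    ("need_eq", "="): {"need_expr"},
    ("need_expr", "1"): {"done"},
}

def reachable(tokens):
    states = {"start"}
    for tok in tokens:
        states = set().union(*(TRANSITIONS.get((s, tok), set()) for s in states))
    return states

def can_complete(tokens):
    # Non-empty state set => the prefix can still be completed.
    return bool(reachable(tokens))

print(can_complete(["let", "x", "="]))  # True: still completable
print(can_complete(["let", "="]))       # False: dead prefix, would be masked
```

Feeding each candidate token through such an automaton before accepting it is what lets the decoder reject dead prefixes without ever calling the LLM again.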
Related Resources
Original Abstract
Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although constrained decoding is a promising approach to alleviate this issue, it has only been applied to handle either domain-specific languages or syntactic features of general-purpose programming languages. However, LLMs frequently generate code with typing errors, which are beyond the domain of syntax and generally hard to adequately constrain. To address this challenge, we introduce a type-constrained decoding approach that leverages type systems to guide code generation. For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code. We formalize our approach on a foundational simply-typed language and extend it to TypeScript to demonstrate practicality. Our evaluation on the HumanEval and MBPP datasets shows that our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks across LLMs of various sizes and model families, including state-of-the-art open-weight models with more than 30B parameters. The results demonstrate the generality and effectiveness of our approach in constraining LLM code generation with formal rules of type systems.