Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models
TL;DR Highlight
You can boost reasoning performance to CoT levels without CoT prompts by manipulating just a single latent feature inside the model
Who Should Read
ML engineers grappling with the cost-performance tradeoff of LLM reasoning, especially developers self-hosting open-source models (LLaMA, Qwen, Gemma) who want to cut the token cost of long CoT outputs.
Core Mechanics
- CoT prompting isn't the only way to trigger reasoning: amplifying a single internal latent feature can put the model into the same reasoning mode
- Sparse Autoencoders (SAEs) can narrow the reasoning-related latent features down to just a handful
- Intervening only at the first token generation step boosted LLaMA-3.1-8B GSM8K accuracy from 24.5% to 73.3%
- On larger models like LLaMA-3.3-70B, latent steering achieves similar accuracy to CoT while using ~5x fewer tokens (53 vs 268)
- Qwen's /no_think control token is also overridden by this steering — internal computation overrides prompt-level suppression
- This reasoning feature acts as a 'mode entry' switch rather than a quality signal: its activation is only weakly related to answer correctness
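The feature-selection step can be sketched as a differential activation score: average SAE latent activations under CoT prompts minus under Direct prompts, then take the top-scoring features. A minimal toy illustration (random tensors stand in for real SAE activations; the function name is our own, not the paper's):

```python
import torch

def differential_scores(z_cot, z_direct):
    """Mean SAE feature activation under CoT prompts minus under Direct prompts.

    z_cot, z_direct: (num_prompts, num_features) SAE latent activations.
    Returns a (num_features,) score; high values mark reasoning-linked features.
    """
    return z_cot.mean(dim=0) - z_direct.mean(dim=0)

# Toy demo: feature 3 fires strongly only under CoT-style prompts
torch.manual_seed(0)
z_cot = torch.rand(8, 16)
z_direct = torch.rand(8, 16)
z_cot[:, 3] += 5.0

scores = differential_scores(z_cot, z_direct)
top_feature = int(scores.argmax())
print(top_feature)  # 3
```

In practice the activations would come from encoding residual-stream states of matched CoT/Direct prompt pairs through the pretrained SAE.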
Evidence
- LLaMA-3.1-8B GSM8K: Direct 24.5% → Steered Direct 73.3% (with single feature #8629 intervention only)
- LLaMA-3.3-70B GSM8K: Steered Direct 88.8% vs CoT 96.1%, tokens 53 vs 268 (~80% reduction)
- Random feature steering accuracy 26.1±3.4% vs reasoning feature steering 73.3% — confirming the effect is specific to particular features
- Point-biserial correlation for the CoT vs Direct mode distinction: r=0.14, p=0.006 (statistically significant but weak)
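For reference, a point-biserial correlation (the statistic in the last bullet) is just a Pearson correlation between a binary variable and a continuous one; `scipy.stats.pointbiserialr` computes it directly. A toy sketch with synthetic data (not the paper's data):

```python
import numpy as np
from scipy.stats import pointbiserialr

# Synthetic data: a binary mode label (1 = CoT-like, 0 = Direct) and a
# continuous feature activation that depends only weakly on the mode
rng = np.random.default_rng(0)
mode = rng.integers(0, 2, size=200)
activation = rng.normal(size=200) + 0.3 * mode

r, p = pointbiserialr(mode, activation)
print(f"r={r:.2f}, p={p:.4f}")
```

A small r with a small p, as in the paper, means the relationship is reliable but explains little variance, which is consistent with the feature marking mode entry rather than answer quality.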
How to Apply
- In open-source LLM inference pipelines, load pretrained SAEs (e.g., Goodfire or GemmaScope releases), find features with the highest differential scores between CoT and Direct prompts, and amplify them with α=15~25 at the first generation step to induce reasoning without CoT
- For services where token cost matters, switching from CoT to latent steering on large models like LLaMA-3.3-70B achieves similar accuracy with drastically fewer output tokens — particularly beneficial for streaming response latency
- On models like Qwen3 that suppress reasoning via /no_think tokens, use activation steering as a fallback to force reasoning mode on specific cases without prompt changes
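The token savings quoted above for LLaMA-3.3-70B follow directly from the reported per-response counts:

```python
cot_tokens, steered_tokens = 268, 53

speedup = cot_tokens / steered_tokens        # ~5.06x fewer output tokens
reduction = 1 - steered_tokens / cot_tokens  # ~0.80, i.e. ~80% reduction
print(f"{speedup:.2f}x fewer tokens, {reduction:.0%} reduction")
```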
Code Example
# Latent steering concept example using a Goodfire SAE (LLaMA-3.1-8B)
# pip install goodfire transformers
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# 1. Load model and tokenizer
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

# 2. Load SAE (provided by Goodfire; the checkpoint below targets LLaMA-3.3-70B
#    layer 50, so substitute the SAE matching your model and layer)
# from goodfire import SAE
# sae = SAE.load("Goodfire/Llama-3.3-70B-Instruct-SAE-l50")

# 3. Forward hook that injects the steering delta only at the first generation step
REASONING_FEATURE_IDX = 8629  # LLaMA-3.1-8B reasoning feature
STEERING_LAYER = 19
STEERING_ALPHA = 15
first_step_done = False  # reset to False before each new generation

def steering_hook(module, inputs, output):
    global first_step_done
    if first_step_done:
        return output
    hidden = output[0]  # (batch, seq, hidden)
    # SAE encode -> amplify the reasoning feature -> decode -> add delta to residual
    # z = sae.encode(hidden[:, -1, :])
    # z_steered = z.clone()
    # scale = z[:, REASONING_FEATURE_IDX].abs().mean()
    # z_steered[:, REASONING_FEATURE_IDX] += STEERING_ALPHA * scale
    # delta = sae.decode(z_steered) - sae.decode(z)
    # hidden[:, -1, :] += delta
    first_step_done = True
    return (hidden,) + output[1:]

# Register hook on the steering layer
hook = model.model.layers[STEERING_LAYER].register_forward_hook(steering_hook)

# 4. Inference with a direct prompt (reasoning is induced even without CoT)
prompt = ("Question: James runs 3 sprints of 60 meters, 3 times a week. "
          "How many meters does he run in a week?\n\nGive me the answer directly.")
inputs = tokenizer(prompt, return_tensors="pt")
# model.generate(**inputs, max_new_tokens=200)
hook.remove()
Original Abstract
Chain-of-Thought (CoT) prompting has improved the reasoning performance of large language models (LLMs), but it remains unclear why it works and whether it is the unique mechanism for triggering reasoning in large language models. In this work, we study this question by directly analyzing and intervening on the internal representations of LLMs with Sparse Autoencoders (SAEs), identifying a small set of latent features that are causally associated with LLM reasoning behavior. Across multiple model families and reasoning benchmarks, we find that steering a single reasoning-related latent feature can substantially improve accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT prompting while producing more efficient outputs. We further observe that this reasoning-oriented internal state is triggered early in generation and can override prompt-level instructions that discourage explicit reasoning. Overall, our results suggest that multi-step reasoning in LLMs is supported by latent internal activations that can be externally activated, while CoT prompting is one effective, but not unique, way of activating this mechanism rather than its necessary cause.