Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models
TL;DR Highlight
You can boost reasoning performance to CoT levels without CoT prompts by manipulating just a single latent feature inside the model
Who Should Read
ML engineers grappling with the cost-performance tradeoff of LLM reasoning, especially developers self-hosting open-source models (LLaMA, Qwen, Gemma) who want to cut the token cost of long CoT outputs.
Core Mechanics
- CoT prompting isn't the only way to trigger reasoning: amplifying a single internal latent feature can put the model into the same reasoning mode
- Sparse Autoencoders (SAEs) can narrow the reasoning-related latent features down to just a handful
- Intervening only at the first token generation step boosted LLaMA-3.1-8B GSM8K accuracy from 24.5% to 73.3%
- On larger models like LLaMA-3.3-70B, latent steering achieves similar accuracy to CoT while using ~5x fewer tokens (53 vs 268)
- Qwen's /no_think control token is also overridden by this steering — internal computation overrides prompt-level suppression
- This reasoning feature acts as a 'mode entry' switch rather than a quality signal: its activation is only weakly related to answer correctness
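The feature-selection step can be sketched as a differential activation score: average SAE latent activations under CoT prompts minus under Direct prompts, then take the top-scoring features. A minimal toy illustration (random tensors stand in for real SAE activations; the function name is our own, not the paper's):

```python
import torch

def differential_scores(z_cot, z_direct):
    """Mean SAE feature activation under CoT prompts minus under Direct prompts.

    z_cot, z_direct: (num_prompts, num_features) SAE latent activations.
    Returns a (num_features,) score; high values mark reasoning-linked features.
    """
    return z_cot.mean(dim=0) - z_direct.mean(dim=0)

# Toy demo: feature 3 fires strongly only under CoT-style prompts
torch.manual_seed(0)
z_cot = torch.rand(8, 16)
z_direct = torch.rand(8, 16)
z_cot[:, 3] += 5.0

scores = differential_scores(z_cot, z_direct)
top_feature = int(scores.argmax())
print(top_feature)  # 3
```

In practice the activations would come from encoding residual-stream states of matched CoT/Direct prompt pairs through the pretrained SAE.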
Evidence
- LLaMA-3.1-8B GSM8K: Direct 24.5% → Steered Direct 73.3% (with single feature #8629 intervention only)
- LLaMA-3.3-70B GSM8K: Steered Direct 88.8% vs CoT 96.1%, tokens 53 vs 268 (~80% reduction)
- Random feature steering accuracy 26.1±3.4% vs reasoning feature steering 73.3% — confirming the effect is specific to particular features
- Point-biserial correlation for the CoT vs Direct mode distinction: r=0.14, p=0.006 (statistically significant but weak)
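For reference, a point-biserial correlation (the statistic in the last bullet) is just a Pearson correlation between a binary variable and a continuous one; `scipy.stats.pointbiserialr` computes it directly. A toy sketch with synthetic data (not the paper's data):

```python
import numpy as np
from scipy.stats import pointbiserialr

# Synthetic data: a binary mode label (1 = CoT-like, 0 = Direct) and a
# continuous feature activation that depends only weakly on the mode
rng = np.random.default_rng(0)
mode = rng.integers(0, 2, size=200)
activation = rng.normal(size=200) + 0.3 * mode

r, p = pointbiserialr(mode, activation)
print(f"r={r:.2f}, p={p:.4f}")
```

A small r with a small p, as in the paper, means the relationship is reliable but explains little variance, which is consistent with the feature marking mode entry rather than answer quality.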
How to Apply
- In open-source LLM inference pipelines, load pretrained SAEs (e.g., Goodfire or GemmaScope releases), find features with the highest differential scores between CoT and Direct prompts, and amplify them with α=15~25 at the first generation step to induce reasoning without CoT
- For services where token cost matters, switching from CoT to latent steering on large models like LLaMA-3.3-70B achieves similar accuracy with drastically fewer output tokens — particularly beneficial for streaming response latency
- On models like Qwen3 that suppress reasoning via /no_think tokens, use activation steering as a fallback to force reasoning mode on specific cases without prompt changes
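The token savings quoted above for LLaMA-3.3-70B follow directly from the reported per-response counts:

```python
cot_tokens, steered_tokens = 268, 53

speedup = cot_tokens / steered_tokens        # ~5.06x fewer output tokens
reduction = 1 - steered_tokens / cot_tokens  # ~0.80, i.e. ~80% reduction
print(f"{speedup:.2f}x fewer tokens, {reduction:.0%} reduction")
```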
Code Example
# Latent steering concept example using a Goodfire SAE (LLaMA-3.1-8B)
# pip install goodfire transformers
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# 1. Load model and tokenizer
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

# 2. Load SAE (provided by Goodfire; the checkpoint below targets LLaMA-3.3-70B
#    layer 50, so substitute the SAE matching your model and layer)
# from goodfire import SAE
# sae = SAE.load("Goodfire/Llama-3.3-70B-Instruct-SAE-l50")

# 3. Forward hook that injects the steering delta only at the first generation step
REASONING_FEATURE_IDX = 8629  # LLaMA-3.1-8B reasoning feature
STEERING_LAYER = 19
STEERING_ALPHA = 15
first_step_done = False  # reset to False before each new generation

def steering_hook(module, inputs, output):
    global first_step_done
    if first_step_done:
        return output
    hidden = output[0]  # (batch, seq, hidden)
    # SAE encode -> amplify the reasoning feature -> decode -> add delta to residual
    # z = sae.encode(hidden[:, -1, :])
    # z_steered = z.clone()
    # scale = z[:, REASONING_FEATURE_IDX].abs().mean()
    # z_steered[:, REASONING_FEATURE_IDX] += STEERING_ALPHA * scale
    # delta = sae.decode(z_steered) - sae.decode(z)
    # hidden[:, -1, :] += delta
    first_step_done = True
    return (hidden,) + output[1:]

# Register hook on the steering layer
hook = model.model.layers[STEERING_LAYER].register_forward_hook(steering_hook)

# 4. Inference with a direct prompt (reasoning is induced even without CoT)
prompt = ("Question: James runs 3 sprints of 60 meters, 3 times a week. "
          "How many meters does he run in a week?\n\nGive me the answer directly.")
inputs = tokenizer(prompt, return_tensors="pt")
# model.generate(**inputs, max_new_tokens=200)
hook.remove()
Original Abstract
Chain-of-Thought (CoT) prompting has improved the reasoning performance of large language models (LLMs), but it remains unclear why it works and whether it is the unique mechanism for triggering reasoning in large language models. In this work, we study this question by directly analyzing and intervening on the internal representations of LLMs with Sparse Autoencoders (SAEs), identifying a small set of latent features that are causally associated with LLM reasoning behavior. Across multiple model families and reasoning benchmarks, we find that steering a single reasoning-related latent feature can substantially improve accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT prompting while producing more efficient outputs. We further observe that this reasoning-oriented internal state is triggered early in generation and can override prompt-level instructions that discourage explicit reasoning. Overall, our results suggest that multi-step reasoning in LLMs is supported by latent internal activations that can be externally activated, while CoT prompting is one effective, but not unique, way of activating this mechanism rather than its necessary cause.