Letting AI play my game – building an agentic test harness to help play-testing
TL;DR Highlight
IndieGameAgent automatically playtests games using an LLM, solving a QA bottleneck for solo developers.
Who Should Read
Solo indie game developers, or those building applications with text-based interfaces, seeking to automate testing environments with AI agents.
Core Mechanics
- The original post could not be accessed because of Vercel security restrictions, but community comments confirm that the author built an 'agentic test harness' in which an LLM directly plays and tests the game.
- The core idea involves a separate text-only renderer that converts game state into text, allowing the LLM to understand the game without 'seeing' the screen visually.
- This text renderer approach is praised as an ingenious design, circumventing the 'visual grounding problem' where AI must analyze screenshots or DOMs.
- The architecture leverages MCP (Model Context Protocol) to enable the agent to directly access and manipulate the game's actual state.
- This approach mirrors E2E testing, but with an LLM agent as the tester, uncovering unexpected bugs and game balance issues without pre-defined scripts.
- Commenters noted that starting the agent with only CLI usage instructions, and no prior game context, gives it a fresh perspective, much like rubber-duck debugging.
- Commenters also reported that agents enable a workflow in which a feature is implemented and then verified by the agent's own E2E tests while the developer is away.
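The loop described above (render state as text, let an agent pick an action, apply it, check for problems) can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the toy game, `renderText`, `checkInvariants`, and `runPlaytest` are all hypothetical names, and `stubAgent` is a placeholder for the LLM call, which would receive the same text observation.

```javascript
// Hypothetical toy game; every name here is illustrative, not from the original post.
function createGame() {
  return { turn: 1, player: { hp: 10, maxHp: 10 }, enemy: { hp: 6 } };
}

function availableActions(state) {
  if (state.player.hp <= 0 || state.enemy.hp <= 0) return []; // game over
  return ['attack', 'heal'];
}

function applyAction(state, action) {
  if (!availableActions(state).includes(action)) {
    throw new Error(`Illegal action: ${action}`);
  }
  if (action === 'attack') state.enemy.hp -= 2;
  if (action === 'heal') {
    state.player.hp = Math.min(state.player.maxHp, state.player.hp + 3);
  }
  if (state.enemy.hp > 0) state.player.hp -= 1; // enemy counterattack
  state.turn += 1;
}

// Text-only renderer: the agent never sees pixels, only this string.
function renderText(state) {
  return [
    `Turn: ${state.turn}`,
    `Player HP: ${state.player.hp}/${state.player.maxHp}`,
    `Enemy HP: ${state.enemy.hp}`,
    `Available actions: ${availableActions(state).join(', ') || '(none)'}`,
  ].join('\n');
}

// Cheap sanity checks run after every step; a real harness would have more.
function checkInvariants(state) {
  const problems = [];
  if (state.player.hp > state.player.maxHp) problems.push('player HP above max');
  if (!Number.isInteger(state.turn) || state.turn < 1) problems.push('invalid turn counter');
  return problems;
}

// Stand-in for the LLM: reads the text observation and picks the first listed action.
function stubAgent(observation) {
  const match = observation.match(/Available actions: (.*)/);
  return match ? match[1].split(', ')[0] : '';
}

function runPlaytest(agent, maxTurns = 50) {
  const state = createGame();
  const transcript = [];
  const issues = [];
  for (let i = 0; i < maxTurns && availableActions(state).length > 0; i++) {
    const observation = renderText(state);
    const action = agent(observation);
    transcript.push({ observation, action });
    try {
      applyAction(state, action);
    } catch (err) {
      issues.push(err.message); // an illegal move is itself a finding
      break;
    }
    issues.push(...checkInvariants(state));
  }
  return { state, transcript, issues };
}
```

Calling `runPlaytest(stubAgent)` returns the final state, a transcript of observation/action pairs, and any issues found. Swapping `stubAgent` for a function that forwards the observation to a model is the only change the loop itself would need, though a real model call would be asynchronous.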
Evidence
- Some commenters suggested a headless Monte Carlo simulator instead of an LLM, citing speed and cost advantages for deterministic games whose simulations can be parallelized.
- A developer testing AI on a real-time, physics-based 2D game found browser MCP impractical because objects flew off-screen before the AI could capture a screenshot, and opted for a hybrid API instead.
- An E2E web-testing user shared a token optimization tip: switching from raw DOM to accessibility-tree references reduced token usage tenfold and improved agent accuracy.
- Another user found that giving agents both source code and live browser snapshots at the same time maximized test quality, avoiding the false positives of code-only or browser-only approaches.
- A user who connected an MCP server to a MUD saw Claude Code agents collaboratively building new sections in separate windows.
- A team that introduced agents to a Pokémon-style MMORPG received negative feedback: 'I won't waste precious tokens playing a game'.
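To illustrate the Monte Carlo alternative raised in the comments, here is a rough sketch of a headless balance simulation. Everything in it is an assumption made for illustration: the combat rules, the parameter names (`hitChance`, `playerDamage`, and so on), and the small seeded generator (a mulberry32-style PRNG) used so that runs are reproducible.

```javascript
// Small seeded PRNG (mulberry32-style) so that a given seed always yields the same run.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// One headless fight under hypothetical rules: the player hits with probability
// hitChance, then the enemy (if still alive) always hits back.
function simulateFight(rng, params, maxRounds = 1000) {
  let playerHp = params.playerHp;
  let enemyHp = params.enemyHp;
  for (let round = 0; round < maxRounds && playerHp > 0 && enemyHp > 0; round++) {
    if (rng() < params.hitChance) enemyHp -= params.playerDamage;
    if (enemyHp > 0) playerHp -= params.enemyDamage;
  }
  return playerHp > 0 && enemyHp <= 0; // true only if the player actually won
}

// Estimate the player's win rate over many simulated fights.
function estimateWinRate(params, runs = 10000, seed = 1) {
  const rng = mulberry32(seed);
  let wins = 0;
  for (let i = 0; i < runs; i++) {
    if (simulateFight(rng, params)) wins++;
  }
  return wins / runs;
}

// Example parameters (hypothetical values).
const baseline = {
  playerHp: 10,
  enemyHp: 6,
  hitChance: 0.5,
  playerDamage: 2,
  enemyDamage: 1,
};
```

Calling `estimateWinRate(baseline, 10000, 42)` twice returns the same number because the seed is fixed, which makes balance changes comparable between runs. No model is involved, so thousands of fights cost milliseconds rather than tokens; the trade-off is that a fixed policy will not stumble onto the unexpected behavior an LLM player might.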
How to Apply
- If you are building a text-based or turn-based game, fully separate game logic from rendering and create a dedicated renderer that serializes game state into text. This simplifies building an agentic test harness because no visual processing is required.
- For non-real-time, deterministic games, consider a Monte Carlo simulation instead of costly LLM calls for faster, more efficient balance tuning.
- To reduce token costs in LLM-based testing, provide structured text, such as accessibility-tree references or key state values, instead of the raw browser DOM or full game state.
- If you want the agent to self-verify its implementations, instruct it to 'write E2E tests and confirm with screenshots' during code generation, which enables an autonomous implement-and-verify loop.
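The token-saving tip can be sketched as a small function that flattens a UI tree into short numbered references. This is only an illustration under assumptions: the node shape (`role`, `name`, `children`) and the function name `toCompactRefs` are hypothetical and do not correspond to any specific browser or MCP API.

```javascript
// Roles worth showing to the agent; purely structural containers are skipped.
const KEEP_ROLES = new Set(['button', 'link', 'textbox', 'text']);

// Flatten a hypothetical accessibility-style tree into lines such as
//   e1: [button] "Attack"
// and keep a ref -> node map so the agent's chosen ref can be resolved later.
function toCompactRefs(root) {
  const lines = [];
  const refs = new Map();

  function visit(node) {
    if (KEEP_ROLES.has(node.role) && node.name) {
      const ref = `e${refs.size + 1}`;
      refs.set(ref, node);
      lines.push(`${ref}: [${node.role}] "${node.name}"`);
    }
    for (const child of node.children ?? []) visit(child);
  }

  visit(root);
  return { text: lines.join('\n'), refs };
}

// Example tree (hypothetical data).
const battleScreen = {
  role: 'group',
  name: '',
  children: [
    { role: 'button', name: 'Attack' },
    { role: 'button', name: 'Flee' },
    { role: 'group', name: '', children: [{ role: 'text', name: 'Enemy HP: 80/100' }] },
  ],
};

const snapshot = toCompactRefs(battleScreen);
```

The agent is shown only `snapshot.text` and replies with a reference such as `e1`; the harness then looks that reference up in `snapshot.refs` to find the element to act on. Markup details such as class names and data attributes never reach the prompt.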
Code Example
// Example architecture pattern mentioned in the community

// 1. Separate renderer to serialize game state to text
function textRenderer(gameState) {
  return [
    `Turn: ${gameState.turn}`,
    `Player HP: ${gameState.player.hp}/${gameState.player.maxHp}`,
    `Location: ${gameState.currentRoom.name}`,
    `Available actions: ${gameState.availableActions.join(', ')}`,
    `Inventory: ${gameState.player.inventory.map(i => i.name).join(', ')}`,
  ].join('\n');
}

// 2. In-process MCP server pattern (ECS/Fargate environment without stdio process boundaries)
//    create_sdk_mcp_server + @tool decorator style
//    Maintain browser handle within tool definition scope

// 3. Token saving with accessibility-tree based references
//    raw DOM (token waste):
//      <div id="enemy-hp-bar" class="hp-bar" data-value="80" ...>
//    accessibility-tree reference (token saving):
//      e1: [button] "Attack" e2: [button] "Flee" e3: [text] "Enemy HP: 80/100"