Specsmaxxing – On overcoming AI psychosis, and why I write specs in YAML
TL;DR Highlight
Structuring acceptance criteria in YAML with the acai.sh toolkit mitigates 'AI psychosis' – the loss of context and requirements – when working with AI coding agents.
Who Should Read
Developers using AI coding agents (Cursor, Claude, etc.) in production who struggle with agents forgetting requirements or generating incorrect code due to session resets or context window limits. Particularly useful for solo developers or small teams seeking to bridge the gap between specification management and AI code quality.
Core Mechanics
- Working with AI agents frequently results in lost requirements and 'derailment' due to context window limitations, session terminations, or machine switching. The author terms this 'AI psychosis'.
- Markdown documents such as README.md help, but the author went a step further and experimented with managing structured acceptance criteria in YAML.
- A key insight is that 'specs must exist somewhere'. If they are not written down, they live only in developers' heads or in conversations, yet teams and businesses ultimately judge the work against those specs. It therefore pays to write them down right away.
- During experimentation, a sub-agent spontaneously began adding requirement numbers (AUTH-1, AUTH-2, AUTH-3, etc.) to code comments, inspiring a systematic approach to linking YAML-based specs with code.
- The author created the open-source toolkit acai.sh, with a workflow consisting of four stages: 'Specify → Ship → Review → Iterate'. The feature.yaml file lists acceptance criteria, which the agent references to generate code.
- The author identifies 'building an AI harness to build a product' as a form of 'AI psychosis'. Recognizing this trap, they abandoned complex multi-agent architectures in favor of a simpler, feature.yaml-centric approach.
- GitHub Spec Kit, OpenSpec, Kiro, and Traycer.ai were considered as points of comparison, and their differences were outlined. acai.sh's differentiator is that it tracks requirements by structured acceptance-criteria IDs and links those IDs to code comments.
- The future roadmap is 'Specsmaxxing → Testmaxxing → Reactive Software Factory', aiming for automatic conversion of spec diffs into code diffs.
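The ID-to-comment linking described above can be illustrated with a small coverage check. This is a minimal sketch, not acai.sh's actual implementation: the `spec` object stands in for an already-parsed feature.yaml, and `findReferencedIds` and `coverageReport` are hypothetical helper names introduced here for illustration.

```javascript
// Hypothetical stand-in for a parsed feature.yaml (assumed shape).
const spec = {
  feature: "authentication",
  requirements: [
    { id: "AUTH-1", description: "Accepts `Authorization: Bearer <token>` header" },
    { id: "AUTH-2", description: "Tokens are user-scoped" },
    { id: "AUTH-3", description: "Rejects with 401 Unauthorized" },
  ],
};

// Collect every token shaped like a requirement ID (e.g. AUTH-1) in a source string.
function findReferencedIds(source) {
  const matches = source.match(/\b[A-Z][A-Z0-9]*-\d+\b/g) || [];
  return new Set(matches);
}

// Split the spec's IDs into those mentioned in the source and those that are not.
function coverageReport(spec, source) {
  const referenced = findReferencedIds(source);
  const covered = [];
  const missing = [];
  for (const { id } of spec.requirements) {
    (referenced.has(id) ? covered : missing).push(id);
  }
  return { covered, missing };
}

// Example source in which AUTH-2 has not been tagged yet.
const exampleSource = `
const authHeader = req.headers["authorization"]; // AUTH-1
if (!isAuthorized) return res.status(401).end(); // AUTH-3
`;

console.log(coverageReport(spec, exampleSource));
// → { covered: [ 'AUTH-1', 'AUTH-3' ], missing: [ 'AUTH-2' ] }
```

Because the report only looks up IDs that appear in the spec, unrelated tokens with the same shape (such as UTF-8) do not affect the result.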
Evidence
- The author summarized the core concept directly in a comment: "Specs must live somewhere. They live in your head or in conversation, and teams and businesses always judge based on specs. So just write them down. feature.yaml is just a list of acceptance criteria."
How to Apply
- If you lose requirements whenever an AI agent session disconnects or its context resets, keep a feature.yaml file with numbered acceptance criteria (AUTH-1, AUTH-2, etc.) so the agent can reference the same requirements across sessions.
- To track which requirements an agent implemented in generated code, explicitly instruct the agent in your prompt to include requirement IDs (e.g., AUTH-1) in code comments, maintaining a link between code and specs. In the author's experiment, a sub-agent began doing this spontaneously.
- When tempted to build complex multi-agent pipelines, heed the author's lesson: first ask yourself whether you are 'building an AI harness to build a product', and consider starting with a simple, structured file like feature.yaml.
- If your team requires AI code review or handoff, install the acai.sh open-source toolkit (https://acai.sh) and integrate the Specify → Ship → Review → Iterate workflow into your team's processes.
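One way to carry the criteria across sessions is to render them into a short text preamble and paste it at the start of each new agent session. The sketch below assumes the feature.yaml has already been parsed into a plain object. `buildSpecPreamble` and `authSpec` are hypothetical names and are not part of the acai.sh toolkit.

```javascript
// Hypothetical stand-in for a parsed feature.yaml (assumed shape).
const authSpec = {
  feature: "authentication",
  requirements: [
    { id: "AUTH-1", description: "Accepts `Authorization: Bearer <token>` header" },
    { id: "AUTH-2", description: "Tokens are user-scoped" },
    { id: "AUTH-3", description: "Rejects with 401 Unauthorized" },
  ],
};

// Render the spec as a prompt preamble that lists every criterion and asks
// the agent to tag the code it writes with the matching requirement ID.
function buildSpecPreamble(spec) {
  // Use the first requirement ID as the example; fall back to a placeholder.
  const exampleId = spec.requirements.length > 0 ? spec.requirements[0].id : "REQ-1";
  const lines = [
    `Feature: ${spec.feature}`,
    "Acceptance criteria:",
    ...spec.requirements.map((r) => `- ${r.id}: ${r.description}`),
    "",
    `When you implement a criterion, add its ID as a code comment (e.g. // ${exampleId}).`,
  ];
  return lines.join("\n");
}

console.log(buildSpecPreamble(authSpec));
```

Keeping the preamble generated from the file, rather than retyped by hand, means the spec file stays the single place where requirements are edited.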
Code Example
# feature.yaml example (acceptance criteria list)
feature: authentication
requirements:
  - id: AUTH-1
    # Quoted because the text contains ": ", which a plain YAML scalar cannot hold
    description: "Accepts `Authorization: Bearer <token>` header"
  - id: AUTH-2
    description: Tokens are user-scoped, providing access to any of the user's resources
  - id: AUTH-3
    description: Rejects with 401 Unauthorized
    depends_on: AUTH-1

// Example of requirement IDs linked to code comments
const authHeader = req.headers["authorization"]; // AUTH-1
const isAuthorized = verifyBearerToken(authHeader); // AUTH-2
if (!isAuthorized) return res.status(401).json({ error: "Unauthorized" }); // AUTH-3