Get Shit Done: A Meta-Prompting, Context Engineering and Spec-Driven Dev System
TL;DR Highlight
A lightweight spec-driven development automation framework built to solve Claude Code's 'context rot' problem, orchestrating AI to generate real code with just a few commands — no complex planning needed.
Who Should Read
Developers already using AI coding tools like Claude Code or Gemini CLI who experience degrading output as context grows, or solo/small-team developers wanting to ship products fast.
Core Mechanics
- GSD targets 'context rot', the degradation in Claude's output quality as the context window fills up. It manages this internally with context engineering and XML prompt formatting.
- Users see only a few commands, while the system handles XML prompt formatting, subagent orchestration, and state management behind them. The core design philosophy is to hide complexity inside the system so the workflow stays simple.
- Supports multiple AI coding tools beyond Claude Code: OpenCode, Gemini CLI, Codex, Copilot, Antigravity. Installs with a single `npx get-shit-done-cc@latest` command on Mac, Windows, and Linux.
- Existing spec-driven tools like BMAD and Speckit require enterprise processes (sprints, story points, Jira workflows). GSD targets solo developers and small teams, stripping away that complexity to focus on core functionality.
- The README recommends running with the `--dangerously-skip-permissions` flag as the default workflow. Internally, subagents dynamically run `node gsd-tools.cjs`, `git checkout -b`, `eslint`, test runners, etc., so constant permission prompts would break autonomous mode.
- gsd-plan-checker validates requirements coverage and dependency graphs before execution, but does not verify which commands will actually run. gsd-verifier checks only whether the goal was achieved after execution, not whether something went wrong along the way. Together these leave a security gap.
- One user claimed to have written 250K lines of code in a month with GSD. Another 3-month user said GSD handled 95% of complex tasks with only 5% requiring manual testing.
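The gap noted above (plans are checked for coverage and dependencies, but not for the commands that will run) could be narrowed with a pre-execution screen. The following is a minimal sketch, assuming each command is available as a plain string before it runs. `screen_command`, `ALLOWED_PROGRAMS`, and `DENY_PATTERNS` are hypothetical names and policies chosen for illustration; they are not part of GSD.

```python
import re
import shlex
from typing import Tuple

# Hypothetical policy for illustration; not taken from GSD.
ALLOWED_PROGRAMS = {"node", "git", "eslint", "npm", "npx"}

DENY_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*[rf]"),     # recursive or forced delete
    re.compile(r"\bgit\s+push\s+.*--force"),  # rewriting remote history
    re.compile(r"\bgit\s+reset\s+--hard"),    # discarding local work
    re.compile(r"[|;&`]|\$\("),               # chaining or command substitution
]


def screen_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a single shell command string."""
    # Deny rules run first so a risky pattern wins over the allowlist.
    for pattern in DENY_PATTERNS:
        if pattern.search(command):
            return False, f"matches deny pattern {pattern.pattern!r}"
    try:
        tokens = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return False, f"unparseable command: {exc}"
    if not tokens:
        return False, "empty command"
    if tokens[0] not in ALLOWED_PROGRAMS:
        return False, f"program {tokens[0]!r} is not on the allowlist"
    return True, "ok"


if __name__ == "__main__":
    for cmd in ["git checkout -b feature/login", "rm -rf /", "curl http://example.com"]:
        print(cmd, "->", screen_command(cmd))
```

A screen like this is deliberately conservative: it rejects any chained command outright rather than trying to parse it, which trades convenience for predictability. It is a heuristic, not a sandbox, and would sit alongside permission controls rather than replace them.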
Evidence
- Multiple users reported massive token consumption. One hit session limits (normally unreachable) in 30 minutes and burned through weekly limits by Tuesday. Another quit GSD after finding that Plan mode alone was sufficient: GSD used 10x more tokens with no difference in quality.
- Specific security concerns were raised. The default `--dangerously-skip-permissions` design means the plan-checker can't catch destructive commands from the planner. Cases of AI-generated code containing hardcoded credentials, API routes missing auth middleware, and debug endpoints deployed to production were shared.
- Some users were unsatisfied with results. The planning stage's 'rubber duck' role of asking good questions was useful, but actual implementation quality fell short. They concluded that creating plans with Claude Opus, recording to memory, and proceeding manually was better.
- Critics pointed to the lack of validation on complex legacy codebases or in production environments. Metrics like 'wrote 250K lines without reading them' look more like hype than real value. Real evidence would be 'deployed actual features to production in a 10-year-old large codebase.'
- Many commenters asked for comparisons with similar tools such as Superpowers and openspec. There was no clear answer on whether GSD produces better results despite using more tokens. openspec was noted for letting users customize workflows and progressively simplify toward their own approach.
How to Apply
- If you want to rapidly build a SaaS or side project solo with Claude Code, install via `npx get-shit-done-cc@latest` and use it just for the planning stage — answering GSD's questions to refine your spec. You can do actual generation directly in Claude Code.
- If you want autonomous mode without `--dangerously-skip-permissions`, follow the README's granular permissions guide to first set up a permission profile allowing only safe reads and git operations. Don't attach autonomous mode to production codebases without a security review.
- If you've generated large amounts of code via GSD autonomous mode, add separate scripts or lint rules to automatically check for common AI-generated code patterns: hardcoded credentials, API routes without auth, and debug endpoints in production.
- If token budget is a concern, try a hybrid approach: use GSD only for the Plan stage and implement directly in Claude Code. Per actual user experience, this approach was more efficient in terms of quality per token.
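The automated checks suggested above for generated code (hardcoded credentials, API routes without auth, debug endpoints) can be sketched as a small line-based scanner. This is a rough heuristic under stated assumptions: it assumes an Express-style Node codebase, a hypothetical auth middleware named `requireAuth`, and that protected routes live under `/api/`. None of these details come from GSD or its README, and the function name `scan_source` is likewise illustrative.

```python
import re
from typing import Dict, List

# Illustrative heuristics only. The route style, the "/api/" prefix and the
# middleware name "requireAuth" are assumptions, not GSD conventions.
CREDENTIAL = re.compile(
    r"""(api[_-]?key|secret|passw(?:or)?d|token)\w*\s*[:=]\s*["'][^"']{8,}["']""",
    re.IGNORECASE,
)
ROUTE = re.compile(
    r"""\b(?:app|router)\.(?:get|post|put|patch|delete)\(\s*["']([^"']+)["']"""
)
DEBUG_PATH = re.compile(r"^/(?:debug|__debug|test)\b")


def scan_source(text: str, auth_name: str = "requireAuth") -> List[Dict[str, object]]:
    """Return one finding per suspicious line: {"line": int, "rule": str}."""
    findings: List[Dict[str, object]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # A quoted literal assigned to a secret-looking name.
        if CREDENTIAL.search(line):
            findings.append({"line": lineno, "rule": "hardcoded-credential"})
        route = ROUTE.search(line)
        if route:
            path = route.group(1)
            if DEBUG_PATH.match(path):
                findings.append({"line": lineno, "rule": "debug-endpoint"})
            elif path.startswith("/api/") and auth_name not in line:
                # Only sees middleware passed on the same line as the route.
                findings.append({"line": lineno, "rule": "route-without-auth"})
    return findings


if __name__ == "__main__":
    sample = "\n".join([
        'const apiKey = "sk_live_1234567890abcdef";',
        'app.get("/api/users", listUsers);',
        'app.get("/api/orders", requireAuth, listOrders);',
        'router.post("/debug/reset", resetDb);',
    ])
    for finding in scan_source(sample):
        print(finding)
```

Because it works line by line, this sketch misses middleware applied at the router level and will produce false positives on names like `tokenizer`. It is best treated as a first-pass filter that flags lines for human review, and a dedicated secret scanner or lint rule would be the sturdier long-term choice.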
Code Example
# Installation
npx get-shit-done-cc@latest
# Basic usage (autonomous mode; use with caution)
# The README recommends this workflow, but it can be dangerous on production codebases
claude --dangerously-skip-permissions
# Commands run by GSD's internal subagents (based on gsd-executor.md)
# node gsd-tools.cjs
# git checkout -b <branch>
# eslint
# Dynamically generates and runs test runners, etc.