Get Shit Done: A Meta-Prompting, Context Engineering and Spec-Driven Dev System
TL;DR Highlight
A lightweight spec-driven development automation framework built to solve Claude Code's 'context rot' problem, orchestrating AI to generate real code with just a few commands — no complex planning needed.
Who Should Read
Developers already using AI coding tools like Claude Code or Gemini CLI who experience degrading output as context grows, or solo/small-team developers wanting to ship products fast.
Core Mechanics
- GSD focuses on solving the 'context rot' problem — where Claude's output quality degrades as the context window fills up. Internally uses context engineering and XML prompt formatting to manage this.
- The user-facing interface is just a few commands, but internally runs XML prompt formatting, subagent orchestration, and state management. The core design philosophy is hiding complexity inside the system to keep the workflow simple.
- Supports multiple AI coding tools beyond Claude Code: OpenCode, Gemini CLI, Codex, Copilot, Antigravity. Installs with a single `npx get-shit-done-cc@latest` command on Mac, Windows, and Linux.
- Existing spec-driven tools like BMAD and Speckit require enterprise processes (sprints, story points, Jira workflows). GSD targets solo developers and small teams, stripping away that complexity to focus on core functionality.
- README recommends `--dangerously-skip-permissions` flag as the default workflow. Internally, subagents dynamically run `node gsd-tools.cjs`, `git checkout -b`, `eslint`, test runners, etc., so constant permission approvals would break autonomous mode.
- gsd-plan-checker validates requirements coverage and dependency graphs before execution, but doesn't verify what commands will actually run. gsd-verifier only checks goal achievement post-execution, not whether something went wrong during execution — a security gap.
- One user claimed to have written 250K lines of code in a month with GSD. Another 3-month user said GSD handled 95% of complex tasks with only 5% requiring manual testing.
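The source does not reproduce GSD's internal prompts, but XML-structured prompting of the kind it relies on generally looks like the sketch below. All tag names and content here are hypothetical, illustrating the technique rather than GSD's actual format:

```xml
<task>
  <objective>Implement the password-reset endpoint from spec 03-auth.md</objective>
  <context>
    <file path="src/routes/auth.ts">relevant excerpt only, not the whole file</file>
  </context>
  <constraints>
    <rule>Do not modify files outside src/routes/</rule>
    <rule>Run the existing test suite before reporting completion</rule>
  </constraints>
</task>
```

Explicit tags give the model unambiguous boundaries between instructions, context, and constraints, which is one common way to slow down context rot as a session grows.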
Evidence
- Multiple reports of massive token consumption. One user hit session limits (normally unreachable) in 30 minutes, and burned through weekly limits by Tuesday. Another quit GSD after finding Plan mode alone was sufficient — using 10x more tokens with no quality difference.
- Specific security concerns were raised. The default `--dangerously-skip-permissions` design means the plan-checker can't catch destructive commands from the planner. Cases of AI-generated code containing hardcoded credentials, API routes missing auth middleware, and debug endpoints deployed to production were shared.
- Some users were unsatisfied with results. The planning stage's 'rubber duck' role of asking good questions was useful, but actual implementation quality fell short. They concluded that creating plans with Claude Opus, recording to memory, and proceeding manually was better.
- Critics noted the lack of validation in complex legacy codebases and production environments. Metrics like 'wrote 250K lines without reading them' read more like hype than real value; convincing evidence would be 'deployed an actual feature to production in a 10-year-old large codebase.'
- Many comparison requests with similar tools like Superpowers and openspec. No clear answer on whether GSD produces better results despite using more tokens. openspec was noted for letting users customize workflows and progressively simplify toward their own approach.
How to Apply
- If you want to rapidly build a SaaS or side project solo with Claude Code, install via `npx get-shit-done-cc@latest` and use it just for the planning stage — answering GSD's questions to refine your spec. You can do actual generation directly in Claude Code.
- If you want autonomous mode without `--dangerously-skip-permissions`, follow the README's granular permissions guide to first set up a permission profile allowing only safe reads and git operations. Don't attach autonomous mode to production codebases without a security review.
- If you've generated large amounts of code via GSD autonomous mode, add separate scripts or lint rules to automatically check for common AI-generated code patterns: hardcoded credentials, API routes without auth, and debug endpoints in production.
- If token budget is a concern, try a hybrid approach: use GSD only for the Plan stage and implement directly in Claude Code. Per actual user experience, this approach was more efficient in terms of quality per token.
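Claude Code reads permission rules from a project's `.claude/settings.json`. A sketch of a restrictive profile for the granular-permissions approach above; the exact rule strings are assumptions and should be checked against the Claude Code permissions documentation and GSD's README:

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Grep",
      "Bash(git status)",
      "Bash(git diff:*)",
      "Bash(git checkout:*)",
      "Bash(node gsd-tools.cjs:*)",
      "Bash(npx eslint:*)"
    ],
    "deny": [
      "Bash(rm:*)",
      "Bash(git push:*)"
    ]
  }
}
```

Deny rules are intended to take precedence over allow rules, so destructive commands stay blocked even in an otherwise hands-off run.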
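A minimal audit script along these lines could be run after each GSD session or wired into CI. The patterns below are illustrative assumptions, deliberately naive, and not a published GSD feature; extend them for your stack (e.g. framework-specific auth-middleware checks):

```shell
#!/bin/sh
# Hypothetical post-generation audit for AI-written code.
# Flags two common patterns: hardcoded credential-like assignments
# and debug endpoints left in route definitions.

audit_dir() {
  dir="$1"
  echo "-- possible hardcoded credentials --"
  grep -rnE "(api_key|apikey|secret|password)[[:space:]]*[:=][[:space:]]*['\"][^'\"]{8,}" "$dir"
  echo "-- possible debug endpoints --"
  grep -rnE "(get|post)\((['\"])/(debug|__test)" "$dir"
  return 0
}

# Demo on a throwaway directory so the script is self-contained.
demo=$(mktemp -d)
printf 'const api_key = "sk-test-1234567890";\n' > "$demo/config.js"
printf 'app.get("/debug/state", handler);\n' > "$demo/routes.js"
report=$(audit_dir "$demo")
echo "$report"
rm -rf "$demo"
```

Naive greps will produce false positives (test fixtures, env-var lookups), so treat hits as review candidates rather than failures.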
Code Example
# Installation
npx get-shit-done-cc@latest
# Basic usage (autonomous mode - security caution)
# README recommended workflow but can be dangerous for production codebases
claude --dangerously-skip-permissions
# Tasks executed by GSD internal sub-agents (based on gsd-executor.md)
# node gsd-tools.cjs
# git checkout -b <branch>
# eslint
# Dynamically generates and runs test runners, etc.
Related Papers
Using Claude Code: The unreasonable effectiveness of HTML
An article on why the Claude Code team began preferring HTML over Markdown as an LLM output format and the practical advantages of doing so; it directly affects workflows where you build documents, specs, and dashboards together with AI.
When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling
Disagreement-guided routing boosts LLM accuracy on math and code by 3-7% with adaptive problem solving.
Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application
Five failure modes and eight practical solutions emerged after five days of running on-device SLMs (Gemma 4 E2B, Qwen3 0.6B) with Wordle.
Dynamic Context Evolution for Scalable Synthetic Data Generation
A framework that completely eliminates duplication and repetition in large-scale synthetic data generation with LLMs using three mechanisms (VTS + Semantic Memory + Adaptive Prompt).
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
A workflow-sharing post on how pre-organizing a codebase as a wiki can cut token usage per Claude session by more than 90%, compared with exploring the codebase cold every time.