The L in "LLM" Stands for Lying
TL;DR Highlight
LLM-generated code and content are fundamentally 'forgery', and we've lost something real by abandoning craftsmanship in favor of velocity.
Who Should Read
Developers wrestling with questions of craft, ownership, and quality in an era of AI-generated code, and tech ethicists thinking about what we lose when we automate creative work.
Core Mechanics
- The core argument: when LLMs generate code, they produce outputs that mimic the surface form of expert work without the underlying understanding — this is forgery in a meaningful sense.
- Craftsmanship involves not just the output but the learning process: struggling with a problem, developing intuition, building a mental model. LLM-generated code skips all of this.
- There's a distinction between 'using AI as a tool' (like using a compiler or a library) and 'using AI as a substitute for thinking' — the author argues much current LLM coding use falls into the latter.
- The velocity gains from AI code generation may be real short-term, but compound into skill atrophy and reduced understanding of systems you nominally own.
- This isn't a Luddite argument — the question is about intentionality: are you using AI to go faster on understood problems, or to avoid understanding problems?
Evidence
- The author drew on examples from their own experience of shipping AI-generated code they didn't fully understand, and the subsequent debugging costs when things broke.
- HN had a typically spirited debate — with strong voices on both sides. Senior engineers shared experiences of losing juniors who could ship features but couldn't debug or reason about systems.
- Counter-argument: craftsmanship in software has always been about outcomes, not process. Using better tools (including AI) is how craft evolves.
- Several commenters noted the parallel to calculators in math education: we made a collective decision that hand-computation fluency was worth trading away for deeper conceptual understanding.
How to Apply
- Be intentional about when you use AI for code generation: use it for boilerplate and patterns you already understand, not for core logic you're still learning (a sketch of this split follows this list).
- After AI generates code, make it a habit to read and understand every line before merging — not just 'does it pass tests' but 'do I understand why it works.'
- For engineering leads: evaluate AI tool usage not just by velocity metrics but by whether your team's understanding and debugging capability is growing or atrophying over time.
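As a concrete illustration of the boilerplate-versus-core-logic split in the first point, here is a minimal, hypothetical Python sketch. The rate-limiter scenario and all names are invented for illustration and are not from the article: the config container is the kind of well-worn pattern the article treats as reasonable to delegate, while the limiter's behavior is the part you should be able to re-derive and explain before merging.

```python
# Hypothetical example (not from the article) of the boilerplate-vs-core-logic split.
from dataclasses import dataclass, field
import time


@dataclass
class RateLimitConfig:
    """Boilerplate: a config container following a pattern you already understand.
    The sort of code the article suggests is safe to delegate to an LLM."""
    max_calls: int = 10
    window_seconds: float = 1.0


@dataclass
class SlidingWindowLimiter:
    """Core logic: behavior your system actually depends on.
    Per the article's advice, write or fully re-derive this yourself and be able
    to explain why it works before merging."""
    config: RateLimitConfig
    _calls: list = field(default_factory=list)

    def allow(self) -> bool:
        now = time.monotonic()
        cutoff = now - self.config.window_seconds
        # Keep only timestamps inside the current window, then check capacity.
        self._calls = [t for t in self._calls if t > cutoff]
        if len(self._calls) < self.config.max_calls:
            self._calls.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(RateLimitConfig(max_calls=2, window_seconds=1.0))
print([limiter.allow() for _ in range(3)])  # [True, True, False]
```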
Related Papers
Can LLMs model real-world systems in TLA+?
A benchmark study systematically showing that when LLMs write TLA+ specifications they pass syntax checks well, but behavioral conformance with the actual system stays at around 46%, illustrating the practical limits of AI-based formal verification.
Natural Language Autoencoders: Turning Claude's Thoughts into Text
Anthropic presented NLA, a technique that converts the numeric vectors (activation values) inside an LLM into natural language that can be read directly; it is a new advance in interpretability research into what an AI is actually thinking.
ProgramBench: Can language models rebuild programs from scratch?
A new benchmark measuring whether LLMs can reimplement real software such as FFmpeg, SQLite, and a PHP interpreter from scratch using only the documentation; even the best model passed 95%+ of the tests on only 3% of the tasks.
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents
Split the request into three tickets and Claude/GPT will simply write the security-vulnerable code 53–86% of the time.
Refusal in Language Models Is Mediated by a Single Direction
Open-source chat models encode refusal along a single activation direction, and ablating that direction effectively disables their safety fine-tuning.
Show HN: A new benchmark for testing LLMs for deterministic outputs
Structured Output Benchmark assesses LLM JSON handling across seven metrics, revealing performance differences that schema compliance alone doesn't capture.