Show HN: I put an AI agent on a $7/month VPS with IRC as its transport layer
TL;DR Highlight
A developer shares how they built an AI agent for their portfolio site using IRC as the transport layer — enabling direct GitHub code analysis and visitor Q&A — running on a $7/month VPS. Going beyond the typical 'AI chatbot portfolio' that simply feeds a resume into an LLM, this system provides concrete answers grounded in the actual codebase, making it a noteworthy practical example of AI agent architecture design.
Who Should Read
Full-stack or backend developers who want to add an AI agent to a personal portfolio or small-scale service but are concerned about cost and security. Especially those curious about how to actually implement multi-agent architecture and tiered inference cost optimization strategies in practice.
Core Mechanics
- Most portfolio AI chatbots simply feed resume content into an LLM and serve it back to visitors in rephrased form, which the author calls a 'magic show.' Instead, they built an agent that clones actual GitHub repos, reads CI configurations, and answers with specific metrics.
- To establish clear security boundaries, the agent was split in two. The public-facing nullclaw runs on a $7/month VPS with access only to public GitHub repos and portfolio context, while the private ironclaw runs on a separate server connected via Tailscale and handles email, calendar, and personal context. This boundary ensures personal data remains safe even if the public box is compromised.
- There were three reasons for choosing IRC as the transport layer: aesthetics that match the terminal UI of the portfolio site, full ownership of the stack with no platform dependency, and the fact that IRC is a battle-tested 30-year-old protocol. Discord or Telegram can change their API policies at any time, but IRC has no vendor lock-in.
- Model selection was intentionally tiered. The hot path — greetings and simple questions — uses Claude Haiku 4.5 (fast and cheap, a few cents per conversation), and only escalates to Claude Sonnet 4.6 when cloning repos or analyzing multiple files is required. The philosophy is: 'pay for reasoning only when reasoning is needed.'
- Cost control was built into the core of the design. Hard caps of $2/day and $30/month are set so that even if someone intentionally tries to exhaust the API budget, there's a limit. nullclaw operates in sandbox mode with a 10-action-per-hour limit and only read-only tools allowed.
- The tech stack is impressively lightweight. nullclaw is a 4MB Zig binary using only ~1MB RAM, the IRC server ergo is a Go binary using 2.7MB RAM, and the web IRC client gamja is 152KB when built. Cloudflare sits in front so visitors never directly reach the server, handling TLS termination, rate limiting, and bot filtering.
- Security hardening was applied at the perimeter box level. SSH uses a non-root user, key authentication, and a non-standard port; UFW opens only three ports (SSH/IRC with TLS/HTTPS); Let's Encrypt auto-renews certificates; security updates are applied automatically; and all tool calls are audit-logged. The box has exactly two roles (ergo + nullclaw), keeping the attack surface minimal.
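The tiered model selection and hard spending caps described above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the model labels, the keyword-based escalation rule, and the `BudgetGuard` class are all assumptions made for the example. Only the $2/day and $30/month figures come from the post.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical labels; real API model identifiers may differ.
CHEAP_MODEL = "haiku-tier"
STRONG_MODEL = "sonnet-tier"

# Hypothetical keywords suggesting that repo-level analysis is needed.
ESCALATION_HINTS = ("clone", "repo", "coverage", "ci config", "compare files")


def pick_model(message: str) -> str:
    """Return the cheap model unless the message looks like it needs code analysis."""
    text = message.lower()
    if any(hint in text for hint in ESCALATION_HINTS):
        return STRONG_MODEL
    return CHEAP_MODEL


@dataclass
class BudgetGuard:
    """Tracks spend against a daily and a monthly hard cap (USD)."""
    daily_cap: float = 2.00
    monthly_cap: float = 30.00
    spent_by_day: dict = field(default_factory=dict)

    def _month_total(self, today: date) -> float:
        # Sum every recorded day that falls in the same calendar month.
        return sum(
            cost for day, cost in self.spent_by_day.items()
            if (day.year, day.month) == (today.year, today.month)
        )

    def allows(self, estimated_cost: float, today: date) -> bool:
        """True only if the request fits under both caps."""
        day_total = self.spent_by_day.get(today, 0.0)
        return (
            day_total + estimated_cost <= self.daily_cap
            and self._month_total(today) + estimated_cost <= self.monthly_cap
        )

    def record(self, cost: float, today: date) -> None:
        """Add an actual cost to the running total for the given day."""
        self.spent_by_day[today] = self.spent_by_day.get(today, 0.0) + cost
```

A caller would check `guard.allows(...)` before each model call and `guard.record(...)` afterwards. Refusing requests once a cap is reached is what bounds the damage from deliberate budget exhaustion, at the price of the bot going silent for the rest of the day.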
Evidence
- Regarding the author's claim that 'even if the public box is compromised, the blast radius is limited to a $2/day IRC bot,' commenters pushed back, noting that since nullclaw can route to ironclaw, access to email and personal data is actually possible in practice.
- There were also serious concerns that because the chat is a public lobby where all visitors can see each other's messages, it could become a hub for distributing illegal content. Eyewitness accounts were shared of the chat going 'completely out of control' during testing.
- On the Haiku/Sonnet model choices, commenters pointed out that cheaper alternatives exist on OpenRouter: MiniMax M2.7 at $0.30/M input tokens and Kimi K2.5 at $0.45/M, compared to Haiku 4.5's $1/M, with comparable or better performance for most tasks.
- Technical criticism was raised about IRC's at-most-once message delivery: if the agent disconnects, messages sent in the interim are lost. That is fine for casual conversation but insufficient for an agent handling real tasks, where at-least-once delivery guarantees are needed. SSE (Server-Sent Events) or HTTP polling with ack-based deduplication were suggested as better alternatives. One team that built a similar multi-agent architecture on FastAPI + SQLite reported about 50 agent crashes per day, with dedup state persistence being the first problem they hit.
- On security, there was sharp criticism that prompt injection defenses amounting to 'write don't do this in the system prompt' don't constitute real security.
- Unattended security upgrades were flagged as a potential security risk in themselves, with a recent litellm library security incident cited as an example of how auto-updates can become an attack vector.
- There were real-world observations that the $2/day cost cap turned out to be the 'Achilles heel': reports came in that the bot had already stopped responding shortly after the post was shared.
- Suggestions included caching frequently asked questions or leveraging API free tiers to reduce costs, though the daily hard cap approach was also praised as 'smart,' with positive recognition that it correctly identifies cost governance problems that AI coding tools often solve at the wrong layer.
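The ack-based deduplication that commenters suggested for at-least-once delivery could look roughly like the sketch below. It assumes a transport that attaches a unique ID to each message and redelivers until it receives an ack. The `DedupInbox` class, the table name, and the `process` callback are hypothetical names chosen for illustration. SQLite is used here only because persisting dedup state across crashes was the problem the commenters reported hitting first.

```python
import sqlite3


class DedupInbox:
    """Persists processed message IDs so redelivered messages run only once."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to survive process restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS processed (msg_id TEXT PRIMARY KEY)"
        )
        self.db.commit()

    def handle(self, msg_id: str, body: str, process) -> bool:
        """Process the message unless its ID was already recorded.

        Returns True when it is safe to ack. If process() raises, nothing is
        recorded, no ack is sent, and the sender is expected to redeliver.
        """
        seen = self.db.execute(
            "SELECT 1 FROM processed WHERE msg_id = ?", (msg_id,)
        ).fetchone()
        if seen:
            return True  # duplicate delivery: ack again without reprocessing
        process(body)
        self.db.execute("INSERT INTO processed (msg_id) VALUES (?)", (msg_id,))
        self.db.commit()
        return True
```

A crash between `process(body)` and the insert would cause the message to be processed again on redelivery, so this sketch gives at-least-once rather than exactly-once behavior. Handlers still need to tolerate an occasional repeat.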
How to Apply
- If you want to add an AI chatbot to a portfolio or personal project, provide actual GitHub repo URLs as context instead of resume text and configure the agent to clone and read the repo. This enables it to answer specific questions like 'what's the CI coverage percentage?' or 'what testing framework does this use?' based on real code.
- If you're running a public service and worried about LLM API cost spikes, apply a tiered inference structure like the author's: use a small model like Haiku for the hot path (greetings, simple questions) and escalate to a Sonnet-class model only when actual analysis is needed, then set a hard cap of $2–$5/day at the API level to limit damage from abuse.
- If you need to enforce access boundaries between public and private data in a multi-agent system, adopt an architecture like nullclaw/ironclaw: physically separate the agents onto different servers and allow only internal communication via Tailscale, so that even if the public agent is compromised, the path to private data is cut off. Note, however, that the routing path between the two agents can itself become an attack vector, so the conditions for accessing ironclaw must be designed strictly.
- If you're concerned about an agent's filesystem access and execution permissions, combine workspace-directory-scoped file access, a command allowlist with only read-only tools, and an action rate limit (e.g., 10 actions/hour) to run in supervised mode. This limits what an attacker can do even if the agent is hijacked.
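The combination of workspace-scoped file access, a read-only tool allowlist, and an hourly action limit can be sketched as a single gate that every tool call must pass. This is an illustrative sketch under stated assumptions: the tool names in `READ_ONLY_TOOLS` and the `ActionGate` class are hypothetical, and the post does not describe how nullclaw enforces these rules internally.

```python
import time
from collections import deque
from pathlib import Path
from typing import Optional

# Hypothetical read-only tool names; the post does not list the real allowlist.
READ_ONLY_TOOLS = frozenset({"read_file", "list_dir", "grep"})


class ActionGate:
    """Allows a tool call only if it is read-only, in-workspace, and under the rate limit."""

    def __init__(self, workspace: str, max_actions: int = 10, window_s: float = 3600.0):
        self.workspace = Path(workspace).resolve()
        self.max_actions = max_actions
        self.window_s = window_s
        self.timestamps = deque()  # times of recently allowed actions

    def _inside_workspace(self, target: str) -> bool:
        # Resolve ".." segments and absolute paths before comparing.
        resolved = (self.workspace / target).resolve()
        return resolved == self.workspace or self.workspace in resolved.parents

    def allow(self, tool: str, target: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if tool not in READ_ONLY_TOOLS or not self._inside_workspace(target):
            return False
        # Drop timestamps that have aged out of the sliding window.
        while self.timestamps and now - self.timestamps[0] >= self.window_s:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_actions:
            return False
        self.timestamps.append(now)
        return True
```

For example, `ActionGate("/tmp/agent-workspace").allow("read_file", "../etc/passwd")` is rejected because the resolved path escapes the workspace. Rejected calls are not counted against the hourly limit in this sketch, which is a design choice rather than something the post specifies.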
Related Papers
Show HN: OpenKnowledge – open source AI-first alternative to Obsidian/Notion
A local-first Markdown editor with built-in Git-based sync and Claude/Codex/Cursor integration; an open-source tool that can serve as a second brain (LLM Wiki) for AI agents.
The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems
An architecture that places a mathematically proven enforcement gate outside the agent process, so that an AI agent cannot bypass its own safeguards.
RubyLLM: A Ruby framework for all major AI providers
A Ruby framework that unifies major AI providers such as OpenAI, Claude, and Gemini behind a single interface, with Rails integration and agent features, so Ruby developers can add AI functionality quickly.
Qwen-AgentWorld: Language World Models for General Agents
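Alibaba's Qwen team has released a 'Language World Model' that lets AI agents simulate the outcomes of their actions in advance. The research proposes a new paradigm for agent training and execution-path verification.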
SHERLOC: Structured Diagnostic Localization for Code Repair Agents
A training-free framework that improves the performance of code-repair agents by generating a diagnostic report covering not only where the bug is but also why and how it should be fixed.
Show HN: peerd – AI agent harness that runs entirely in your browser
An AI agent runtime that works purely as a Chrome/Firefox extension with no backend server. It can drive browser tabs directly and even run a WASM Linux VM, offering both privacy and security.