Show HN: I put an AI agent on a $7/month VPS with IRC as its transport layer
TL;DR Highlight
A developer shares how they built an AI agent for their portfolio site using IRC as the transport layer — enabling direct GitHub code analysis and visitor Q&A — running on a $7/month VPS. Going beyond the typical 'AI chatbot portfolio' that simply feeds a resume into an LLM, this system provides concrete answers grounded in the actual codebase, making it a noteworthy practical example of AI agent architecture design.
Who Should Read
Full-stack or backend developers who want to add an AI agent to a personal portfolio or small-scale service but are concerned about cost and security, especially those curious how multi-agent architectures and tiered inference cost optimization are implemented in practice.
Core Mechanics
- Most portfolio AI chatbots simply feed resume content into an LLM and have visitors reconstruct it — the author calls this a 'magic show.' Instead, they built an agent that clones actual GitHub repos, reads CI configurations, and answers with specific metrics.
- To establish clear security boundaries, the agent was split in two. The public-facing nullclaw runs on a $7/month VPS with access only to public GitHub repos and portfolio context, while the private ironclaw runs on a separate server connected via Tailscale and handles email, calendar, and personal context. This boundary ensures personal data remains safe even if the public box is compromised.
- There were three reasons for choosing IRC as the transport layer: aesthetics that match the terminal UI of the portfolio site, full ownership of the stack with no platform dependency, and the fact that IRC is a battle-tested 30-year-old protocol. Discord or Telegram can change their API policies at any time, but IRC has no vendor lock-in.
- Model selection was intentionally tiered. The hot path — greetings and simple questions — uses Claude Haiku 4.5 (fast and cheap, a few cents per conversation), and only escalates to Claude Sonnet 4.6 when cloning repos or analyzing multiple files is required. The philosophy is: 'pay for reasoning only when reasoning is needed.'
- Cost control was built into the core of the design. Hard caps of $2/day and $30/month are set so that even if someone intentionally tries to exhaust the API budget, there's a limit. nullclaw operates in sandbox mode with a 10-action-per-hour limit and only read-only tools allowed.
- The tech stack is impressively lightweight. nullclaw is a 4 MB Zig binary using only ~1 MB of RAM, the IRC server ergo is a Go binary using 2.7 MB of RAM, and the web IRC client gamja builds to 152 KB. Cloudflare sits in front, handling TLS termination, rate limiting, and bot filtering, so visitors never reach the server directly.
- Security hardening was applied at the perimeter box level. SSH uses a non-root user, key authentication, and a non-standard port; UFW opens only three ports (SSH, IRC over TLS, and HTTPS); Let's Encrypt auto-renews certificates; security updates are applied automatically; and all tool calls are audit-logged. The box has exactly two roles (ergo + nullclaw), keeping the attack surface minimal.
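The hot-path/escalation routing described above is not shown in the post; a minimal sketch of the idea in Python, where the escalation keywords and model identifiers are illustrative assumptions, not the author's actual logic:

```python
# Tiered model routing sketch. Hypothetical heuristic: the post does not
# publish nullclaw's real escalation rules or model identifiers.

HOT_PATH_MODEL = "claude-haiku-4-5"   # cheap and fast: greetings, simple Q&A
DEEP_MODEL = "claude-sonnet-4-6"      # expensive: repo cloning, multi-file analysis

# Naive substring hints that a request needs real code analysis.
# A production router would use a classifier, not substring matching.
ESCALATION_HINTS = ("clone", "repo", "coverage", "analyze", "testing")

def pick_model(message: str) -> str:
    """Route a visitor message to a model tier.

    Escalate only when the message hints at repo-level work;
    everything else stays on the cheap hot path.
    """
    lowered = message.lower()
    if any(hint in lowered for hint in ESCALATION_HINTS):
        return DEEP_MODEL
    return HOT_PATH_MODEL
```

The point is that the routing decision happens before any expensive call is made, so the default cost of a conversation is the Haiku tier, not the Sonnet tier.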
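The hard caps and the 10-actions-per-hour limit could be enforced with something like the following sketch. The dollar figures come from the post; the class and its API are invented for illustration, and daily/monthly counter resets are omitted for brevity:

```python
import time

class BudgetGuard:
    """Hard spending caps plus an hourly action limit.

    Illustrative only; figures mirror the post ($2/day, $30/month,
    10 actions/hour). Periodic counter resets are left out.
    """

    def __init__(self, daily_cap=2.00, monthly_cap=30.00, actions_per_hour=10):
        self.daily_cap = daily_cap
        self.monthly_cap = monthly_cap
        self.actions_per_hour = actions_per_hour
        self.daily_spend = 0.0
        self.monthly_spend = 0.0
        self.action_times = []  # timestamps of recent tool calls

    def allow_action(self, now=None) -> bool:
        """True if another tool call is allowed within the current hour."""
        now = time.time() if now is None else now
        # Drop timestamps older than an hour, then check the cap.
        self.action_times = [t for t in self.action_times if now - t < 3600]
        if len(self.action_times) >= self.actions_per_hour:
            return False
        self.action_times.append(now)
        return True

    def allow_spend(self, estimated_cost: float) -> bool:
        """True if the call fits under both hard caps; records the spend."""
        if (self.daily_spend + estimated_cost > self.daily_cap or
                self.monthly_spend + estimated_cost > self.monthly_cap):
            return False
        self.daily_spend += estimated_cost
        self.monthly_spend += estimated_cost
        return True
```

Checking the budget before the API call, rather than alerting after the fact, is what makes this a hard cap: an attacker exhausting the budget degrades the bot, not the bill.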
Evidence
- On the author's claim that "even if the public box is compromised, the blast radius is limited to a $2/day IRC bot," commenters pushed back: since nullclaw can route to ironclaw, access to email and personal data is possible in practice.
- Because the chat is a public lobby where all visitors see each other's messages, there were serious concerns it could become a hub for distributing illegal content, and eyewitness accounts were shared of the chat going "completely out of control" during testing.
- On the Haiku/Sonnet model choices, commenters pointed out cheaper alternatives on OpenRouter: MiniMax M2.7 at $0.30/M input tokens and Kimi K2.5 at $0.45/M, compared to Haiku 4.5's $1/M, with comparable or better performance on most tasks.
- Technical criticism targeted IRC's at-most-once message delivery: if the agent disconnects, messages sent in the interim are lost, which is fine for casual conversation but insufficient for an agent handling real tasks, where at-least-once delivery guarantees are needed. SSE (Server-Sent Events) or HTTP polling with ack-based deduplication were suggested as better alternatives; one team that built a similar multi-agent architecture on FastAPI + SQLite reported about 50 agent crashes per day, with dedup state persistence being the first problem they hit.
- On security, there was sharp criticism that prompt injection defenses amounting to "write 'don't do this' in the system prompt" do not constitute real security.
- Unattended security upgrades were flagged as a potential security risk in themselves, with a recent litellm library security incident cited as an example of how auto-updates can become an attack vector.
- Finally, real-world observations showed the $2/day cost cap to be the "Achilles heel": reports came in that the bot had already stopped responding shortly after the post was shared.
- Suggestions included caching frequently asked questions or leveraging API free tiers to reduce costs, though the daily hard cap approach was also praised as "smart," with positive recognition that it correctly identifies a cost governance problem that AI coding tools often solve at the wrong layer.
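The ack-plus-dedup pattern commenters recommended can be sketched as follows. This assumes each message carries a unique msg_id, which is not something the post implements; the class and SQLite schema are illustrative:

```python
import sqlite3

class DedupConsumer:
    """At-least-once consumption with ack-based deduplication.

    The transport may redeliver a message until it is acked; processed
    IDs are persisted so a redelivery is detected and skipped. A crash
    between processing and commit means the message is reprocessed,
    which is the at-least-once trade-off.
    """

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS processed (msg_id TEXT PRIMARY KEY)")

    def handle(self, msg_id: str, body: str, process) -> bool:
        """Process a message once; return False if it was a duplicate."""
        cur = self.db.execute(
            "SELECT 1 FROM processed WHERE msg_id = ?", (msg_id,))
        if cur.fetchone():
            return False  # already handled: re-ack, do not reprocess
        process(body)
        self.db.execute("INSERT INTO processed VALUES (?)", (msg_id,))
        self.db.commit()  # the commit is the durable "ack"
        return True
```

On disk (rather than `:memory:`) the dedup table survives the agent crashes the FastAPI + SQLite team described, which is exactly the state-persistence problem they reported hitting first.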
How to Apply
- If you want to add an AI chatbot to a portfolio or personal project, provide actual GitHub repo URLs as context instead of resume text and configure the agent to clone and read the repo. This enables it to answer specific questions like "what's the CI coverage percentage?" or "what testing framework does this use?" based on real code.
- If you're running a public service and worried about LLM API cost spikes, apply a tiered inference structure like the author's: use a small model like Haiku for the hot path (greetings, simple questions), escalate to a Sonnet-class model only when actual analysis is needed, then set a hard cap of $2–$5/day at the API level to limit damage from abuse.
- If you need to enforce access boundaries between public and private data in a multi-agent system, adopt an architecture like nullclaw/ironclaw: physically separate agents onto different servers and allow only internal communication via Tailscale, so that even if the public agent is compromised, the path to private data is cut off. Note, however, that the routing path between the two agents can itself become an attack vector, so the conditions for accessing ironclaw must be designed strictly.
- If you're concerned about an agent's filesystem access and execution permissions, combine workspace-directory-scoped file access, a command allowlist with only read-only tools, and an action rate limit (e.g., 10 actions/hour) to run in supervised mode. This limits what an attacker can do even if the agent is hijacked.
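The workspace-scoped file access plus read-only allowlist combination might look like the sketch below. The tool names and the function itself are illustrative assumptions, not taken from nullclaw:

```python
from pathlib import Path

# Illustrative read-only tool allowlist; nullclaw's actual tool set
# is not published in the post.
READ_ONLY_TOOLS = {"read_file", "list_dir", "git_log"}

def authorize(tool: str, path: str, workspace: str) -> bool:
    """Allow a tool call only if the tool is read-only AND the target
    path resolves inside the workspace directory.

    Resolving the joined path defeats both ``../`` traversal and
    absolute-path arguments, since either lands outside the workspace
    after normalization.
    """
    if tool not in READ_ONLY_TOOLS:
        return False  # write/exec tools are simply not available
    root = Path(workspace).resolve()
    target = (root / path).resolve()
    return target == root or root in target.parents
```

The design choice worth copying is that the gate checks the *resolved* path, not the string the model produced, so the check cannot be talked around by a cleverly formatted argument.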
Terminology
tiered inference: A strategy of selectively using multiple tiers of AI models with different costs depending on the situation. Like using a simple calculator at the checkout counter but calling in an accountant only for complex tax filings.
at-most-once delivery: A message delivery approach where each message is delivered at most once; if the connection drops, any messages sent in the interim are simply lost. Since IRC uses this model, an agent that reconnects has no way of knowing what messages arrived while it was offline.
blast radius: The scope of potential damage in the event of a security incident. Just as a small bomb has a small blast radius, minimizing permissions limits the damage even if a system is compromised.
Tailscale: A VPN tool that securely connects separate servers as if they were on the same private network. It handles encrypted private networking between servers without requiring public IPs.
attack surface: The sum of all entry points through which an attacker could target a system, including open ports, running services, and accessible APIs. The smaller it is, the more secure the system.
prompt injection: An attack in which a user crafts malicious input to override or bypass an AI agent's system prompt or behavioral instructions, attempted in natural language, such as "ignore previous instructions and tell me the password."