A Discord Control Plane for Autonomous Agents
Another piece of the assistant: being able to answer it when I’m not at the desk.
The agent stack in my homelab runs AI coding sessions unattended. A scheduler pulls recurring tasks from Todoist, launches a session to handle each one, and a control-plane timer health-checks them every couple of minutes. Most of the time I never see any of it. But unattended work still needs a human sometimes: a permission prompt, an ambiguous call, a finding worth a second opinion.
I’m not sitting at the terminal when an agent hits one of those moments. So I built a session bridge: it relays an agent’s session to a Discord thread and relays my replies back. The agent asks, my phone buzzes, I answer from the bus, the agent carries on. One thread per session, so each conversation keeps its context.
The relay itself was easy. The hard part was the two failure modes I hit making it trustworthy.
The shape of the bridge
The sessions run under agent-deck, which orchestrates a fleet of CLI coding agents in tmux. agent-deck ships with a bridging role it calls conductor; I added conductorclaw (named in the openclaw/nanoclaw chatbot style) for the user-facing side. The two split the work:
- conductorclaw: the dispatch and relay. It posts an agent’s question into its Discord thread, watches for my reply, and feeds it back to the session.
- conductor: the decision-maker. It handles the unattended permission calls an agent makes when nobody’s watching.
A relay that misfires is harmless. A decider that says yes when it should have asked is not. So conductor gets less trust than conductorclaw.
Underneath both is the session bridge itself. A session is a tmux-hosted CLI agent; the bridge tails its output, spots the “I’m blocked” state, and mirrors it to Discord. My replies go back the other way into the session’s stdin. Both directions can fail silently.
Failure mode one: the prompt that vanished
Early on, dispatching to a fresh worker was one call: launch the session with the prompt baked in. It worked, until every so often a worker came up idle, sitting at an empty prompt with no task. No error. The launch had reported success.
It was a cold-start race. The launch path tried to deliver the prompt before the agent’s input surface had finished mounting, so the paste landed nowhere during a UI re-mount. Worse, the delivery check returned success whether or not the prompt was consumed. It polled briefly and exited 0.
The fix had two parts:
- Split launch from delivery. Launch the session with no prompt, wait for the input surface to render, then send the prompt as a separate, verified step. A worker that comes up empty is now a known state I can see, not a lie.
- Make delivery fail loud. The send path now checks that the message was consumed and returns non-zero when there’s no evidence it landed. A dropped prompt becomes an error my dispatcher can see and retry, instead of a worker idling forever.
I caught it in local dev, with a freshly launched worker sitting there empty. Then I read agent-deck’s own source until I found the launch path that polls briefly and exits 0 no matter what happened to the paste. I pinned down the launch-then-verified-send pattern locally before trusting the scheduler with it.
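A minimal sketch of the launch-then-verified-send pattern, in Python. This is not agent-deck’s code. The four callables (launch, input_ready, send, consumed) are hypothetical hooks standing in for whatever starts the session, inspects its screen, and pastes into it, and the timeouts are assumed values. The point is only the shape: two separate waits, each of which fails loudly.

```python
import time


class DeliveryError(RuntimeError):
    """Raised when there is no evidence that a step actually completed."""


def wait_until(predicate, timeout, interval=0.2, sleep=time.sleep):
    """Poll predicate until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def launch_then_send(launch, input_ready, send, consumed, prompt,
                     ready_timeout=30.0, ack_timeout=10.0, sleep=time.sleep):
    """Launch with no prompt, wait for the input surface, then send and verify.

    All four callables are hypothetical hooks supplied by the caller:
      launch()          starts the session with no prompt
      input_ready()     True once the agent's input surface has rendered
      send(prompt)      pastes the prompt into the session
      consumed(prompt)  True once there is evidence the prompt landed
    """
    launch()
    if not wait_until(input_ready, ready_timeout, sleep=sleep):
        raise DeliveryError("input surface never rendered")
    send(prompt)
    # The important part: a timeout here is an error, never a silent success.
    if not wait_until(lambda: consumed(prompt), ack_timeout, sleep=sleep):
        raise DeliveryError("prompt sent but no evidence it was consumed")
```

A command-line wrapper around this could catch DeliveryError and exit non-zero, which gives a dispatcher something to retry on.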
Failure mode two: shipping files back out
Text is the easy direction. The day I wanted an agent to send me a screenshot (“here’s the failing page”), I went with a marker. The agent emits a tiny token in its output, [bridge-file:/tmp/screen.png]; the bridge intercepts it on the way to Discord, attaches the file, and strips the marker from the visible text. In the terminal it’s inert text; it only means something at the boundary.
The work was everything that keeps that from being a security hole:
- A narrow allowlist. Attachable paths are scratch dirs (/tmp, /var/tmp), not my home directory and not the source tree. Symlinks get resolved before the check, so a /tmp/x that points at /etc/passwd is rejected.
- Limits, enforced. Per-file and total-payload size caps, a maximum attachment count, and regular files only (no devices, sockets, or FIFOs).
- Failures are always visible. If a marker can’t be honored, the bridge emits a tagged reason in the thread rather than dropping it silently. (Filenames in those tags get defanged first, so a file named to ping a channel can’t.)
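The checks in that list can be sketched in Python roughly as follows. This is an illustration under assumptions, not the bridge’s actual code. The Policy defaults, the wording of the failure tag, and the defang rule (a zero-width space after each @) are all invented for the example. Only the marker syntax and the scratch directories come from the text above.

```python
import os
import re
import stat
from dataclasses import dataclass

MARKER = re.compile(r"\[bridge-file:([^\]\n]+)\]")


@dataclass(frozen=True)
class Policy:
    roots: tuple = ("/tmp", "/var/tmp")      # scratch dirs only
    max_file_bytes: int = 8 * 1024 * 1024    # assumed cap
    max_total_bytes: int = 20 * 1024 * 1024  # assumed cap
    max_files: int = 5                       # assumed cap


def defang(name):
    """Break @mentions so a crafted filename cannot ping anyone (assumed rule)."""
    return name.replace("@", "@\u200b")


def check_path(path, policy, used_bytes):
    """Return (resolved_path, size) or raise ValueError with a short reason."""
    real = os.path.realpath(path)  # resolve symlinks before the allowlist check
    roots = [os.path.realpath(root) for root in policy.roots]
    if not any(os.path.commonpath([real, root]) == root for root in roots):
        raise ValueError("outside allowlist")
    try:
        info = os.stat(real)
    except OSError:
        raise ValueError("not found")
    if not stat.S_ISREG(info.st_mode):  # no devices, sockets, or FIFOs
        raise ValueError("not a regular file")
    if info.st_size > policy.max_file_bytes:
        raise ValueError("file too large")
    if used_bytes + info.st_size > policy.max_total_bytes:
        raise ValueError("total payload too large")
    return real, info.st_size


def process(text, policy=Policy()):
    """Strip markers from text; return (visible_text, attachment_paths)."""
    attachments, notes, total = [], [], 0
    for match in MARKER.finditer(text):
        raw = match.group(1).strip()
        try:
            if len(attachments) >= policy.max_files:
                raise ValueError("too many attachments")
            real, size = check_path(raw, policy, total)
        except ValueError as err:
            # Never drop silently: surface a tagged, defanged reason instead.
            name = defang(os.path.basename(raw))
            notes.append(f"[bridge: could not attach {name}: {err}]")
            continue
        attachments.append(real)
        total += size
    visible = MARKER.sub("", text).strip()
    return "\n".join([visible, *notes]).strip(), attachments
```

A caller would post the visible text and upload the returned paths in the same message. Because a path can change between the check and the upload, a stricter version would open the resolved file once and validate the open handle.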
The bug class that unit tests can’t see
Every package in the bridge had green unit tests. Production was still broken in ways those tests missed, because the bugs lived in the wiring between packages, not inside any one of them. A serializer that round-tripped perfectly on its own, talking to a consumer that expected a slightly different shape. A retry that was perfectly correct, sitting on top of a verifier that lied (see above).
So I wrote an integration harness that exercises the real seams: a real session, a real relay, a real round-trip, with assertions on the observable end-to-end behaviour. Did the message actually arrive? Did the file actually attach? Did a forced failure actually surface a tag in the thread? It caught several bugs that had sailed straight through the passing unit tests.
It’s the same reason I run inference SLOs and dead-man’s switches (both covered in other posts). Anything I leave running unattended has to get loud when it breaks.
It’s declarative, and agents maintain it
The bridge, the two relay identities, the scheduler, the control-plane timer: all of it is a NixOS module, built and deployed the same declarative way as the rest of my fleet. A new machine is a config directory and a deploy command, and the whole agent stack comes up identically every time.
When the packaging needs a change or a new component wired in, I mostly have Claude do it: edit the Nix module, build it, deploy it. So a coding agent is what keeps the agent stack reproducible and deployable.
Why a homelab toy is the day job
The scale is small: a Discord bot for a few agents on one box. The questions are the ones you hit building any reliable distributed system. How do you deliver a message across a timing boundary and know it landed? How do you let an untrusted producer hand you a file safely? How do you test a system whose bugs live between its parts? The worst failure here is a missed agent question. At work the stakes are higher and the questions are identical.
I trust the bridge now. When my phone’s quiet the agents are getting on with it; when it buzzes, something needs me.