I GOT AN AMD R9700 (32GB) FOR LOCAL INFERENCE

If a local model was going to be part of the assistant, I wanted to understand the machine under it. So I bought an AI-oriented GPU. I wanted to learn the internals of LLMs and inference, and I can’t afford the multi-million-dollar racks of datacenter GPUs the hyperscalers run. A local model is a temporary stand-in for that learning: not smart enough to replace a frontier model, not some AGI/ASI waking up in my utility room, but decent enough to test chatting flows and scripts and watch how serving behaves.

TEACHING AN AGENT TO CLICK ON WAYLAND

Second half of giving the assistant hands: driving the actual screen. In the last post I got an agent driving a browser - Selenium against a copy of my Firefox profile for anything on my own accounts, Playwright for the clean-room jobs. That didn’t cover native desktop apps with no DOM to drive, or the occasional site where only my actual running browser, with my actual session, would do. Instead of controlling a browser, control the screen: move the real mouse, press real keys, against whatever window is in front of me. If the agent can operate the machine the way I do when I’m sitting at it, the browser and the native app stop being two different problems. On GNOME on Wayland, that took a lot longer than I expected.

TEACHING AN AGENT TO USE A BROWSER

For the assistant to do real chores, it needed hands. This is the first half of giving it some. I have a pile of small browser chores that only I can do because they live behind my own logins: register a warranty on an appliance, check a dashboard that has no API, fill in some form that wants my account. None of it is hard, all of it is tedious. Could I hand that work to one of the coding agents I already run, and have it do the same clicking and typing I’d do, on my own accounts, while I was off doing something else? Note: most of this was before Claude Cowork was generally available, and anyway I wanted to build my own generic screen control for experimentation and for things there wouldn’t be Cowork connectors for, at least initially. The catch: an agent that can read and write files and run shell commands still can’t see a web page the way I can, and it can’t bring my logged-in session. So: how do you let an agent drive a browser at all?

MEMORY THAT OUTLIVES THE CONTEXT WINDOW

An assistant that forgets everything between chats can’t be trusted with anything ongoing. I run a fleet of AI coding agents in my homelab, and for a while they all had the same flaw: every session, they forgot everything. You’d tell an agent “it’s haven, not the NUC, the NUC was retired” and it would nod along, fix it, and then three days later a fresh session would confidently SSH into a machine that no longer existed. Context windows are big now, but they’re not forever, and the moment a long session compacts or a new one starts, all that context is gone.

A DISCORD CONTROL PLANE FOR AUTONOMOUS AGENTS

Another piece of the assistant: being able to answer it when I’m not at the desk. The agent stack in my homelab runs AI coding sessions unattended. A scheduler pulls recurring tasks from Todoist, launches a session to handle each one, and a control-plane timer health-checks them every couple of minutes. Most of the time I never see any of it. But unattended work still needs a human sometimes: a permission prompt, an ambiguous call, a finding worth a second opinion. I’m not sitting at the terminal when an agent hits one of those moments. So I built a session bridge: it relays an agent’s session to a Discord thread and relays my replies back. The agent asks, my phone buzzes, I answer from the bus, the agent carries on. One thread per session, so each conversation keeps its context. The relay itself was easy. The hard part was the two failure modes I hit making it trustworthy.
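The "one thread per session" idea can be sketched as a two-way lookup. This is only an illustration under stated assumptions, not the bridge described above: the class name `SessionBridge`, its methods, and the `thread-N` identifiers are hypothetical, and no real Discord API is called.

```python
class SessionBridge:
    """Hypothetical in-memory map between agent sessions and chat threads."""

    def __init__(self):
        self._thread_for_session = {}
        self._session_for_thread = {}
        self._next_thread = 1  # stand-in for IDs a chat service would assign

    def thread_for(self, session_id):
        """Return the thread for a session, creating one on first use."""
        if session_id not in self._thread_for_session:
            thread_id = f"thread-{self._next_thread}"
            self._next_thread += 1
            self._thread_for_session[session_id] = thread_id
            self._session_for_thread[thread_id] = session_id
        return self._thread_for_session[session_id]

    def session_for(self, thread_id):
        """Route a human reply back to its session; None if the thread is unknown."""
        return self._session_for_thread.get(thread_id)
```

Keeping both directions in one place means a reply typed into a thread can only ever reach the session that opened it.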

AUTOMATING PERMISSIONS SAFELY - BEFORE CLAUDE AUTO MODE WAS AROUND

One piece of the Jarvis I’m after is letting it act while I’m not watching, which means trusting it with a shell. I run a handful of AI coding agents in my homelab without sitting over them. A scheduler hands them recurring tasks, they work, and most of the time I find out what happened by reading it after the fact. The catch is that a coding agent is only useful if it can actually run things - git, kubectl, the occasional shell one-liner - and “can run things, unsupervised” is a phrase that should make anyone a little nervous. So the real question was never whether to let the agent act. It was when it should just go ahead, and when it should stop and ask me first. I use agents less for self-contained software coding and more for operating my homelab and agentic workflows on internal and external services, so “just run it in a sandbox and allow everything” wouldn’t work for me. When I started doing this, Claude Code didn’t have a built-in answer. Surely someone had made something like this without the often-recommended “just use --dangerously-skip-permissions” mode? No? Let’s use hooks to automate permission decisions with my mental context noted down.
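A minimal sketch of that "go ahead or stop and ask" decision, assuming a simple allowlist of read-only commands. It is not the hook setup from the post: the `decide` function, the `AUTO_ALLOW` entries, and the "ask by default" policy are all hypothetical, and how a real hook reports its decision is left out.

```python
import shlex

# Hypothetical allowlist of read-only (program, subcommand) pairs.
AUTO_ALLOW = {
    ("git", "status"),
    ("git", "diff"),
    ("kubectl", "get"),
    ("kubectl", "describe"),
}

# Characters that chain, redirect, or substitute commands in a shell.
SHELL_META = set(";|&`$><\n\r")


def decide(command: str) -> str:
    """Return 'allow' for known read-only commands, 'ask' for everything else."""
    # Anything with shell operators is too hard to reason about: ask.
    if any(ch in SHELL_META for ch in command):
        return "ask"
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return "ask"
    if len(tokens) < 2:
        return "ask"
    if (tokens[0], tokens[1]) in AUTO_ALLOW:
        return "allow"
    return "ask"
```

Defaulting to "ask" keeps the failure mode boring: an unrecognised command costs a prompt, not an incident.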

HOMELAB 2026 - THE TWO DATACENTERS

Six years since the last post. The lab grew up - two sites on two continents, two Kubernetes clusters, GitOps everywhere, NixOS everywhere, and a GPU that talks back. A tour of what runs today and why.

VFIO ON 2ND-GEN RYZEN - PASSING A GPU, USB AND AUDIO INTO A VM (AND THE RESET BUG THAT BROKE IT)

Passing the GPU, onboard USB controllers and HD audio from my 2700X / Crosshair VII Hero into a VM. IOMMU groups, the ACS override, and a BIOS-plus-kernel regression that left devices stuck until a host reboot.

COMPILING A CUSTOM KERNEL ON PROXMOX

Building a patched pve-kernel for my Proxmox box, plus the two dumb traps that cost me an afternoon - a submodule that wouldn’t fetch and a version string that kept growing a “+”.

FIRST ATTEMPT AT CEPH RBD ON KUBERNETES IN LXC CONTAINERS

Giving my LXC-based K8s cluster dynamic persistent storage from Ceph RBD, and hitting the fact that you can’t load kernel modules inside a container.