Automating Permissions Safely - Before Claude Auto Mode Was Around

One piece of the Jarvis I’m after is letting it act while I’m not watching, which means trusting it with a shell.

I run a handful of AI coding agents in my homelab without sitting over them. A scheduler hands them recurring tasks, they work, and most of the time I find out what happened by reading it after the fact. The catch is that a coding agent is only useful if it can actually run things - git, kubectl, the occasional shell one-liner - and “can run things, unsupervised” is a phrase that should make anyone a little nervous. So the real question was never whether to let the agent act. It was when it should just go ahead, and when it should stop and ask me first. I use agents less for self-contained software coding and more for operating my homelab and agentic workflows on internal and external services, so “just run it in a sandbox and allow everything” wouldn’t work for me.

When I started doing this, Claude Code didn’t have a built-in answer. Surely someone had already built something like this, without falling back on the often-recommended “just use --dangerously-skip-permissions” mode? No? Then let’s use hooks to automate permission decisions, with my mental context written down as rules.

A fast layer and a slow layer

What I landed on was two layers, though I fought with myself a lot, mentally and practically, over the split, the ordering and the fallthrough mechanisms.

The first layer is deliberately dumb. A lot of what an agent does is harmless and obvious: git status, kubectl get pods, ls, reading a file, systemctl status. None of that changes anything, and none of it is worth thinking hard about. So the first layer is a hook that recognises read-only verbs and waves them straight through in a millisecond or two, without asking anyone or calling anything. The overwhelming majority of an agent’s commands are this boring, and they should cost nothing to approve.

Why couldn’t I use the built-in permission allow rules in Claude Code and other CLIs? Every new combination or use ends up prompting the user once to allow-always, and Claude Code at the time often didn’t surface the ‘allow-always’ option at all, because the command was part of a chain, had a redirection, or was simply infrequent, even though an upfront analysis of the CLI tool could have told us it was read-only and safe. Mostly, though, it was because I couldn’t get ‘always-allow’ to work reliably for anything beyond simple commands.
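As a rough illustration of what such a fast path could look like (a sketch, not the actual hook from this setup): the table of read-only verbs and the name is_read_only are hypothetical, and this version is more conservative than the one described above, because anything containing chaining, redirection or substitution characters simply falls through to the slow layer.

```python
import shlex

# Hypothetical allowlist: program -> subcommands treated as read-only.
# An empty set means the program is treated as read-only with any arguments.
READ_ONLY = {
    "ls": set(),
    "cat": set(),
    "git": {"status", "log", "show"},
    "kubectl": {"get", "describe", "logs"},
    "systemctl": {"status", "is-active"},
}

# Characters that indicate chaining, redirection or substitution.
# This sketch refuses to fast-path any command that contains one of them.
SHELL_META = set(";&|<>`$(){}\n")


def is_read_only(command: str) -> bool:
    """Return True only when the command is plainly on the allowlist."""
    if any(ch in SHELL_META for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: not obviously safe.
        return False
    if not tokens:
        return False
    program, args = tokens[0], tokens[1:]
    if program not in READ_ONLY:
        return False
    allowed = READ_ONLY[program]
    if not allowed:
        return True
    # Treat the first non-flag argument as the subcommand.
    # Flags themselves are not inspected here; a real hook would need to.
    subcommand = next((a for a in args if not a.startswith("-")), None)
    return subcommand in allowed
```

A False result here does not mean “deny”. It only means “not obviously safe”, so the command moves on to whatever comes next.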

The second layer is where the judgment lives. For anything that isn’t obviously safe - a write, a delete, something it hasn’t seen before - a second hook hands the command to an LLM and asks it, against a set of rules I’ve written, whether this is fine to run unattended or something I’d want to look at first. That part needs actual reasoning. “Is this rm clearing a temp file or my home directory?” can’t be answered by matching a verb. “Is this removal of a backup file on my NAS safe?” needs basic context about my homelab: what I consider prod and what I don’t, how my log rotation works, and so on. “Is it safe to run ‘sudo apt-get update’?” On my testing VM, yes; on my prod VM, no - and I can’t express that with “ssh root@IP *” in the allow rules.
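A minimal sketch of that slow layer, under loudly stated assumptions: the post does not say how the model is called, so the model call is passed in as a plain function (ask_model), and the prompt wording, the host parameter and the one-word reply format are all hypothetical.

```python
from typing import Callable

VALID_VERDICTS = {"allow", "ask"}


def build_prompt(policy: str, command: str, host: str) -> str:
    """Combine the written policy with the command under review."""
    return (
        "You review shell commands for unattended execution.\n"
        f"Policy:\n{policy}\n\n"
        f"Target host: {host}\n"
        f"Command: {command}\n\n"
        "Reply with exactly one word: allow or ask."
    )


def judge(command: str, host: str, policy: str,
          ask_model: Callable[[str], str]) -> str:
    """Return 'allow' or 'ask'. Anything unexpected becomes 'ask'."""
    try:
        reply = ask_model(build_prompt(policy, command, host))
    except Exception:
        # Timeouts, rate limits, network errors: hand the decision to a human.
        return "ask"
    verdict = reply.strip().lower()
    return verdict if verdict in VALID_VERDICTS else "ask"
```

The design choice worth noting is that the model can only ever say “allow” by answering in exactly the expected form. A rambling answer, an error or a timeout all end up as “ask”.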

Underneath both of those sat a third thing, less a layer than a tripwire: a hard-coded deny-list of commands I never wanted any model, however confident, to wave through. rm -rf /, mkfs on a real device, dd onto a block device, a force-push to main, piping a remote script straight into a shell, and so on. There are horror stories about LLMs helpfully wiping things, and while I did have some real use cases for a session doing this kind of thing (e.g. ‘get this driver working in my VM, recreate the VM from the iso if needed, keep going’), I wasn’t confident that a stray wildcard allow at some other layer wouldn’t let something risky through.
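A tripwire like this can be little more than a list of regular expressions. The patterns below are illustrative guesses that cover the examples named above, not the real deny-list, and they are easy to evade on purpose; the function name denied_reason is hypothetical.

```python
import re
from typing import Optional

# Illustrative patterns only. They catch the textbook forms of each command,
# not every possible spelling.
DENY_PATTERNS = [
    (re.compile(r"\brm\s+(?:-\S+\s+)*/(?:\s|$)"),
     "rm targeting the filesystem root"),
    (re.compile(r"\bmkfs(?:\.\w+)?\b.*\s/dev/\S+"),
     "mkfs on a device node"),
    (re.compile(r"\bdd\b.*\bof=/dev/(?!null\b)\S+"),
     "dd writing to a device node"),
    (re.compile(r"\bgit\s+push\b(?=.*(?:\s--force\b|\s-f\b))(?=.*\bmain\b)"),
     "force-push to main"),
    (re.compile(r"\b(?:curl|wget)\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z|da)?sh\b"),
     "remote script piped into a shell"),
]


def denied_reason(command: str) -> Optional[str]:
    """Return a reason if the command hits the tripwire, else None."""
    for pattern, reason in DENY_PATTERNS:
        if pattern.search(command):
            return reason
    return None
```

Because there is no model involved, a match here cannot be argued away by a confident explanation.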

Why split it that way

The reason for the split is mostly about trust and cost. A model judging every single git status is slow, it’s expensive, and worse, it normalises asking a model about things that plainly don’t need asking. A deterministic fast-path keeps the common case instant and saves the LLM’s attention for the genuinely ambiguous middle. And the hard deny-list exists because there’s a category of action where “the model was pretty sure” is not a sentence I ever want to read. I’ve already had one incident where I lost Immich photos to a race condition and couldn’t reconstruct what had happened in the scripts I’d allowed to run.

That last point matters most: the whole point of letting an agent run unattended is that I’m not there to catch it. So the failure being designed against isn’t “the agent does something slightly wrong”, it’s “the agent does something catastrophic and irreversible while I’m asleep”. For that, a confident wrong answer is worse than no answer. A few commands should never be reachable by reasoning at all.
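Put together, the decision flow might look like the sketch below. The ordering (tripwire first, then fast path, then model) and the fallback to “ask” on any judge failure are assumptions made for illustration; the three checks are passed in as functions, and all names here are hypothetical.

```python
from typing import Callable, Optional, Tuple


def decide(
    command: str,
    denied_reason: Callable[[str], Optional[str]],
    is_read_only: Callable[[str], bool],
    judge: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (decision, explanation); decision is 'deny', 'allow' or 'ask'."""
    # 1. Tripwire: checked first so nothing later can override it.
    reason = denied_reason(command)
    if reason is not None:
        return ("deny", f"tripwire: {reason}")

    # 2. Fast path: obviously read-only commands never reach the model.
    if is_read_only(command):
        return ("allow", "fast path: read-only")

    # 3. Slow path: only an explicit 'allow' from the judge is accepted.
    try:
        verdict = judge(command)
    except Exception as exc:
        return ("ask", f"judge failed: {exc!r}")
    if verdict == "allow":
        return ("allow", "judge approved")
    return ("ask", f"judge returned {verdict!r}")
```

With this ordering, a denied command is rejected before the model is ever consulted, which is one way to make it unreachable by reasoning.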

For now, this works. I’ve had to tune the timeouts, the glue scripts and the hooks, because effectively this is a hook calling a full TUI CLI, which in turn calls an external API. I also occasionally go through the log of decisions made and decide what to promote into my policy.md (especially if I saw the denial while I was actually at the session).
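That review pass can be partly mechanical. The sketch below assumes a decision log in JSON Lines form with layer, decision and command fields; that format is invented for this example, since the post does not describe the real log.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def promotion_candidates(
    log_lines: Iterable[str], min_count: int = 3
) -> List[Tuple[str, str, int]]:
    """List (decision, command, count) for repeated model decisions."""
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip corrupt lines rather than abort the review
        # Only decisions that needed the model are worth promoting to a rule.
        if entry.get("layer") == "llm":
            counts[(entry.get("decision"), entry.get("command"))] += 1
    return [
        (decision, command, n)
        for (decision, command), n in counts.most_common()
        if n >= min_count
    ]
```

Anything that shows up repeatedly with the same verdict is a candidate for an explicit line in policy.md, so the model stops having to rediscover it.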

Update, April 2026. A few weeks after I made this, Claude Code shipped “auto mode” - a built-in permission classifier that does basically this, except wired into the tool instead of bolted on through hooks. My reaction: Phew, I don’t have to maintain this anymore! I moved my rules into auto mode’s config and retired the LLM hook, but kept the fast deterministic layer, because a sub-millisecond “allow” on the obviously-read-only 90% still beats a model call. It felt good to have come up with something quite similar before it was official.

The official auto mode apparently has many, many backend configs at Anthropic while they develop the feature, and sometimes things go wrong: the model gets transiently rate-limited, the model in the config doesn’t match the one in the session, some Claude Code headers interfere, and so on. At one point the whole thing began failing closed, and I had to dig out this old llm-permission-policy hook until that was fixed.
