<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Automation on thelastguardian.me</title><link>https://thelastguardian.me/tags/automation/</link><description>Recent content in Automation on thelastguardian.me</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Sun, 26 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://thelastguardian.me/tags/automation/index.xml" rel="self" type="application/rss+xml"/><item><title>Teaching an Agent to Click on Wayland</title><link>https://thelastguardian.me/posts/2026-04-26-screen-control-on-wayland/</link><pubDate>Sun, 26 Apr 2026 00:00:00 +0000</pubDate><guid>https://thelastguardian.me/posts/2026-04-26-screen-control-on-wayland/</guid><description>&lt;p&gt;Second half of giving the assistant hands: driving the actual screen.&lt;/p&gt;
&lt;p&gt;In the &lt;a href="https://thelastguardian.me/posts/2026-04-19-teaching-an-agent-to-use-a-browser/"&gt;last post&lt;/a&gt; I got an agent driving a browser - Selenium against a copy of my Firefox profile for anything on my own accounts, Playwright for the clean-room jobs. That didn&amp;rsquo;t cover native desktop apps with no DOM to drive, or the occasional site where only my &lt;em&gt;actual&lt;/em&gt; running browser, with my actual session, would do.&lt;/p&gt;
&lt;p&gt;Instead of controlling a browser, control the screen: move the real mouse, press real keys, against whatever window is in front of me. If the agent can operate the machine the way I do when I&amp;rsquo;m sitting at it, the browser and the native app stop being two different problems.&lt;/p&gt;
&lt;p&gt;On GNOME on Wayland, that took a lot longer than I expected.&lt;/p&gt;</description></item><item><title>Teaching an Agent to Use a Browser</title><link>https://thelastguardian.me/posts/2026-04-19-teaching-an-agent-to-use-a-browser/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://thelastguardian.me/posts/2026-04-19-teaching-an-agent-to-use-a-browser/</guid><description>&lt;p&gt;For the assistant to do real chores, it needed hands. This is the first half of giving it some.&lt;/p&gt;
&lt;p&gt;I have a pile of small browser chores that only I can do because they live behind my own logins: register a warranty on an appliance, check a dashboard that has no API, fill in some form that wants my account. None of it is hard, all of it is tedious. Could I hand that work to one of the coding agents I already run, and have it do the same clicking and typing I&amp;rsquo;d do, on my own accounts, while I was off doing something else?&lt;/p&gt;
&lt;p&gt;Note: most of this was before Claude Cowork was generally available, and anyway I wanted to build my own generic screen control for experimentation and things that there wouldn&amp;rsquo;t be Cowork connectors for, atleast initially.&lt;/p&gt;
&lt;p&gt;The catch: an agent that can read and write files and run shell commands still can&amp;rsquo;t see a web page the way I can, and it can&amp;rsquo;t bring my logged-in session. So: how do you let an agent drive a browser at all?&lt;/p&gt;</description></item><item><title>A Discord Control Plane for Autonomous Agents</title><link>https://thelastguardian.me/posts/2026-03-29-discord-control-plane-for-ai-agents/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://thelastguardian.me/posts/2026-03-29-discord-control-plane-for-ai-agents/</guid><description>&lt;p&gt;Another piece of the assistant: being able to answer it when I&amp;rsquo;m not at the desk.&lt;/p&gt;
&lt;p&gt;The agent stack in my homelab runs AI coding sessions unattended. A scheduler pulls recurring tasks from Todoist, launches a session to handle each one, and a control-plane timer health-checks them every couple of minutes. Most of the time I never see any of it. But unattended work still needs a human sometimes: a permission prompt, an ambiguous call, a finding worth a second opinion.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not sitting at the terminal when an agent hits one of those moments. So I built a session bridge: it relays an agent&amp;rsquo;s session to a Discord thread and relays my replies back. The agent asks, my phone buzzes, I answer from the bus, the agent carries on. One thread per session, so each conversation keeps its context.&lt;/p&gt;
&lt;p&gt;The relay itself was easy. The hard part was the two failure modes I hit making it trustworthy.&lt;/p&gt;</description></item><item><title>Automating Permissions Safely - Before Claude Auto Mode Was Around</title><link>https://thelastguardian.me/posts/2026-03-18-automating-permissions-before-auto-mode/</link><pubDate>Wed, 18 Mar 2026 00:00:00 +0000</pubDate><guid>https://thelastguardian.me/posts/2026-03-18-automating-permissions-before-auto-mode/</guid><description>&lt;p&gt;One piece of the Jarvis I&amp;rsquo;m after is letting it act while I&amp;rsquo;m not watching, which means trusting it with a shell.&lt;/p&gt;
&lt;p&gt;I run a handful of AI coding agents in my homelab without sitting over them. A scheduler hands them recurring tasks, they work, and most of the time I find out what happened by reading it after the fact. The catch is that a coding agent is only useful if it can actually run things - git, kubectl, the occasional shell one-liner - and &amp;ldquo;can run things, unsupervised&amp;rdquo; is a phrase that should make anyone a little nervous. So the real question was never whether to let the agent act. It was when it should just go ahead, and when it should stop and ask me first. I use agents less for self-contained software coding and more for operating my homelab and agentic workflows on internal and external services, so &amp;ldquo;just run it in a sandbox and allow everything&amp;rdquo; wouldn&amp;rsquo;t work for me.&lt;/p&gt;
&lt;p&gt;When I started doing this, Claude Code didn&amp;rsquo;t have a built-in answer. Surely someone had made something like this without the often recommended &amp;ldquo;just use &amp;ndash;dangerously-skip-permissions&amp;rdquo; mode? No? Let&amp;rsquo;s use hooks to automate permission decisions with my mental context noted down.&lt;/p&gt;</description></item></channel></rss>