<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Gnome on thelastguardian.me</title><link>https://thelastguardian.me/tags/gnome/</link><description>Recent content in Gnome on thelastguardian.me</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Sun, 26 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://thelastguardian.me/tags/gnome/index.xml" rel="self" type="application/rss+xml"/><item><title>Teaching an Agent to Click on Wayland</title><link>https://thelastguardian.me/posts/2026-04-26-screen-control-on-wayland/</link><pubDate>Sun, 26 Apr 2026 00:00:00 +0000</pubDate><guid>https://thelastguardian.me/posts/2026-04-26-screen-control-on-wayland/</guid><description>&lt;p&gt;Second half of giving the assistant hands: driving the actual screen.&lt;/p&gt;
&lt;p&gt;In the &lt;a href="https://thelastguardian.me/posts/2026-04-19-teaching-an-agent-to-use-a-browser/"&gt;last post&lt;/a&gt; I got an agent driving a browser: Selenium against a copy of my Firefox profile for anything on my own accounts, Playwright for the clean-room jobs. That didn&amp;rsquo;t cover native desktop apps with no DOM to drive, or the occasional site where only my &lt;em&gt;actual&lt;/em&gt; running browser, with my actual session, would do.&lt;/p&gt;
&lt;p&gt;Instead of controlling a browser, control the screen: move the real mouse, press real keys, against whatever window is in front of me. If the agent can operate the machine the way I do when I&amp;rsquo;m sitting at it, the browser and the native app stop being two different problems.&lt;/p&gt;
&lt;p&gt;On GNOME on Wayland, that took a lot longer than I expected.&lt;/p&gt;</description></item></channel></rss>