<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Selenium on thelastguardian.me</title><link>https://thelastguardian.me/tags/selenium/</link><description>Recent content in Selenium on thelastguardian.me</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Sun, 19 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://thelastguardian.me/tags/selenium/index.xml" rel="self" type="application/rss+xml"/><item><title>Teaching an Agent to Use a Browser</title><link>https://thelastguardian.me/posts/2026-04-19-teaching-an-agent-to-use-a-browser/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://thelastguardian.me/posts/2026-04-19-teaching-an-agent-to-use-a-browser/</guid><description>&lt;p&gt;For the assistant to do real chores, it needed hands. This is the first half of giving it some.&lt;/p&gt;
&lt;p&gt;I have a pile of small browser chores that only I can do because they live behind my own logins: register a warranty on an appliance, check a dashboard that has no API, fill in some form that wants my account. None of it is hard; all of it is tedious. Could I hand that work to one of the coding agents I already run, and have it do the same clicking and typing I&amp;rsquo;d do, on my own accounts, while I was off doing something else?&lt;/p&gt;
&lt;p&gt;Note: most of this predates Claude Cowork being generally available, and in any case I wanted to build my own generic screen control, both for experimentation and for things that wouldn&amp;rsquo;t have Cowork connectors, at least initially.&lt;/p&gt;
&lt;p&gt;The catch: an agent that can read and write files and run shell commands still can&amp;rsquo;t see a web page the way I can, and it can&amp;rsquo;t bring my logged-in session with it. So: how do you let an agent drive a browser at all?&lt;/p&gt;</description></item></channel></rss>