Teaching an Agent to Use a Browser

Sun, 19 Apr 2026 00:00:00 +0000

For the assistant to do real chores, it needed hands. This is the first half of giving it some.

I have a pile of small browser chores that only I can do because they live behind my own logins: register a warranty on an appliance, check a dashboard that has no API, fill in some form that wants my account. None of it is hard, all of it is tedious. Could I hand that work to one of the coding agents I already run, and have it do the same clicking and typing I’d do, on my own accounts, while I was off doing something else?

Note: most of this happened before Claude Cowork was generally available. Either way, I wanted to build my own generic screen control, both for experimentation and for things that wouldn’t have Cowork connectors, at least initially.

The catch: an agent that can read and write files and run shell commands still can’t see a web page the way I can, and it can’t bring my logged-in session. So: how do you let an agent drive a browser at all?
