HomeLab 2026 - The Two Datacenters

The last post on this blog is from January 2020. In the six years since, the lab and I both emigrated, multiplied, and developed opinions about declarative configuration. This post is the “what’s running now” tour, and the kickoff for a series I actually intend to finish this time: local LLM serving, SLOs for inference, and the automation layer that keeps it all from paging me at dinner (fine: the automation layer that is supposed to keep everything working, but still has me opening a debug shell exactly when I just want to use the lab to unwind).

From one rack to two countries

The 2020-era lab was a Proxmox box and a solution in search of a problem. Today it’s two sites, Mumbai and Dublin, that operate as one logical homelab:

  • Dublin is the primary site: Proxmox hosts, the NAS (ZFS, of course), the AI machine, and the Kubernetes cluster that runs most user-facing services.
  • Mumbai is the family site and the off-site replica: its own NAS, its own small k3s cluster, and the photo and media services the family actually uses daily.

Photo: the Dublin “primary site”, a 4U chassis open on a shelf in the utility nook with the R9700 build visible inside, sharing space with the washing machine.

The two sites replicate to each other (ZFS send/receive via syncoid), monitor each other, and share one GitOps repo.
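syncoid handles incremental replication on its own, but the core decision it makes on each run is worth seeing: an incremental `zfs send -i base latest` only applies on the receiver if the receiver already holds `base`, so the base has to be the newest snapshot both sides share. A minimal Python sketch of that choice, assuming snapshot lists are plain names (the part after `@`) sorted oldest-first; the function names and snapshot names are hypothetical:

```python
"""Sketch: choose the base snapshot for an incremental ZFS send.

Not syncoid's code. Snapshot lists are assumed to be bare names (the part
after '@'), ordered oldest-first.
"""
from typing import List, Optional


def incremental_base(source: List[str], target: List[str]) -> Optional[str]:
    """Return the newest snapshot present on both sides, or None."""
    on_target = set(target)
    for snap in reversed(source):  # walk newest to oldest
        if snap in on_target:
            return snap
    return None  # no shared history: only a full send will work


def plan_send(source: List[str], target: List[str]) -> str:
    """Describe what one replication run would need to do."""
    if not source:
        return "nothing to send"
    latest = source[-1]
    base = incremental_base(source, target)
    if base is None:
        return f"full send of {latest}"
    if base == latest:
        return "up to date"
    return f"incremental send {base} -> {latest}"


if __name__ == "__main__":
    dublin = ["2026-01-01", "2026-01-02", "2026-01-03"]
    mumbai = ["2026-01-01", "2026-01-02"]
    print(plan_send(dublin, mumbai))  # incremental send 2026-01-02 -> 2026-01-03
```

Comparing by name keeps the sketch short; real tooling should match snapshots by their ZFS `guid` property, since two snapshots can share a name without sharing history.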

The design rule that holds the whole thing together is that the two sites are symmetric peers, not a primary and a backup. Each can monitor the other and keep working when the other is down. An off-site copy on a different continent only helps if that remote site isn’t itself a single point of failure.

The platform layer

Everything is NixOS. Every VM and physical host runs from a single flake repo, deployed with colmena for the remote hosts. Six years ago I was hand-crafting Ubuntu VMs; today a new machine is a hosts/<name>/ directory and a deploy command. The whole fleet is reproducible from git, and just as usefully, it’s diffable: I can see exactly what a change does to every host before it lands.

Everything user-facing is Kubernetes. Two k3s clusters, one per site, both managed by ArgoCD watching a single repo. Push YAML to git, ArgoCD syncs it. I stopped SSHing into things to deploy years ago and have no intention of starting again. The cluster’s desired state lives in git history, not in my memory of what I changed at 2am. Why Kubernetes? I used it heavily at work, and it stuck. Works For Me™.
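The “push YAML, ArgoCD syncs it” loop boils down to a reconcile step: diff the desired state rendered from git against what is live, then derive the actions that close the gap. The sketch below models only that diff, not ArgoCD’s actual implementation or API; resources are hypothetical name-to-spec dictionaries, and `live` is assumed to hold only resources this app manages.

```python
"""Minimal model of a GitOps reconcile step (not ArgoCD's real implementation)."""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Plan:
    create: List[str] = field(default_factory=list)
    update: List[str] = field(default_factory=list)
    delete: List[str] = field(default_factory=list)

    @property
    def in_sync(self) -> bool:
        # Nothing to do means the live state already matches git.
        return not (self.create or self.update or self.delete)


def reconcile(desired: Dict[str, dict], live: Dict[str, dict], prune: bool = False) -> Plan:
    """Diff desired state (rendered from git) against live cluster state."""
    plan = Plan()
    for name in sorted(desired):
        if name not in live:
            plan.create.append(name)  # in git, not in the cluster
        elif live[name] != desired[name]:
            plan.update.append(name)  # new commit, or someone edited the cluster by hand
    if prune:
        # Resources git no longer mentions are removed only when pruning is enabled.
        plan.delete = sorted(set(live) - set(desired))
    return plan


if __name__ == "__main__":
    desired = {"deployment/web": {"replicas": 2}, "service/web": {"port": 80}}
    live = {"deployment/web": {"replicas": 1}, "configmap/old": {}}
    plan = reconcile(desired, live, prune=True)
    print(plan.create, plan.update, plan.delete)
```

Deletion is opt-in here, mirroring ArgoCD, where automated sync only prunes resources removed from git when pruning is enabled in the sync policy.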

Observability - fail loud. Prometheus, Grafana and Alertmanager on both clusters, blackbox probes for the things Kubernetes can’t see, and an external dead-man’s switch: an Uptime Kuma instance on a webhost outside the lab that alerts if the internal monitoring stops checking in. The lab alerts me on Discord, and the cloud alerts me on Discord when the lab can’t. I’ve been frustrated too often by “wait, I thought this was working, when did it die?”
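A dead-man’s switch inverts normal alerting: instead of firing on a bad signal, it fires on the absence of a good one. Uptime Kuma implements this itself (its push monitors work on this principle), so this Python sketch only models the decision; the class name and the interval and grace values are hypothetical:

```python
"""Sketch of dead-man's-switch logic as an external watcher would apply it."""
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DeadMansSwitch:
    interval: timedelta  # how often the lab is expected to check in
    grace: timedelta  # extra slack before silence counts as failure
    last_seen: Optional[datetime] = None

    def check_in(self, now: datetime) -> None:
        """Called whenever the internal monitoring reports that it is alive."""
        self.last_seen = now

    def should_alert(self, now: datetime) -> bool:
        """True once the lab has been silent for longer than interval + grace."""
        if self.last_seen is None:
            return True  # never heard from the lab: silence is itself the failure
        return now - self.last_seen > self.interval + self.grace


if __name__ == "__main__":
    start = datetime(2026, 1, 1, 12, 0)
    switch = DeadMansSwitch(interval=timedelta(minutes=5), grace=timedelta(minutes=1))
    switch.check_in(start)
    print(switch.should_alert(start + timedelta(minutes=4)))  # False: still within the window
    print(switch.should_alert(start + timedelta(minutes=10)))  # True: check-ins stopped
```

The important property is where the watcher runs: if it lived inside the lab, a power cut would silence both the monitoring and the thing meant to notice.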

Storage is ZFS on both NASes with scheduled scrubs, SMART monitoring, snapshot replication between sites, and integrity-check cron jobs guarding the family photo library (Immich). The 3-2-1 backup rule may have changed meaning over the decades, but the intention remains.
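Scrubs catch bits rotting on disk, but they have nothing to say about a photo that was deleted or overwritten through the filesystem, because ZFS treats that as a valid write. A checksum manifest covers that gap. One way such an integrity-check job could work, sketched in Python under stated assumptions: the manifest format and function names are hypothetical, nothing here is part of Immich, and the manifest should live outside the library it describes.

```python
"""Sketch of a checksum-manifest integrity check for a photo library."""
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the library with a saved manifest.

    New files are reported but are not errors: a growing library is normal.
    Missing or changed originals are what the cron job should alert on.
    """
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(k for k in manifest.keys() & current.keys() if manifest[k] != current[k]),
        "new": sorted(set(current) - set(manifest)),
    }


def save_manifest(manifest: Dict[str, str], path: Path) -> None:
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(path: Path) -> Dict[str, str]:
    return json.loads(path.read_text())


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        lib = Path(tmp) / "library"
        lib.mkdir()
        (lib / "img_001.jpg").write_bytes(b"original bytes")
        saved = Path(tmp) / "manifest.json"  # kept outside the library
        save_manifest(build_manifest(lib), saved)
        (lib / "img_001.jpg").write_bytes(b"silently overwritten")
        print(verify(lib, load_manifest(saved)))
```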

What I’m actually after

I want a Jarvis. Something that handles the boring admin of my digital life, the chores I keep doing by hand. I drive it when I care to, or hand it a task and walk off. Mostly I’m lazy.

Running models locally is part of the plan, mostly so I understand how the machinery works. The posts here are the pieces, as I build them.

The copyright year in the footer is fixed, the theme is updated, and the drafts folder is still a graveyard. But this time I’ll try to keep up, and maybe even use some LLMs to help.