Adventures with Ansible and Docker Swarm

So I’ve spent the last few months getting completely obsessed with building a proper homelab. What started as “I should just run Home Assistant on a Raspberry Pi” somehow morphed into this ridiculous project involving a mini-cluster of Raspberry Pis and me staying up until 3am troubleshooting obscure Docker networking problems. I’m not entirely sure how that escalation happened, but here we are.

In this post, I’ll walk through what I built, why I built it, and the many, many mistakes I made along the way. Fair warning - this is probably massive overkill for a home setup, but it’s been an incredibly fun learning experience. Plus, now I can control my lights with my phone, which is… basically the same as before, but with more blinking LEDs in my office.

Why I Went Down This Rabbit Hole

I’ve always had various services running at home - a Plex server here, a Pi-hole there, the occasional Minecraft server when friends were interested. But it was all very ad-hoc. Every device had its own setup, its own way of being backed up (or more realistically, not backed up), and its own special way of breaking that I’d have to rediscover each time.

After the third time I had to completely rebuild my Home Assistant setup because of an SD card failure (and lost all my automations… again), I decided enough was enough. I wanted:

  • Something that wouldn’t make me start from scratch when hardware inevitably fails
  • A way to deploy services without spending hours manually setting up each one
  • Something resembling a backup strategy that wasn’t “copy files to a USB drive when I remember to”
  • A chance to learn some of the DevOps stuff I’d been curious about but hadn’t used much

The Hardware: Pi Overload

For the hardware, I went with Raspberry Pis because:

  1. I already had a few lying around from abandoned projects
  2. They use very little power
  3. They’re small enough that my partner wouldn’t notice I was building a mini data center

The current setup includes:

  • 8 Raspberry Pi 4s (8GB)
  • 1 Raspberry Pi 5 with a 2TB NVMe drive
  • A bunch of PoE hats so I didn’t have power cables everywhere
  • An 8-port PoE switch that wasn’t cheap but saves me from having a rat’s nest of cables

I mounted everything in a 3D-printed rack that took me about four attempts to get right. The first version had the Pis packed so tightly that they were essentially cooking each other. Nothing like the smell of hot electronics at 2am to make you reconsider your design choices.

The Pi 5 with the NVMe drive acts as my storage server. I initially tried using USB drives on the Pi 4s, but the performance was painful, and it felt like I was just creating more points of failure.

For cooling, I’m using the fans on the PoE hats plus some small heatsinks. It’s probably overkill, but after having a Pi thermal throttle during an important update, I’ve become paranoid about temperatures.

The Software Stack

Ansible: Because I’m Too Lazy to Type the Same Commands 9 Times

Ansible is the backbone of this whole setup. Before this project, my experience with Ansible was basically “I think that’s the one with the playbooks?” Now I’ve become that annoying person who tries to automate everything. (My partner asked if I could automate making coffee… working on it.)

I chose Ansible because:

  • No agents needed on the Pis
  • Relatively easy to learn (though my first playbooks were horrifying)
  • Good community support for when I inevitably got stuck

My Ansible setup handles everything from the initial Pi configuration to deploying all the services. I’ve structured it with roles for different services and playbooks for different tasks. It probably has more organization than it needs, but after my first attempts where everything was in one massive file, I’ve learned to appreciate modularity.

The biggest challenge was figuring out how to handle secrets. I started with them just sitting in plaintext files (I know, I know), but quickly moved to Ansible Vault once I realized how easy it was to accidentally commit sensitive data. Nothing like almost pushing your WiFi password to GitHub to make you take security seriously.
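One cheap safety net for this kind of mistake is a check that refuses to commit a secrets file unless it is actually encrypted. Here is a minimal Python sketch of that idea. It relies on the fact that Vault-encrypted files start with a `$ANSIBLE_VAULT;` header line; the `*vault*.yml` naming pattern is purely an assumption for illustration, not something this repo necessarily uses.

```python
from pathlib import Path

# Files encrypted by ansible-vault begin with a header such as
# "$ANSIBLE_VAULT;1.1;AES256" on the first line.
VAULT_HEADER = "$ANSIBLE_VAULT;"


def looks_encrypted(first_line: str) -> bool:
    """Return True if the first line of a file is an Ansible Vault header."""
    return first_line.startswith(VAULT_HEADER)


def plaintext_secret_files(root: Path, pattern: str = "*vault*.yml") -> list[Path]:
    """List files matching the (hypothetical) naming pattern that are NOT encrypted."""
    plaintext = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            if not looks_encrypted(handle.readline()):
                plaintext.append(path)
    return plaintext
```

Wired into a Git pre-commit hook, something like `plaintext_secret_files(Path("."))` returning a non-empty list would be the signal to abort the commit.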

Docker Swarm: Because Kubernetes Seemed Like Overkill (For Now)

For container orchestration, I went with Docker Swarm. I know Kubernetes is the cool kid on the block, but for a home setup with 9 Raspberry Pis, Swarm hit the sweet spot of “powerful enough” without “requires a PhD to configure.”

Swarm lets me:

  • Deploy containers across multiple Pis
  • Have services stay running even if a Pi decides to die
  • Scale things up if needed (though honestly, most home services don’t need scaling)
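The "stay running even if a Pi dies" part comes down to desired-state reconciliation: you declare how many replicas you want, and the orchestrator keeps moving reality back toward that number. The sketch below is a toy model of that idea in Python, not Swarm's actual scheduler; the node names and the "least-loaded node first" rule are assumptions made for illustration.

```python
def reconcile(desired: int, placement: dict[str, int], healthy: set[str]) -> dict[str, int]:
    """Toy model: forget replicas on dead nodes, then refill on healthy ones.

    placement maps node name -> number of replicas currently running there.
    """
    if not healthy:
        raise ValueError("no healthy nodes left to run the service")

    # Replicas on nodes that are no longer healthy are considered lost.
    result = {node: placement.get(node, 0) for node in sorted(healthy)}

    # Start replacements one at a time on the least-loaded healthy node.
    missing = desired - sum(result.values())
    for _ in range(max(missing, 0)):
        target = min(result, key=lambda node: (result[node], node))
        result[target] += 1
    return result
```

For example, with three replicas spread over `pi1`, `pi2`, and `pi3`, losing `pi3` leaves two replicas running, and the function starts the third on one of the survivors.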

The learning curve was much gentler than I expected. Within a few hours, I had containers running and moving between nodes. The real challenges came later with networking, volumes, and getting Traefik set up properly.

The one thing I underestimated was how much RAM Docker would use on each Pi. My original plan included some 2GB and 4GB Pis, but I quickly found they were struggling, so I ended up standardizing on 8GB models.

NFS: Because Data Needs to Live Somewhere

For persistent storage, I set up NFS on the Pi 5 with the NVMe drive. This lets all the nodes access the same storage, which is crucial for services that need to maintain state.

Setting up NFS itself was straightforward, but getting the permissions right was a nightmare. I spent an entire weekend trying to figure out why containers couldn’t write to mounted volumes, only to discover it was an issue with user mappings between the containers and the NFS server. I’m still not sure I’ve got it 100% right, but it works now, and I’m afraid to touch it.
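For anyone hitting the same wall, it helps to remember that a typical NFS export only sees numeric user and group IDs, not names, and that with `root_squash` enabled a client-side root user is mapped to an anonymous user (commonly uid/gid 65534). The Python sketch below models that check in a simplified way. It is an illustration of the general mechanism, not a diagnosis of the exact problem described above, and it ignores ACLs and other details.

```python
import stat


def can_write(proc_uid: int, proc_gids: set[int], owner_uid: int, owner_gid: int,
              mode: int, root_squash: bool = True,
              anon_uid: int = 65534, anon_gid: int = 65534) -> bool:
    """Simplified model of whether a process can write to an NFS-exported directory."""
    if proc_uid == 0:
        if not root_squash:
            return True  # unsquashed root bypasses permission bits
        # With root_squash, root is treated as the anonymous user.
        proc_uid, proc_gids = anon_uid, {anon_gid}

    # Classic Unix check: owner bits, then group bits, then "other" bits.
    if proc_uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in proc_gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)
```

So a container running as root against a directory owned by uid 1000 with mode `0o755` cannot write when `root_squash` is on, while a container running as uid 1000 can. Matching the container's uid/gid to the owner of the exported directory is one common way out.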

The performance has been better than expected. The Pi 5’s PCIe interface for the NVMe drive makes a huge difference compared to the USB-attached storage I was using before.

Traefik: The Glue That Holds It All Together

Traefik serves as the reverse proxy for all my services. It routes traffic to the right containers and handles the HTTPS certificates through Let’s Encrypt.

Getting Traefik configured correctly took several attempts. My first attempt had certificates being requested far too frequently, which almost got me rate-limited by Let’s Encrypt. The second attempt worked but exposed the dashboard to the internet (not great). The current setup uses Let’s Encrypt’s HTTP-01 challenge for validation and keeps the dashboard internal-only.

Traefik’s integration with Docker Swarm is pretty slick - it automatically discovers services based on labels, so I don’t have to manually configure routes for new services. This makes deploying new things much easier.
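To make the label-based discovery a bit more concrete, here is a small Python sketch that builds the kind of labels Traefik looks for, assuming Traefik v2-style label names. The entrypoint name `websecure`, the resolver name `letsencrypt`, and the hostname are all placeholders chosen for illustration; they have to match whatever is defined in your own Traefik configuration.

```python
def traefik_labels(name: str, host: str, port: int,
                   entrypoint: str = "websecure",
                   cert_resolver: str = "letsencrypt") -> dict[str, str]:
    """Build Traefik v2-style routing labels for one service."""
    router = f"traefik.http.routers.{name}"
    return {
        "traefik.enable": "true",
        # Route requests whose Host header matches this hostname.
        f"{router}.rule": f"Host(`{host}`)",
        f"{router}.entrypoints": entrypoint,
        f"{router}.tls.certresolver": cert_resolver,
        # In Swarm mode Traefik cannot guess the container port, so it must be stated.
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }
```

Calling `traefik_labels("gitea", "git.example.internal", 3000)` gives a dictionary that could be rendered into a stack file. In Swarm mode these labels belong under the service's `deploy.labels` section rather than on the container itself.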

The Services I’m Running

At this point, I’ve got several services running reliably:

Home Assistant: The Reason This All Started

Home Assistant runs all my smart home stuff - lights, sensors, automations, etc. It’s working great in Docker Swarm, with its configuration stored on the NFS share so it persists across deployments.

The biggest challenge was getting some of the hardware integrations working in a containerized environment. Z-Wave in particular was tricky because it needed access to a USB device, which meant I had to make sure the container always ran on the same node with the Z-Wave stick attached.
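Pinning a service to one node in Swarm is usually done with a node label plus a placement constraint. The Python sketch below shows the shape of that, assuming a hypothetical node label `zwave=true` (added with something like `docker node update --label-add zwave=true <node>`); the node names are made up.

```python
def placement_constraint(label: str, value: str) -> str:
    """Build a Swarm placement constraint string for a node label."""
    return f"node.labels.{label} == {value}"


def eligible_nodes(nodes: dict[str, dict[str, str]], label: str, value: str) -> list[str]:
    """Return the nodes whose labels satisfy the constraint (toy model of the filter)."""
    return sorted(name for name, labels in nodes.items() if labels.get(label) == value)


def deploy_section(label: str, value: str) -> dict:
    """The fragment that would sit under a service's `deploy` key in a stack file."""
    return {"placement": {"constraints": [placement_constraint(label, value)]}}
```

With only one node carrying the label, the service can only ever be scheduled there. The trade-off is that if that node goes down, the service stays down until it comes back, since no other node has the stick.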

Gitea: Because I Don’t Want All My Code on GitHub

Gitea gives me a place to store all my personal projects and, ironically, the code for this entire homelab setup. It’s lightweight enough to run happily on a Pi and has all the features I need.

Setting it up with Traefik and persistent storage was straightforward, though I did have to be careful with the database setup to ensure data wasn’t lost during redeployments.

Portainer: For When I’m Too Lazy to Use the CLI

Portainer gives me a nice web UI for managing the Docker Swarm cluster. While I try to do most things through Ansible for repeatability, sometimes it’s just easier to check on things through a GUI.

The challenge here was setting up the Portainer agents on all nodes and getting them to communicate correctly. The documentation wasn’t always clear, and there was some trial and error involved.

Other Stuff That Was Fun to Set Up

I’m also running:

  • Pi-hole for network-wide ad blocking
  • Uptime Kuma to monitor everything and alert me when things break
  • A bunch of utility containers for backups, monitoring, etc.

Deployment Workflow: How It All Comes Together

The deployment process now is pretty slick (after many iterations of not-slick):

  1. I make changes to the configuration in my Git repository
  2. I run the appropriate Ansible playbook
  3. Ansible handles updating configurations, creating Docker secrets, and deploying the services
  4. Traefik automatically reconfigures itself to route traffic to the new or updated services

For major changes, I test on a staging environment first (which is just a separate stack on the same Swarm), but for minor updates, I’ll often deploy directly to production. Living dangerously, I know.

The whole process is documented in the repository README, so even if I don’t touch it for months, I can come back and remember how everything works. This has already saved me several times when I needed to make changes after forgetting all the details.

Lessons Learned (The Hard Way)

1. Start Simpler Than You Think You Need To

I initially designed this super complex setup with separate networks for different types of services, elaborate backup schemes, and other unnecessary complications. I spent weeks planning it all out before realizing I was over-engineering everything.

When I finally started building, I went with a much simpler approach and added complexity only as needed. This saved so much time and frustration.

2. Backups Are More Important Than Cool Features

Early on, I lost some configuration data because I hadn’t properly set up backups yet. I was too focused on adding new services and features. Now, I have automated backups of all critical data, both to a local server and to cloud storage.

The backup system isn’t fancy - just some scripts running in a container that use restic to back up the NFS shares - but it’s reliable and has already saved me once when I accidentally deleted some important files.
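A backup script along these lines mostly boils down to two restic invocations: one to take a snapshot and one to expire old ones. The Python sketch below builds those command lines. The repository location, paths, tag, and retention numbers are assumptions for illustration, not the values used here; restic itself reads the repository password from `RESTIC_PASSWORD` or `RESTIC_PASSWORD_FILE`.

```python
def restic_backup_command(repo: str, paths: list[str], tag: str = "homelab") -> list[str]:
    """Build a `restic backup` command for the given paths."""
    return ["restic", "-r", repo, "backup", *paths, "--tag", tag]


def restic_forget_command(repo: str, keep_daily: int = 7, keep_weekly: int = 4) -> list[str]:
    """Build a `restic forget` command that also prunes unreferenced data."""
    return [
        "restic", "-r", repo, "forget",
        "--keep-daily", str(keep_daily),
        "--keep-weekly", str(keep_weekly),
        "--prune",
    ]
```

Running the backup command on a schedule and the forget command less often keeps the repository from growing without bound.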

3. Documentation Is Your Future Self’s Best Friend

I forced myself to document everything as I went, even when it felt tedious. This includes:

  • What each service does and why it’s configured a certain way
  • Network layout and IP allocations
  • Troubleshooting steps for common issues

This documentation has been invaluable when I’ve had to revisit parts of the setup after not touching them for weeks or months.

4. Test Recovery, Not Just Backups

I learned that having backups doesn’t mean much if you can’t restore from them. I now periodically test restoration processes to make sure they actually work.

During one test, I discovered that I wasn’t backing up some config files that were essential for a clean recovery. It was much better to find this out during a test than during an actual failure.
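A restore test can be as simple as restoring into a scratch directory and comparing it against the live data. The Python sketch below hashes every file in both trees and reports what is missing or different. It is a minimal illustration under the assumption that both directories are reachable from the same machine; the function names are made up for this example.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(source: dict[str, str], restored: dict[str, str]) -> dict[str, list[str]]:
    """Report files absent from the restore and files whose contents differ."""
    return {
        "missing": sorted(set(source) - set(restored)),
        "changed": sorted(
            name for name in source.keys() & restored.keys()
            if source[name] != restored[name]
        ),
    }
```

Anything listed under `missing` is exactly the kind of config file that silently never made it into the backup set. Files that legitimately changed since the snapshot will show up under `changed`, so the comparison is most useful on data that is mostly static.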

5. Automate From the Beginning

Anything I didn’t automate with Ansible immediately became a pain point later. Even things that seemed like one-off tasks often needed to be repeated when I rebuilt parts of the system or added new nodes.

Now, I try to automate everything, even small tasks. The initial investment of time pays off quickly.

What’s Next?

I’ve got a bunch of things I want to add or improve:

  • Better monitoring and alerting - right now it’s pretty basic
  • Some kind of CI/CD pipeline so I can test changes before deploying them
  • Maybe exploring Kubernetes as the cluster grows
  • Improving the backup strategy with more automation and validation

But the most immediate task is actually using all these services more effectively. It’s easy to get caught up in the infrastructure side and forget that the point was to have useful services running!

Final Thoughts

Building this homelab has been an incredibly educational experience. I’ve learned so much about infrastructure automation, container orchestration, networking, and Linux administration.

What started as a simple desire to run Home Assistant more reliably turned into a deep dive into modern DevOps practices. The principles I’ve learned are directly applicable to my professional work, making this hobby project unexpectedly valuable for my career.

There’s something uniquely satisfying about building a system like this from scratch and seeing it actually work. When I first started, Docker Swarm seemed intimidating, Ansible was confusing, and the idea of managing a cluster of Raspberry Pis for home automation seemed like overkill. Now, it all makes sense, and I can’t imagine going back to setting up services manually.

If you’re considering building your own homelab, I’d say go for it! But maybe start smaller than I did - you don’t need 9 Raspberry Pis right away. Start with one or two, get comfortable with the tools and concepts, and expand as your needs and interests grow.

The journey is where most of the learning happens, and the mistakes along the way are often the best teachers. My homelab is never truly “finished” - it keeps evolving as I learn new things and as my needs change. And honestly, that’s half the fun.

