It’s back!

This blog is a bit of an experiment in running WordPress on Knative on Kubernetes in my own home. As some people may have noticed, the blog has been more down than up recently, despite the theoretically sound underpinnings. If you’re curious, here’s a brief explanation as a way of easing myself back into the writing habit.

Hardware Failures

In Justifying Hardware Upgrades with Benchmarking, I mentioned that I’d discovered that the HP T640 series had about 30% more oomph (plus the ability to use NVMe drives instead of shoehorning a 2.5 inch SATA SSD into a case not designed for it). Unfortunately, it turns out that the NVMe drive I purchased ran somewhat hot; hot enough, in fact, to go into thermal shutdown while Linux was running. For those not in the know, Linux does not deal well with having the root filesystem disappear out from under it, so this led to a pretty immediate kernel panic and the host going unresponsive until I unplugged it and waited about 30 minutes for things to cool down.

“My hardware needs to be unplugged unpredictably every 20 days” is not, in fact, something that I’m good at handling. I don’t have any monitoring running at the moment, and I’m not sure where I’d send the alerts if I did, so long story short, one of the three nodes in my Kubernetes cluster was out of commission for at least 2 weeks, and possibly as many as 5. Now, Kubernetes is supposed to be resilient to this sort of thing, but it turns out…

Rook / Ceph Failover is Conservative

In this case, since the MySQL pod hadn’t been able to cleanly unmount the RBD disk it was using as a backing store, the CSI driver blocked detaching the volume from the dead node. In turn, Kubernetes wouldn’t spin up a new Pod for the StatefulSet until the PVC was detached from the existing pod/node, which wasn’t gonna happen until the dead node came back. For some reason, the MySQL controller I was using (Percona) also didn’t decide to nuke the pod and restore from backup, maybe because I didn’t configure the backup settings correctly. I’m re-evaluating my choice of MySQL controller at low priority, but since things are working again, it’s not top of mind.

Fixing the Hardware Problem

Once I realized that this was a real problem (it persisted even when I left the top off the case), I went out and ordered one of those $15 SSD heat sinks and put it on the SSD the next time the unit overheated (gotta take advantage of that downtime). So far, it seems to be working, though I don’t have numbers from before; it’s currently about 12C below the thermal shutdown threshold after 17 days of sitting with the lid on.

This is what I ended up putting on the SSD.

Gigabit is Great, Except When It’s Consumer

Since I’m hosting the site in my basement for the grand added cost of a steady 50W (about 36kWh each month, or about $5/month of electricity at current rates), it’s pretty cheap for what I get (12 cores and about 40GB of RAM). Unfortunately, since I’m running it on a consumer gigabit plan, my IP address changes somewhat often and at random.

One extra twist is that I wanted to host the site on the bare domain name, not on a subdomain, so I need to be able to set the A record for that domain. Most dynamic DNS providers only support setting the A record on a subdomain, so I’ve been going into the DNS console and resetting the IP address by hand each time my home IP changes. Except… that requires that I’m checking and noticing when my home IP changes, which it turns out I don’t think about often.

I’ve finally automated this into a different problem: I’ve written a CronJob which extracts the IP address from my router via SNMP and then uses Google’s gcloud Docker image to update the DNS record for the site. This runs every minute right now, so the gap between my IP changing and the site starting to work again should be small enough that my other operational whoopsies will overwhelm my IP instability.

The real star of this is that I was able to find two existing containers that had the tools I needed (nicolaka/netshoot is amazing for network tools, and Google publishes a gcloud image) and chain them together without needing to write my own Docker image. And, of course, Kubernetes made it easy to run this every minute and put all the sensitive configuration in Secrets, so the rest of the configuration could live on public GitHub, where I can use it as an example for others.
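For the curious, the CronJob is shaped roughly like the sketch below. The secret names, zone, domain, SNMP community, and the OID queried are all placeholders standing in for my actual configuration; in particular, the right OID for your router’s WAN address will vary by device, so treat that line as illustrative rather than copy-pasteable.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dyndns-update
spec:
  schedule: "* * * * *"        # every minute
  concurrencyPolicy: Forbid    # don't pile up runs if one is slow
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          volumes:
            - name: shared
              emptyDir: {}     # hand-off point between the two containers
          # Step 1: nicolaka/netshoot ships snmpget; ask the router
          # for its WAN address and stash it in the shared volume.
          initContainers:
            - name: fetch-ip
              image: nicolaka/netshoot
              command: ["/bin/sh", "-c"]
              args:
                - >
                  snmpget -v2c -c "$SNMP_COMMUNITY" -Oqv
                  router.lan <WAN-address-OID> > /shared/ip.txt
              envFrom:
                - secretRef:
                    name: router-snmp   # holds SNMP_COMMUNITY
              volumeMounts:
                - name: shared
                  mountPath: /shared
          # Step 2: Google's cloud-sdk image updates the bare-domain
          # A record with whatever the init container wrote.
          containers:
            - name: update-dns
              image: google/cloud-sdk:slim
              command: ["/bin/sh", "-c"]
              args:
                - |
                  gcloud auth activate-service-account \
                    --key-file=/creds/key.json
                  gcloud dns record-sets update "$DOMAIN." \
                    --zone="$ZONE" --type=A --ttl=60 \
                    --rrdatas="$(cat /shared/ip.txt)"
              envFrom:
                - secretRef:
                    name: dns-config    # holds DOMAIN and ZONE
              volumeMounts:
                - name: shared
                  mountPath: /shared
                - name: creds
                  mountPath: /creds
                  readOnly: true
```

The emptyDir is the whole trick: the init container runs to completion first, so by the time the gcloud container starts, `/shared/ip.txt` is guaranteed to exist, and neither image needed to be rebuilt to know about the other.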
