Productionizing Home Kubernetes

Since I cleaned up the physical rack layout (picture soon!), one thing that’s been bothering me about my home Kubernetes cluster is its ad-hoc, snowflake nature.

Generally, I’ve been installing and upgrading things with kubectl apply (or occasionally helm install). This is great for experimentation, but it is pretty lousy for managing things that I actually care about.

This is probably post 1 / N on the subject, but I wanted to lay out a bit of a process for migrating things from the snowflake “get something working” state they tend to start out in to something a bit more orderly. Along the way, hopefully I can explain why things tend to start one way, and how to move them to the other.

Getting Started

When we first get started, we’re not sure exactly what we’re doing, or if whatever we’re planning to do is going to work or be worth it. I usually end up following a tutorial which is a couple years old, or maybe installing a certain tool (hi, Clear Linux!) that I haven’t used before and am not sure if I’ll like.

So, the initial effort looks a bit like taking a few notes or maybe scratching things down on a webpage somewhere so someone else can learn from my efforts. For example, it appears that a few of my Knative Deployments here are 538 days old, which probably means the original install was around Knative 0.14 (latest is 0.26).

Owning It

After a while, you end up with a bunch of “stuff” on your cluster. Some of it is important (like this blog), and some of it is chaff that you never got around to cleaning up. Fortunately, Kubernetes makes it pretty easy to at least get an inventory of what you’ve got: there are namespaces, and most resources boil down to either Pods or PersistentVolumes, so there are only 5-6 types that are likely to be interesting, and the “all” type is likely to cover most of them.
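
For example, a quick inventory pass might look like the following (nothing fancy, just the built-in commands):

# Namespaces first, then the workhorse resources across all of them.
kubectl get namespaces
kubectl get all --all-namespaces

# "all" doesn't include storage, so grab volumes and claims separately.
kubectl get pv
kubectl get pvc --all-namespaces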

Still, if you have important things there, you start to think “how would I recover this if it broke?” In that case, you might turn to a tool like Velero to back up your cluster and keep track of what’s there. You might also start to check in some of your manifests, particularly the ones you wrote yourself, so that you can recover them if the worst happens.
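
As a sketch (the backup name and namespace here are made up, and this assumes Velero is already installed with object storage configured), that might look like:

# One-off backup of the namespace the blog lives in.
velero backup create blog-backup --include-namespaces wordpress

# Or keep a rolling nightly backup around instead.
velero schedule create nightly --schedule="0 3 * * *" --include-namespaces wordpress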

But this still seems pretty reactive: you’ve got some tools for cleaning up (when you get around to it) and for restoring things to an earlier state (assuming that the backups are complete, which you can really only verify by restoring from them). What we really want is something more proactive, where we can preview our changes before applying them.
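
kubectl actually has that preview built in; given a local copy of the manifest you’re about to change (manifest.yaml here is just a placeholder), a sketch looks like:

# Show what would change on the cluster, without applying anything.
kubectl diff -f manifest.yaml

# Exit code 0 means no differences; 1 means there's a diff to review.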

Getting It Under Control

A few months back, I did a TGIK on Flux v2, which is a GitOps toolchain for managing your Kubernetes cluster. IMO, this is really where you want to be: instead of editing resources on your cluster, you’re making changes in Git (you could even peer review them via PRs/MRs) and then the Flux automation applies the change to the cluster.
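
Getting Flux v2 pointed at a repository is one bootstrap command. This is just a sketch: the owner, repository, and path values below are placeholders rather than my actual layout.

# Install the Flux controllers, commit their manifests to the repo,
# and have Flux reconcile everything under the given path from then on.
flux bootstrap github \
  --owner=$GITHUB_USER \
  --repository=home-k8s-config \
  --path=clusters/home \
  --personal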

But!

There’s always a “but”…

How do you get from “mess on the cluster” to “what I intended in Git”? There are a couple of different ways to do it, depending on what you’re trying to migrate:

Someone Else’s Software

If you’re managing someone else’s software (or your own software with a formal release process), you probably want to re-use the release artifacts that they’re producing upstream. This means taking a helm chart, some yaml, or what-not, and applying it to the cluster. Among other things, this will make upgrades much easier if you have the old manifests and know how you’ve changed them — you’re effectively doing a three-way merge between their old release, their new release, and your customizations.

If you’ve gone and customized the results (changed any of the cluster properties), it’s worth figuring out exactly what changes you made before you go and do an upgrade (an upcoming upgrade is usually what motivates me to do this in the first place). What I tend to do is download (for plain yaml) or generate (helm template) the original manifests and put them in my gitops directory without committing or pushing the change. Next, I run kubectl diff to see the differences between the original package and my customizations.
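
For a Helm-packaged project, that step might look roughly like this (the release, chart, version, and paths are all placeholders):

# Regenerate the stock manifests from the upstream chart, at the version
# currently installed, into the GitOps directory (uncommitted for now).
helm template my-release example-repo/some-chart --version 1.2.3 \
  --namespace some-namespace > gitops/some-app/upstream.yaml

# Diff the stock manifests against what's actually running; anything that
# shows up here should be a change I remember making.
kubectl diff -f gitops/some-app/upstream.yaml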

Sometimes, there are a bunch of diffs that I don’t expect. This is usually a sign that I grabbed the wrong manifest, and I need to go back and figure out which was the right one. Once I’ve got the right manifest (I can explain/remember making the changes), I’ll do two commits:

  • First, a commit of the original, with a README.md explaining how the files ended up being generated / collected.
  • Next, a second commit re-applying my modifications (for example, adding resource requests and limits); possibly updating the README.md to document the changes if they are important or not obvious.

Once I have both these changes, I’ll push them together to the GitOps repo. This should result in a no-op reconciliation, but I’d suggest saving away the current on-cluster resources the first few times you try it.
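
Saving that snapshot is just another kubectl get; a sketch, with a placeholder namespace:

# Keep a copy of what was running before the first reconciliation, so a
# surprise diff (or deletion) is easy to recover from.
kubectl get all,configmaps,secrets -n some-namespace -o yaml > pre-gitops-snapshot.yaml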

Your Own Software

This could be software that you wrote, or software that you packaged (for example, my WordPress-for-Knative container). This case is pretty easy, because you can use your GitOps repo as the source of truth. For this, you can just run:

kubectl get -o yaml -n $NAMESPACE $TYPE $RESOURCE >> manifest.yaml

And then clean up the manifest (trim the metadata down substantially, remove the status, and possibly remove some defaulted fields from the spec). There’s no need to check in a “clean” copy to diff against for later upgrades, because you are the one who will do the upgrades, probably a little bit at a time.
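
If you have yq handy, most of that cleanup can be scripted; this is a sketch rather than an exhaustive list of the fields to strip, and the output path is a placeholder:

# Drop the server-managed bits that shouldn't live in Git.
yq 'del(.status) |
    del(.metadata.managedFields) |
    del(.metadata.uid) |
    del(.metadata.resourceVersion) |
    del(.metadata.creationTimestamp) |
    del(.metadata.annotations."kubectl.kubernetes.io/last-applied-configuration")' \
  manifest.yaml > gitops/my-app/app.yaml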

Where I’m At

A lot of this setup, including all of Knative, Rook, the MySQL operator, MetalLB, and WordPress-on-Knative, is checked in at https://github.com/evankanderson/home-k8s-config. Feel free to check it out, borrow, and reference as needed.
