Justifying Hardware Upgrades with Benchmarking

Reflecting on how there hasn’t been a lot of content here, I realized that one of the things keeping me from writing is the atrocious cold-start time of WordPress on my Kubernetes cluster. I started by trying to optimize WordPress startup times, looking at both the standard Apache module and the php-fpm image. Neither produced miracles, and my cold start times (as observed in Contour’s logging) were still 10+ seconds, perilously close to the 15-second timeout I’d set on the Revision.

Sitting down to actually watch the startup time on the cluster, I discovered (by running kubectl get po -w) that it was taking almost 3 seconds just to get my pod scheduled. (And then it was landing on the oldest/slowest node, a 4-core AMD processor from 2013…) I suspected that my love of old hardware might be hobbling me here, so before I cracked open my wallet to buy another thin client, I decided to figure out how to benchmark my existing nodes.

Running Phoronix Test Suite on Kubernetes

I dug around a bit to figure out which benchmark (or set of benchmarks) to run, and ended up settling on the Phoronix Test Suite, which seems to encompass a number of benchmarks and even has a packaged Docker image! What luck!

A few things to note:

  • As of September 2021, that image is 8 months old, which suggests it isn’t used (or updated) very often.
  • I could only find a few blog posts about using that image, several of which referred to Intel’s Clear Linux, but the current image is based on Ubuntu.

Since it’s clear that the world is crying out for a good way to compare the compute performance of nodes on Kubernetes, I figured I’d write up my journey.

phoronix/pts doesn’t install all benchmark dependencies

It took me a while to figure this out: my first thought was just to run the container with the arguments benchmark pts/cpu, and then I was misled by the Clear Linux posts into trying to use swupd packages rather than apt packages. Since the image doesn’t ship with the build dependencies (and is 8 months old), you’ll want to do one of two things:

  1. Run the container as a series of commands, like [/bin/bash, -c, "apt-get update && apt-get install -y build-essential && /phoronix-test-suite/phoronix-test-suite benchmark pts/cpu"] (see the Job sketch below).
    • This has the advantage that you’ll always be picking up the latest packages whenever you run the command, whether it’s now or 3 months from now.
    • The disadvantage is that if you want to run benchmarks on several different machines of the same architecture, you’ll end up downloading the apt repositories, build-essentials, and then building all the benchmarks N times, rather than just doing it once.
  2. Build your own image based on phoronix/pts, using a command like RUN apt-get update && apt-get install -y build-essential and then RUN /phoronix-test-suite/phoronix-test-suite install pts/cpu. You’ll then need to push the new image to a registry (see the Dockerfile sketch just after this list).
    • The advantage of this pattern is that you only build the new container once, and don’t waste CPU and bandwidth building the benchmarks multiple times.
    • The disadvantage is now you have a new container which you need to rebuild periodically.
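
If you go with item 2, the Dockerfile is roughly the following (a sketch based on the commands above; the image name and the registry you push to are up to you):

FROM phoronix/pts
# Bake the build tools and the pts/cpu benchmarks into the image once,
# so each benchmark run only has to execute them rather than rebuild them.
RUN apt-get update && apt-get install -y build-essential
RUN /phoronix-test-suite/phoronix-test-suite install pts/cpu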

Given the maintenance tradeoff, I ended up choosing item 1.
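
As a rough sketch, the Job for item 1 looks something like this (the name is a placeholder; the batch-mode config and resource limits discussed below still need to be added):

apiVersion: batch/v1
kind: Job
metadata:
  name: pts-cpu-benchmark    # placeholder name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pts
        image: phoronix/pts
        command: ["/bin/bash", "-c"]
        args:
        - >-
          apt-get update &&
          apt-get install -y build-essential &&
          /phoronix-test-suite/phoronix-test-suite benchmark pts/cpu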

phoronix/pts assumes an interactive terminal

Since I wanted to run the job in an automated fashion (using a Kubernetes Job for scheduling), the interactive setup questions asked by phoronix-test-suite were an obstacle. I might have been able to work something out with expect, but I found some tantalizing hints about being able to run batch-setup to store the answers to those questions on disk, and then use batch-benchmark to actually build and run the benchmarks themselves.
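
In other words, the non-interactive flow is roughly this (my reading of those hints; pts/cpu stands in for whatever suite you want to run):

/phoronix-test-suite/phoronix-test-suite batch-setup
/phoronix-test-suite/phoronix-test-suite batch-benchmark pts/cpu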

Now I just needed to figure out where those answers were stored and what the file looked like! I tried running batch-setup interactively in a named container, so I could poke at the filesystem after I’d finished running the command:

docker run --name setup-bench -it phoronix/pts /phoronix-test-suite/phoronix-test-suite batch-setup

Now I could run docker diff setup-bench to get a list of changed files, and docker export to get a tar of the entire filesystem (both the changes and the underlying layers). Combining the two, I was able to determine that batch-setup updated some files under /var/lib and /var/cache (which seemed pretty ordinary), (re?)created a directory in /usr/share/phoronix-test-suite, and created a file /etc/phoronix-test-suite.xml. Poking around in that file suggested that this was what I wanted, so I copied it out of the tarball and loaded it into Kubernetes with:

kubectl create configmap pts-config --from-file phoronix-test-suite.xml=.\phoronix-test-suite.xml
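
For the curious, the “copied it out of the tarball” step was roughly this (assuming the setup-bench container from above; the extracted file lands under ./etc/, so adjust the path passed to --from-file accordingly):

docker export setup-bench -o setup-bench.tar
tar -xf setup-bench.tar etc/phoronix-test-suite.xml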

You can see the configmap here. You can also see the job I used to run it. To limit the job to a particular class of hardware, I added model labels to all my nodes. There’s probably a nicer way to do this, but this worked for me.
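
For reference, the relevant pieces look roughly like this (the node name, label value, and mount details are illustrative; the configmap key matches the kubectl create configmap command above):

kubectl label node node-1 model=wyse-5070

and then, in the Job’s pod template:

      nodeSelector:
        model: wyse-5070
      volumes:
      - name: pts-config
        configMap:
          name: pts-config
      containers:
      - name: pts
        image: phoronix/pts
        volumeMounts:
        - name: pts-config
          mountPath: /etc/phoronix-test-suite.xml
          subPath: phoronix-test-suite.xml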

Benchmarks consume a lot of CPU and RAM

The first time I ran the benchmark on my small node (Wyse Dx0Q / AMD GX-415GA), it drove the loadavg above 80 and the node ended up marking itself “not ready” due to missed timers. I think the benchmark job was terminated because the node was unhealthy, but I learned my lesson. Since I knew that all my nodes were 4-CPU nodes, and 3 of the 4 had 8+ GB of RAM, I added CPU and memory limits (2 CPUs and 3GB) as well as requests, to ensure that I wasn’t starving out the whole node.
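
In the Job’s container spec, that looks something like this (the limits match the numbers above; setting the requests to the same values is just one reasonable choice):

        resources:
          requests:
            cpu: "2"        # shown equal to the limits; tune to taste
            memory: 3Gi
          limits:
            cpu: "2"
            memory: 3Gi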

The Results!

You can see the full results here, but the headline numbers look like this:

                      Wyse Dx0Q (AMD GX-415GA)   Wyse 5070 (Celeron J4105)
sysbench              1438                       2870
overall (22 tests)    100%                       140% (-70% faster, median 100%)

Since my other Wyse 5070 is a (presumably faster) Pentium J5005, I decided that it was worth upgrading, and started keeping my eye out for some more capable hardware.

My next post will be about retiring the Dx0Qs (I have two) in favor of newer hardware. One has only 4GB RAM and can come out now; the other will need to wait until the new hardware is in so I keep 3-node redundancy.

The only trick is that both nodes are part of my Ceph cluster (with external 3TB USB disks). But that’s a matter for another post.
