Asynchronous Garbage Collection on a Distributed Web Cache

These are notes on improving the efficiency of the garbage collection mechanism on my Ceph-backed web object cache.

This service was originally written so that garbage collection would simply happen on every ‘miss’ and subsequent object insertion.

This isn’t ideal behavior in a distributed replicaset. If the cache is full:

bash-5.2# for i in $(seq 1 110); do touch $i ; done

And a page with many images is loaded:

imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite cache miss
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite cache miss
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite cache miss
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite cache miss
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite cache miss
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite cache miss
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite cache miss
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite cache miss
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite cache miss
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/10 pruned
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/9 pruned
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/8 pruned
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/7 pruned
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/6 pruned
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/5 pruned
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/10 pruning failed: [Errno 2] No such file or directory: 'artifacts/10'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/4 pruned
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/3 prunedartifacts/9 pruning failed: [Errno 2] No such file or directory: 'artifacts/9'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/8 pruning failed: [Errno 2] No such file or directory: 'artifacts/8'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/2 pruned
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/1 prunedartifacts/10 pruning failed: [Errno 2] No such file or directory: 'artifacts/10'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/7 pruning failed: [Errno 2] No such file or directory: 'artifacts/7'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/5 pruning failed: [Errno 2] No such file or directory: 'artifacts/5'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/4 pruning failed: [Errno 2] No such file or directory: 'artifacts/4'artifacts/9 pruning failed: [Errno 2] No such file or directory: 'artifacts/9'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/8 pruning failed: [Errno 2] No such file or directory: 'artifacts/8'artifacts/10 pruning failed: [Errno 2] No such file or directory: 'artifacts/10'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/5 pruning failed: [Errno 2] No such file or directory: 'artifacts/5'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/9 pruning failed: [Errno 2] No such file or directory: 'artifacts/9'artifacts/10 pruning failed: [Errno 2] No such file or directory: 'artifacts/10'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/9 pruning failed: [Errno 2] No such file or directory: 'artifacts/9'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/8 pruning failed: [Errno 2] No such file or directory: 'artifacts/8'artifacts/10 pruning failed: [Errno 2] No such file or directory: 'artifacts/10'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/3 pruning failed: [Errno 2] No such file or directory: 'artifacts/3'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/4 pruning failed: [Errno 2] No such file or directory: 'artifacts/4'artifacts/9 pruning failed: [Errno 2] No such file or directory: 'artifacts/9'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/8 pruning failed: [Errno 2] No such file or directory: 'artifacts/8'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/4 pruning failed: [Errno 2] No such file or directory: 'artifacts/4'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/3 pruning failed: [Errno 2] No such file or directory: 'artifacts/3'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/2 pruning failed: [Errno 2] No such file or directory: 'artifacts/2'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/8 pruning failed: [Errno 2] No such file or directory: 'artifacts/8'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/5 pruning failed: [Errno 2] No such file or directory: 'artifacts/5'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/1 pruning failed: [Errno 2] No such file or directory: 'artifacts/1'artifacts/10 pruning failed: [Errno 2] No such file or directory: 'artifacts/10'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/2 pruning failed: [Errno 2] No such file or directory: 'artifacts/2'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/1 pruning failed: [Errno 2] No such file or directory: 'artifacts/1'artifacts/5 pruning failed: [Errno 2] No such file or directory: 'artifacts/5'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/4 pruning failed: [Errno 2] No such file or directory: 'artifacts/4'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/9 pruning failed: [Errno 2] No such file or directory: 'artifacts/9'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/3 pruning failed: [Errno 2] No such file or directory: 'artifacts/3'artifacts/10 pruning failed: [Errno 2] No such file or directory: 'artifacts/10'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/9 pruning failed: [Errno 2] No such file or directory: 'artifacts/9'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/8 pruning failed: [Errno 2] No such file or directory: 'artifacts/8'artifacts/8 pruning failed: [Errno 2] No such file or directory: 'artifacts/8'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/4 pruning failed: [Errno 2] No such file or directory: 'artifacts/4'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/4 pruning failed: [Errno 2] No such file or directory: 'artifacts/4'artifacts/3 pruning failed: [Errno 2] No such file or directory: 'artifacts/3'artifacts/9 pruning failed: [Errno 2] No such file or directory: 'artifacts/9'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/2 pruning failed: [Errno 2] No such file or directory: 'artifacts/2'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/2 pruning failed: [Errno 2] No such file or directory: 'artifacts/2'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/1 pruning failed: [Errno 2] No such file or directory: 'artifacts/1'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/3 pruning failed: [Errno 2] No such file or directory: 'artifacts/3'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/2 pruning failed: [Errno 2] No such file or directory: 'artifacts/2'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/1 pruning failed: [Errno 2] No such file or directory: 'artifacts/1'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/1 pruning failed: [Errno 2] No such file or directory: 'artifacts/1'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/4 pruning failed: [Errno 2] No such file or directory: 'artifacts/4'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/8 pruning failed: [Errno 2] No such file or directory: 'artifacts/8'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/4 pruning failed: [Errno 2] No such file or directory: 'artifacts/4'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/3 pruning failed: [Errno 2] No such file or directory: 'artifacts/3'artifacts/3 pruning failed: [Errno 2] No such file or directory: 'artifacts/3'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/2 pruning failed: [Errno 2] No such file or directory: 'artifacts/2'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/1 pruning failed: [Errno 2] No such file or directory: 'artifacts/1'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/3 pruning failed: [Errno 2] No such file or directory: 'artifacts/3'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/2 pruning failed: [Errno 2] No such file or directory: 'artifacts/2'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/1 pruning failed: [Errno 2] No such file or directory: 'artifacts/1'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/2 pruning failed: [Errno 2] No such file or directory: 'artifacts/2'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite artifacts/1 pruning failed: [Errno 2] No such file or directory: 'artifacts/1'
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite converting took 0.9333322048187256converting took 0.7964925765991211
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:10] "GET /?img=berome_moore_2024/IMG20241109124042.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:10] "GET /?img=berome_moore_2024/IMG20241109092818.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite converting took 0.9000723361968994
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:10] "GET /?img=berome_moore_2024/IMG20241109101357.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite converting took 1.2009005546569824
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:11] "GET /?img=berome_moore_2024/IMG20241109085706.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 0.38234400749206543
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:11] "GET /?img=berome_moore_2024/IMG20241109105400.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite converting took 0.8583321571350098
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:11] "GET /?img=berome_moore_2024/IMG20241109090525.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 0.8208229541778564
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite converting took 1.0371794700622559
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:11] "GET /?img=berome_moore_2024/IMG20241109091051.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:11] "GET /?img=berome_moore_2024/IMG20241109122022.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 1.0682506561279297
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:11] "GET /?img=berome_moore_2024/IMG20241109091009.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite converting took 0.9330615997314453converting took 0.7234072685241699
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:11] "GET /?img=berome_moore_2024/IMG20241109121752.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:11] "GET /?img=berome_moore_2024/IMG20241109110812.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite converting took 1.0442266464233398
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:12] "GET /?img=berome_moore_2024/IMG20241109101404.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 1.2557346820831299
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:12] "GET /?img=berome_moore_2024/IMG20241109085742.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 1.0972485542297363
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:12] "GET /?img=berome_moore_2024/IMG20241109110817.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 1.2005786895751953
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:12] "GET /?img=berome_moore_2024/IMG20241109121747.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 0.9915590286254883
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:12] "GET /?img=berome_moore_2024/IMG20241109124101.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 1.2299556732177734
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 1.3484768867492676
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:13] "GET /?img=berome_moore_2024/IMG20241109094645.jpg&q=25 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.2.244 - - [16/Nov/2024 23:24:13] "GET /?img=berome_moore_2024/IMG20241109092525.jpg&q=25 HTTP/1.1" 200 -

…the result is that every thread for every separate request tries to prune the ten or so oldest files. This is wasteful: not only are we doing duplicate work, but it also incurs a lot of disk access inline with (and therefore blocking) the response to our client. To make matters worse, it’s a race for whichever thread is first, so most of this results in no action anyway. All the ‘no such file’ messages we see above happen because the first thread already beat the others to removing the files.
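To make the race concrete, here is a minimal Python sketch. It is not the service's actual code: the function names, the limit of 100 objects, and ordering by modification time are all assumptions for illustration. Two callers take a snapshot of the same over-full cache, then both try to prune the same ten files.

```python
import os
import tempfile
from pathlib import Path


def oldest_candidates(cache_dir, max_objects=100):
    """Return the paths that exceed max_objects, oldest first."""
    paths = sorted(Path(cache_dir).iterdir(), key=lambda p: p.stat().st_mtime)
    excess = max(len(paths) - max_objects, 0)
    return paths[:excess]


def prune(paths):
    """Try to delete each path; return (removed, missed) counts."""
    removed = missed = 0
    for path in paths:
        try:
            path.unlink()
            removed += 1
        except FileNotFoundError:
            # Another thread or pod got there first.
            missed += 1
    return removed, missed


def simulate_race(n_files=110, max_objects=100):
    """Two callers snapshot the same full cache, then both prune."""
    with tempfile.TemporaryDirectory() as cache_dir:
        for i in range(1, n_files + 1):
            path = Path(cache_dir, str(i))
            path.touch()
            os.utime(path, (i, i))  # distinct, increasing mtimes
        first = oldest_candidates(cache_dir, max_objects)
        second = oldest_candidates(cache_dir, max_objects)
        return prune(first), prune(second)
```

Under these assumptions `simulate_race()` returns `((10, 0), (0, 10))`: the first caller removes ten files and the second misses all ten, which is the pattern in the logs above multiplied by however many requests are in flight.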

To improve the efficiency of all this, I set out to find a (cluster-aware) way to perform cache garbage collection asynchronously.

By ‘cluster-aware’, I mean there should only be one garbage collection process however big the imgproxy replicaset is.

After an experiment with Kubernetes leases, I figured the simplest option would be to just run a dedicated deployment with a single gc pod:

imgproxy-lite-7d5b9bf74d-45gcq                      1/1     Running   0                43m
imgproxy-lite-7d5b9bf74d-db2kr                      1/1     Running   0                43m
imgproxy-lite-gc-5d9b645bd5-64h6z                   1/1     Running   0                38s

These are deployed with Argo CD like so:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: imgproxy-lite
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: imgproxy-lite
  template:
    metadata:
      labels:
        app: imgproxy-lite
    spec:
      containers:
        - name: imgproxy-lite
          image: images.local:5000/imgproxy-lite
          command: ["python3", "app.py", "--serve"]
          env:
            - name: PYTHONUNBUFFERED
              value: "1"
          volumeMounts:
            - mountPath: /opt/artifacts
              name: scratch
      volumes:
        - name: scratch
          persistentVolumeClaim:
            claimName: imgproxy-lite
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: imgproxy-lite-gc
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: imgproxy-lite-gc
  template:
    metadata:
      labels:
        app: imgproxy-lite-gc
    spec:
      containers:
        - name: imgproxy-lite-gc
          image: images.local:5000/imgproxy-lite
          command: ["python3", "app.py", "--gc"]
          env:
            - name: PYTHONUNBUFFERED
              value: "1"
          volumeMounts:
            - mountPath: /opt/artifacts
              name: scratch
      volumes:
        - name: scratch
          persistentVolumeClaim:
            claimName: imgproxy-lite
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: imgproxy-lite
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-filesystem

imgproxy-lite-gc is simply the same container image running with different args. It mounts the same ceph-filesystem PV as the other pods, and it’s hard-coded for now to prune objects every ten minutes.
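A rough sketch of how one image could switch between the two modes, assuming a count-based limit. Only the `--serve` and `--gc` flags, the ten-minute interval, and the log wording come from the setup above; `collect_garbage`, `gc_loop`, `MAX_OBJECTS`, and the mtime ordering are hypothetical names and choices, not the real `app.py`.

```python
import argparse
import time
from pathlib import Path

CACHE_DIR = "artifacts"     # relative path as it appears in the logs
MAX_OBJECTS = 100           # assumed limit, not taken from the service
GC_INTERVAL_SECONDS = 600   # "every ten minutes"


def collect_garbage(cache_dir=CACHE_DIR, max_objects=MAX_OBJECTS):
    """Delete the oldest files until at most max_objects remain."""
    paths = sorted(Path(cache_dir).iterdir(), key=lambda p: p.stat().st_mtime)
    excess = max(len(paths) - max_objects, 0)
    pruned = []
    for path in paths[:excess]:
        path.unlink(missing_ok=True)  # tolerate a file vanishing underneath us
        print(f"{path} pruned")
        pruned.append(path.name)
    return pruned


def gc_loop(cache_dir=CACHE_DIR, max_objects=MAX_OBJECTS,
            interval=GC_INTERVAL_SECONDS, sleep=time.sleep, rounds=None):
    """Collect, sleep, repeat. rounds=None runs forever; an int bounds it."""
    done = 0
    while rounds is None or done < rounds:
        print("running garbage collection")
        collect_garbage(cache_dir, max_objects)
        done += 1
        sleep(interval)


def main(argv=None):
    """Pick a mode from the same flags the two Deployments pass."""
    parser = argparse.ArgumentParser()
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--serve", action="store_true")
    mode.add_argument("--gc", action="store_true")
    args = parser.parse_args(argv)
    if args.gc:
        gc_loop()
    else:
        raise SystemExit("the web server itself is outside this sketch")
```

Calling `main()` under the usual `if __name__ == "__main__":` guard would let the same file start in either mode. The `sleep` and `rounds` parameters exist only so the loop can be exercised without waiting ten minutes.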

The result is that this chore is removed from the execution path of web responses, and we’re not doing duplicate work. Every once in a while we’ll see gc working in the logs:

imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 0.48064422607421875
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 0.3470175266265869
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.0.33 - - [16/Nov/2024 23:32:36] "GET /?img=illinois_caverns_2023/IMG_7632.JPG&q=33 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite converting took 0.3912653923034668
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.0.33 - - [16/Nov/2024 23:32:36] "GET /?img=illinois_caverns_2023/IMG_7657.JPG&q=33 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-45gcq imgproxy-lite 10.42.0.33 - - [16/Nov/2024 23:32:36] "GET /?img=illinois_caverns_2023/IMG_7638.JPG&q=33 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 0.3926072120666504
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.0.33 - - [16/Nov/2024 23:32:37] "GET /?img=illinois_caverns_2023/IMG_7601.JPG&q=33 HTTP/1.1" 200 -
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite converting took 0.633326530456543
imgproxy-lite-7d5b9bf74d-db2kr imgproxy-lite 10.42.0.33 - - [16/Nov/2024 23:32:37] "GET /?img=illinois_caverns_2023/IMG_7619.JPG&q=33 HTTP/1.1" 200 -
+ imgproxy-lite-gc-5d9b645bd5-64h6z › imgproxy-lite-gc
imgproxy-lite-gc-5d9b645bd5-64h6z imgproxy-lite-gc running garbage collection
imgproxy-lite-gc-5d9b645bd5-64h6z imgproxy-lite-gc artifacts/44 pruned
imgproxy-lite-gc-5d9b645bd5-64h6z imgproxy-lite-gc artifacts/43 pruned
imgproxy-lite-gc-5d9b645bd5-64h6z imgproxy-lite-gc artifacts/42 pruned
imgproxy-lite-gc-5d9b645bd5-64h6z imgproxy-lite-gc artifacts/41 pruned
imgproxy-lite-gc-5d9b645bd5-64h6z imgproxy-lite-gc artifacts/40 pruned
imgproxy-lite-gc-5d9b645bd5-64h6z imgproxy-lite-gc artifacts/39 pruned
imgproxy-lite-gc-5d9b645bd5-64h6z imgproxy-lite-gc artifacts/38 pruned

Might we run out of disk space in those ten minutes? Sure, but as shown before, the service is designed to continue functioning when the disk is full; it will just hurt our hit ratio.
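One way to get that behavior is to treat the cache write as best-effort. The sketch below is an assumption about how this could look, not the service's code: `store_best_effort`, `respond`, and the `convert` callback are hypothetical, and it assumes a full disk surfaces as an `OSError` (such as ENOSPC) on write.

```python
from pathlib import Path


def store_best_effort(cache_path, data):
    """Try to cache data; report failure instead of raising."""
    path = Path(cache_path)
    try:
        path.write_bytes(data)
        return True
    except OSError as exc:
        print(f"cache insert skipped: {exc}")
        # Don't leave a truncated object behind to be served as a hit later.
        try:
            path.unlink(missing_ok=True)
        except OSError:
            pass
        return False


def respond(cache_path, convert):
    """Serve from cache on a hit; on a miss, convert, try to cache, serve anyway."""
    path = Path(cache_path)
    if path.is_file():
        print("cache hit")
        return path.read_bytes()
    print("cache miss")
    data = convert()
    store_best_effort(path, data)
    return data
```

With this shape, a failed insert costs one extra conversion the next time the same object is requested, which is exactly the lower hit ratio mentioned above, and the client still gets its image.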

Nathan Hensel

on caving, mountaineering, networking, computing, electronics


2024-11-16