Resilient static web content with Ceph RGW

This is a writeup of how to serve highly reliable static web content directly out of Ceph object storage, with a Kubernetes ingress for TLS termination.

Motivation

RGW (the RADOS Gateway) is the S3-compatible frontend for Ceph’s underlying internal object storage mechanism. If you’re not familiar with Ceph, it can be described as the ‘kubernetes of storage’ - it enables operators to freely upgrade and reboot servers, add and remove hardware, and so on, independently of the storage abstractions it provides. These are all properties I want for web content and file hosting.

Implementation

I use Rook to manage Ceph. To start serving web content, we need to create a CephObjectStore resource and a CephObjectStoreUser; the user’s credentials are then used to create a bucket and add content.

Here are the interesting bits of the CephObjectStore definition:

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: public-store
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
      requireSafeReplicaSize: true
    parameters:
      compression_mode: aggressive
  dataPool:
    failureDomain: host
    replicated:
      size: 3
      requireSafeReplicaSize: true
    parameters:
      compression_mode: aggressive
  preservePoolsOnDelete: false
  gateway:
    port: 9001
    instances: 2
-- truncated --
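The accompanying CephObjectStoreUser is short. Here is a minimal sketch - the user name and display name are illustrative, only the store field has to match the CephObjectStore above:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: public-store-user
  namespace: rook-ceph
spec:
  # Must match the CephObjectStore's metadata.name
  store: public-store
  displayName: public-store-user
```

Rook generates S3 access and secret keys for this user and stores them in a Kubernetes secret (named rook-ceph-object-user-&lt;store&gt;-&lt;user&gt;) in the same namespace.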

This gives us a deployment of two stateless RGW pods, and a ClusterIP service pointing at them:

~$ kubectl get pods -nrook-ceph -o wide | grep rgw-pub
rook-ceph-rgw-public-store-a-6f7858fc46-9245k                  2/2     Running     3 (3d15h ago)    19d    10.0.200.5     x10slhf-xeon-920ea   <none>           <none>
rook-ceph-rgw-public-store-a-6f7858fc46-4rp9f                  2/2     Running     0                107s   10.0.200.2     x470d4u-zen-3700f    <none>           <none>

~$ kubectl get services -n rook-ceph | grep rgw-pub
rook-ceph-rgw-public-store   ClusterIP   10.152.183.29    <none>        9001/TCP   90d

With a user and an object store created, the MinIO project’s mc client can be used to create buckets and set anonymous read-only access:

~$ mc alias set cdn http://<rgw-endpoint>:9001 <access-key> <secret-key>
~$ mc mb cdn/web-assets
~$ mc anonymous set download cdn/web-assets

Anonymous reads can then be verified from inside the cluster, against the ClusterIP from above:

~$ kubectl run -it --image=alpine alpine -- ash
/ # apk add curl ...
/ # curl -I 10.152.183.29:9001/web-assets
HTTP/1.1 200 OK
X-RGW-Object-Count: 272
X-RGW-Bytes-Used: 308492356
x-amz-request-id: tx00000e47e6e203ee9f1c2-00645fe0c3-6f262e-public-store
Content-Length: 0
Date: Sat, 13 May 2023 19:10:59 GMT
Connection: Keep-Alive
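Under the hood, mc anonymous set download attaches an S3 bucket policy permitting anonymous object reads. A minimal sketch of such a policy, built in Python (the exact policy mc generates may include additional actions such as s3:GetBucketLocation):

```python
import json

def anonymous_read_policy(bucket: str) -> str:
    """Build a minimal S3 bucket policy granting anonymous GetObject."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # "*" principal: any caller, including unauthenticated ones
                "Principal": {"AWS": ["*"]},
                "Action": ["s3:GetObject"],
                # Applies to all objects in the bucket
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }
    return json.dumps(policy, indent=2)

print(anonymous_read_policy("web-assets"))
```

A policy document like this can also be applied verbatim with mc anonymous set-json, or through any S3 client’s put-bucket-policy call.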

A bucket without anonymous access will look like this:

/ # curl -I 10.152.183.29:9001/bucket
HTTP/1.1 403 Forbidden
Content-Length: 235
x-amz-request-id: tx000009dfef478a2cf0395-00645fe2c2-74998e-public-store
Accept-Ranges: bytes
Content-Type: application/xml
Date: Sat, 13 May 2023 19:19:30 GMT
Connection: Keep-Alive

Let’s do a sanity check to verify the ‘web-assets’ bucket is indeed read-only, again from a throwaway pod in k8s:

/ # echo 'asdf' > f.txt
/ # curl -X PUT -T f.txt 10.152.183.29:9001/web-assets/f.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   244  100   239  100     5   354k   7598 --:--:-- --:--:-- --:--:--  238k
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><BucketName>web-assets</BucketName><RequestId>tx00000c74d61c1356761b5-00645fe47b-74998e-public-store</RequestId><HostId>74998e-public-store-public-store</HostId></Error>/

For reference, here’s a bucket with the policy mc anonymous set public cdn/bucket applied, which permits anonymous writes as well:

/ # curl -X PUT -T f.txt 10.152.183.29:9001/bucket/f.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     5    0     0  100     5      0    164 --:--:-- --:--:-- --:--:--   166

/ # curl 10.152.183.29:9001/bucket/f.txt
asdf

Networking

Now that we have a bucket, we need to get traffic to it.

Here I have a k8s Ingress resource, and a Service of type ExternalName that refers to the RGW ClusterIP service in Rook’s namespace. I’m crossing namespaces here simply to remind myself what I have and have not created, so as to stay out of Rook’s way.

resource "kubernetes_service" "rook-ceph-rgw-public-store" {
  metadata {
    name = "rook-ceph-rgw-public-store"
  }
  spec {
    type                    = "ExternalName"
    external_name           = "rook-ceph-rgw-public-store.rook-ceph.svc.cluster.local"
    session_affinity        = "ClientIP"
  }
}

resource "kubernetes_ingress_v1" "rgw-public-store" {
  metadata {
    name = "rgw-public-store"
    annotations = {
      "kubernetes.io/ingress.class" = "public"
      "cert-manager.io/issuer"      = "letsencrypt-prod"
    }
  }
  spec {
    rule {
      host = "cdn.nih.earth"
      http {
        path {
          backend {
            service {
              name = "rook-ceph-rgw-public-store"
              port {
                number = 9001
              }
            }
          }
          path      = "/web-assets"
          path_type = "Prefix"
        }
      }
    }
    tls {
      hosts       = ["cdn.nih.earth"]
      secret_name = "rgw-public-store-tls"
    }
  }
}

The rgw-public-store ingress ultimately tells my ingress-nginx deployment to use SNI and path matching to forward traffic destined for https://cdn.nih.earth/web-assets/ to the RGW deployment, and thus to the web-assets bucket. Traffic finds its way to the nginx pods via BGP anycast.
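Worth noting: with path_type = "Prefix", Kubernetes matches the request path element-wise on / separators, so /web-assets and /web-assets/logo.png are routed to RGW, but /web-assetsfoo is not. A small sketch of that matching rule (the function is mine, for illustration - not part of any Kubernetes API):

```python
def prefix_match(prefix: str, path: str) -> bool:
    """Kubernetes Ingress 'Prefix' pathType: match element-wise on '/'."""
    want = [p for p in prefix.split("/") if p]
    have = [p for p in path.split("/") if p]
    # Every path element of the prefix must match, in order
    return have[: len(want)] == want

print(prefix_match("/web-assets", "/web-assets/logo.png"))  # True
print(prefix_match("/web-assets", "/web-assetsfoo"))        # False
```

This is why a request for the bare bucket listing at /web-assets also matches and reaches RGW.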

Nathan Hensel

on caving, mountaineering, networking, computing, electronics


2023-05-13