Prune OSDs and nodes from Ceph

Occasionally I'll find myself removing decommissioned nodes from Rook Ceph. Here's a cheatsheet for removing node x470d4u-zen-6ad59:

$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                    STATUS  REWEIGHT  PRI-AFF
 -1         1.44928  root default
 -3         0.23289      host x10slhf-xeon-920ea
  0    ssd  0.23289          osd.0                    up   1.00000  1.00000
 -7         0.23289      host x10slhf-xeon-9c3ab
  2    ssd  0.23289          osd.2                    up   1.00000  1.00000
 -5         0.23289      host x470d4u-zen-3700f
  1    ssd  0.23289          osd.1                    up   1.00000  1.00000
-11         0.25020      host x470d4u-zen-43c5a
  5    ssd  0.25020          osd.5                    up   1.00000  1.00000
 -9         0.50040      host x470d4u-zen-6ad59
  3    ssd  0.25020          osd.3                  down         0  1.00000
  4    ssd  0.25020          osd.4                  down         0  1.00000

osd.3 and osd.4 on x470d4u-zen-6ad59 are down, so remove them from the CRUSH map, delete them from the OSD map, drop their auth keys, and finally remove the now-empty host bucket:

ceph osd crush remove osd.3
ceph osd crush remove osd.4
ceph osd rm 3
ceph osd rm 4
ceph auth ls | grep osd
ceph auth del osd.3
ceph auth del osd.4
ceph osd crush rm x470d4u-zen-6ad59
$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                    STATUS  REWEIGHT  PRI-AFF
 -1         0.94888  root default
 -3         0.23289      host x10slhf-xeon-920ea
  0    ssd  0.23289          osd.0                    up   1.00000  1.00000
 -7         0.23289      host x10slhf-xeon-9c3ab
  2    ssd  0.23289          osd.2                    up   1.00000  1.00000
 -5         0.23289      host x470d4u-zen-3700f
  1    ssd  0.23289          osd.1                    up   1.00000  1.00000
-11         0.25020      host x470d4u-zen-43c5a
  5    ssd  0.25020          osd.5                    up   1.00000  1.00000

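Worth knowing: on Luminous and later, the per-OSD steps above can be collapsed into ceph osd purge, which removes the OSD from the CRUSH map, deletes its auth key, and removes it from the OSD map in one go. The host bucket still has to be removed separately:

ceph osd purge 3 --yes-i-really-mean-it
ceph osd purge 4 --yes-i-really-mean-it
ceph osd crush rm x470d4u-zen-6ad59
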
Sometimes I'll also end up with a stuck OSD daemon still looking for the old node:

osd is down in failure domain "x470d4u-zen-6ad59"
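
Listing the OSD pods usually shows what's left behind; this assumes the default rook-ceph namespace and Rook's stock app=rook-ceph-osd label:

kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide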

The old OSD pods probably still exist in a 'Pending' state, waiting for the removed node. Delete their deployments, and Rook will create new ones on the new node if applicable:

kubectl -n rook-ceph delete deployment rook-ceph-osd-3
kubectl -n rook-ceph delete deployment rook-ceph-osd-4
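
If Rook doesn't reconcile on its own after that, restarting the operator is a common way to nudge it; this assumes the stock app=rook-ceph-operator label from the upstream manifests:

kubectl -n rook-ceph delete pod -l app=rook-ceph-operator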



2025-01-14