Occasionally I’ll find myself removing decommissioned nodes from Rook Ceph. Here’s a cheatsheet for removing node x470d4u-zen-6ad59:
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.44928 root default
-3 0.23289 host x10slhf-xeon-920ea
0 ssd 0.23289 osd.0 up 1.00000 1.00000
-7 0.23289 host x10slhf-xeon-9c3ab
2 ssd 0.23289 osd.2 up 1.00000 1.00000
-5 0.23289 host x470d4u-zen-3700f
1 ssd 0.23289 osd.1 up 1.00000 1.00000
-11 0.25020 host x470d4u-zen-43c5a
5 ssd 0.25020 osd.5 up 1.00000 1.00000
-9 0.50040 host x470d4u-zen-6ad59
3 ssd 0.25020 osd.3 down 0 1.00000
4 ssd 0.25020 osd.4 down 0 1.00000
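The down OSDs can also be picked out of the tree with a quick awk filter rather than by eye. A sketch, with a slice of the sample output above inlined so it can be tried offline — on a live cluster pipe `ceph osd tree` straight into the awk instead:

```shell
# Sample rows from `ceph osd tree` above; fields are ID CLASS WEIGHT NAME STATUS ...
tree_output='3 ssd 0.25020 osd.3 down 0 1.00000
4 ssd 0.25020 osd.4 down 0 1.00000
5 ssd 0.25020 osd.5 up 1.00000 1.00000'

# Print the ID (field 1) of every row whose STATUS (field 5) is "down"
down_ids=$(echo "$tree_output" | awk '$5 == "down" { print $1 }')
echo "$down_ids"
```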
# Remove the dead OSDs from the crush map
$ ceph osd crush remove osd.3
$ ceph osd crush remove osd.4
# Remove them from the cluster
$ ceph osd rm 3
$ ceph osd rm 4
# Find and delete their auth keys
$ ceph auth ls | grep osd
$ ceph auth del osd.3
$ ceph auth del osd.4
# Finally, remove the now-empty host bucket from the crush map
$ ceph osd crush rm x470d4u-zen-6ad59
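With more than a couple of dead OSDs the per-OSD commands get tedious; they can be generated in a loop instead. A dry-run sketch (the ids 3 and 4 come from the tree output above) — review the printed commands, then pipe to `sh` to actually run them:

```shell
# Dry run: print the per-OSD cleanup commands for each dead OSD id
for id in 3 4; do
  printf 'ceph osd crush remove osd.%s\n' "$id"
  printf 'ceph osd rm %s\n' "$id"
  printf 'ceph auth del osd.%s\n' "$id"
done
```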
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.94888 root default
-3 0.23289 host x10slhf-xeon-920ea
0 ssd 0.23289 osd.0 up 1.00000 1.00000
-7 0.23289 host x10slhf-xeon-9c3ab
2 ssd 0.23289 osd.2 up 1.00000 1.00000
-5 0.23289 host x470d4u-zen-3700f
1 ssd 0.23289 osd.1 up 1.00000 1.00000
-11 0.25020 host x470d4u-zen-43c5a
5 ssd 0.25020 osd.5 up 1.00000 1.00000
Sometimes I’ll also get a stuck OSD daemon warning that still references the old node:
osd is down in failure domain "x470d4u-zen-6ad59"
The old OSD pods probably still exist in a ‘Pending’ state, waiting for the node to come back. Delete their deployments, and rook will create new ones on a new node if applicable:
$ kubectl -n rook-ceph delete deployment rook-ceph-osd-3
$ kubectl -n rook-ceph delete deployment rook-ceph-osd-4
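Same trick works here: the delete commands follow rook’s rook-ceph-osd-<id> naming (as in the two deployments above), so they can be generated per OSD id. Dry run first, then pipe to `sh`:

```shell
# Dry run: print the deployment delete command for each stale OSD id
for id in 3 4; do
  printf 'kubectl -n rook-ceph delete deployment rook-ceph-osd-%s\n' "$id"
done
```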