Rook is a cloud-native storage solution. It installs CRDs, and instances of those CRDs in turn create their corresponding storage pods and resources.

Install Rook CRD

Install the operator via its Helm chart. This is the foundation of all the fun.

helm repo add rook-release
helm install --namespace rook-ceph --name rook rook-release/rook-ceph

Note: the Rook operator and the CephCluster resource must be in the same namespace, because the cluster uses the service accounts created by the Helm chart to create all of its resources.

Ceph Block Storage

Assume we have 10 compute nodes, each with HDDs sdb through sdf and an SSD at sdg. The following YAML creates a Ceph cluster with 3 mons that consumes every available disk (no partitions or filesystem) on every node. Here hostNetwork: true works together with a ConfigMap to force the OSDs onto a separate, dedicated storage network:

  • The rook-config-override ConfigMap makes each pod use whatever address it finds in the given subnet (the IP is inherited from the host). After changing this ConfigMap, kill all OSD pods to force them to pick up the new settings.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [global]
    public network =
    cluster network =
    public addr = ""
    cluster addr = ""
  • CephCluster CRD definition:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.4-20190917
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  dashboard:
    enabled: true
  network:
    hostNetwork: true
  # cluster level storage configuration and selection
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      metadataDevice: "sdg"

After applying this, if everything’s right, you’ll see a bunch of OSD pods spawn and no errors in the CephCluster resource. Wait until the CephCluster instance shows Created before going further; otherwise you will run into weird issues.
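The waiting step, and the OSD restart needed after a ConfigMap change, could be scripted roughly as follows. This is a minimal sketch, not part of Rook: the function names and the KUBECTL and POLL_SECONDS variables are made up for illustration, and it assumes the cluster state is exposed at .status.state and that OSD pods carry the label app=rook-ceph-osd.

```shell
# KUBECTL can be overridden to preview or stub the calls.
KUBECTL="${KUBECTL:-kubectl}"

# Poll the CephCluster until its state reads "Created" (assumed field: .status.state).
# Gives up after $1 attempts (default 60) and returns non-zero.
wait_for_cephcluster() {
  tries="${1:-60}"
  while [ "$tries" -gt 0 ]; do
    state=$($KUBECTL -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.status.state}')
    [ "$state" = "Created" ] && return 0
    tries=$((tries - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}

# Delete all OSD pods so their Deployments recreate them with the
# current rook-config-override settings (assumed label: app=rook-ceph-osd).
restart_osds() {
  $KUBECTL -n rook-ceph delete pod -l app=rook-ceph-osd
}
```

Run wait_for_cephcluster after applying the CephCluster, and restart_osds after every edit of the ConfigMap.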

  • You can’t run ceph -s against the 3 mons right out of the box; a dedicated toolbox needs to be installed for troubleshooting and management.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-tools
  namespace: rook-ceph
  labels:
    app: rook-ceph-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-tools
  template:
    metadata:
      labels:
        app: rook-ceph-tools
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: rook-ceph-tools
        image: rook/ceph:master
        command: ["/tini"]
        args: ["-g", "--", "/usr/local/bin/"]
        imagePullPolicy: IfNotPresent
        env:
          - name: ROOK_ADMIN_SECRET
            valueFrom:
              secretKeyRef:
                name: rook-ceph-mon
                key: admin-secret
        securityContext:
          privileged: true
        volumeMounts:
          - mountPath: /dev
            name: dev
          - mountPath: /sys/bus
            name: sysbus
          - mountPath: /lib/modules
            name: libmodules
          - name: mon-endpoint-volume
            mountPath: /etc/rook
      # if hostNetwork: false, the "rbd map" command hangs, see
      hostNetwork: true
      volumes:
        - name: dev
          hostPath:
            path: /dev
        - name: sysbus
          hostPath:
            path: /sys/bus
        - name: libmodules
          hostPath:
            path: /lib/modules
        - name: mon-endpoint-volume
          configMap:
            name: rook-ceph-mon-endpoints
            items:
            - key: data
              path: mon-endpoints
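Once the toolbox Deployment is running, the ceph commands in the rest of these notes are run inside its pod. One way to get there is sketched below; the helper names toolbox_pod and in_toolbox and the KUBECTL variable are made up for illustration, and the pod is looked up through the app: rook-ceph-tools label from the manifest above.

```shell
# KUBECTL can be overridden to preview or stub the calls.
KUBECTL="${KUBECTL:-kubectl}"

# Print the name of the first pod labelled app=rook-ceph-tools.
toolbox_pod() {
  $KUBECTL -n rook-ceph get pod -l app=rook-ceph-tools \
    -o jsonpath='{.items[0].metadata.name}'
}

# Run a command inside the toolbox pod, e.g.: in_toolbox ceph -s
in_toolbox() {
  $KUBECTL -n rook-ceph exec "$(toolbox_pod)" -- "$@"
}
```

For an interactive session, the same pod name can be passed to kubectl exec -it with bash as the command.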

Delete/add new node

Let’s use k8s node upward-crow as an example.

ceph osd tree
 -9        1.17560     host upward-crow                           
 15   hdd  0.29390         osd.15             up  1.00000 1.00000 
 16   hdd  0.29390         osd.16             up  1.00000 1.00000 
 18   hdd  0.29390         osd.18             up  1.00000 1.00000 
 19   hdd  0.29390         osd.19             up  1.00000 1.00000 
 17   hdd  0.29390         osd.17             up  1.00000 1.00000
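The OSD ids under a host are needed for every command that follows, so it can help to pull them out of the tree output rather than copy them by hand. The sketch below is only an illustration: osds_on_host is a made-up helper, and it assumes the column layout shown above (a CLASS column is present, and bucket rows such as hosts have negative ids).

```shell
# Read `ceph osd tree` output on stdin and print the numeric ids of the
# OSDs listed under the host given as $1.
osds_on_host() {
  awk -v host="$1" '
    # Bucket rows (root, host, ...) have negative ids; track whether we
    # just entered the host we are looking for.
    $1 ~ /^-/ { in_host = ($3 == "host" && $4 == host); next }
    # OSD rows: id, class, weight, name, ...
    in_host && $4 ~ /^osd\.[0-9]+$/ { print $1 }
  '
}
```

Inside the toolbox this would be used as: ceph osd tree | osds_on_host upward-crow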


  1. Make sure that removing this node won’t fill up the Ceph cluster and make it refuse new writes.
ceph df
rados df
ceph osd df
  2. Disable scrubbing so that it doesn’t compete with the data movement that follows:
ceph osd set noscrub
ceph osd set nodeep-scrub
  3. Limit backfill and recovery to help maintain client I/O performance while the cluster recovers:
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
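The three osd_* values above are configuration options rather than commands. One way to apply them to the running OSDs from the toolbox is ceph tell with injectargs, sketched below together with the scrub flags. The function names and the CEPH variable are made up for illustration, and values injected this way are expected to last only until an OSD restarts.

```shell
# CEPH can be overridden to preview or stub the calls.
CEPH="${CEPH:-ceph}"

# Pause scrubbing and throttle backfill/recovery on every running OSD.
begin_maintenance() {
  $CEPH osd set noscrub
  $CEPH osd set nodeep-scrub
  $CEPH tell 'osd.*' injectargs \
    '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
}

# Re-enable scrubbing once the node work is done and the cluster has recovered.
# The throttle values are not restored here; set them back to your own defaults.
end_maintenance() {
  $CEPH osd unset noscrub
  $CEPH osd unset nodeep-scrub
}
```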

Remove node:

  1. Mark the OSDs on that node out one at a time, waiting for Ceph to recover before moving on to the next one.
# ceph osd out <osd_id>
ceph osd out 15
ceph -w
# repeat for 16, 17, 18 and 19 once the cluster has recovered
  2. Remove each OSD from the CRUSH map so that it no longer receives data:
ceph osd crush remove osd.15
ceph osd crush remove osd.16
ceph osd crush remove osd.17
ceph osd crush remove osd.18
ceph osd crush remove osd.19
  3. Remove the OSD authentication keys:
# ceph auth del osd.<osd_id>
ceph auth del osd.15
ceph auth del osd.16
ceph auth del osd.17
ceph auth del osd.18
ceph auth del osd.19
  4. Remove the OSDs from the cluster. This also means deleting all of their OSD deployments on k8s:
# ceph osd rm <osd_id>
ceph osd rm 15 16 17 18 19
  5. Remove the node’s host bucket from the CRUSH map:
ceph osd crush rm upward-crow
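The per-OSD commands above repeat the same pattern for each id, so they could be wrapped in a loop. The sketch below covers only the CRUSH, auth and OSD map cleanup for OSDs that are already marked out and drained; purge_node and the CEPH variable are made-up names, and deleting the OSD deployments on k8s is still a separate step.

```shell
# CEPH can be overridden to preview or stub the calls.
CEPH="${CEPH:-ceph}"

# Usage: purge_node <host> <osd_id>...
# Only call this after every listed OSD is out and the cluster has recovered.
purge_node() {
  host="$1"; shift
  for id in "$@"; do
    $CEPH osd crush remove "osd.$id"   # stop CRUSH from placing data on it
    $CEPH auth del "osd.$id"           # delete its authentication key
    $CEPH osd rm "$id"                 # remove it from the OSD map
  done
  $CEPH osd crush rm "$host"           # finally remove the empty host bucket
}
```

For the example node this would be: purge_node upward-crow 15 16 17 18 19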

Delete/add new osd disk

This will use osd.5 as an example.
ceph commands are expected to be run in the rook-toolbox:

  1. The disk fails.
  2. Remove the disk from the node.
  3. Mark the OSD out: ceph osd out osd.5
  4. Remove it from the CRUSH map: ceph osd crush remove osd.5
  5. Delete its caps: ceph auth del osd.5
  6. Remove the OSD: ceph osd rm osd.5
  7. Delete the deployment: kubectl delete deployment -n rook-ceph rook-ceph-osd-id-5
  8. Delete the OSD data dir on the node: rm -rf /var/lib/rook/osd5
  9. Edit the OSD configmap with kubectl edit configmap -n rook-ceph rook-ceph-osd-nodename-config and remove the config section pertaining to your OSD id and underlying device.
  10. Add the new disk and verify that the node sees it.
  11. Restart the rook-operator by deleting its pod.
  12. The osd prepare pods run.
  13. A new rook-ceph-osd-id-5 deployment is created.
  14. Check the health of your cluster: ceph -s; ceph osd tree
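The Kubernetes side of steps 7 and 11 to 13 could look roughly like this. It is a sketch under assumptions: the helper names and the KUBECTL and NS variables are made up, the deployment name follows the pattern used in the steps above, and the operator and OSD pods are assumed to carry the labels app=rook-ceph-operator, app=rook-ceph-osd-prepare and app=rook-ceph-osd.

```shell
# KUBECTL and NS can be overridden to preview the calls or target another namespace.
KUBECTL="${KUBECTL:-kubectl}"
NS="${NS:-rook-ceph}"

# Step 7: delete the Deployment of the retired OSD, e.g. delete_osd_deployment 5
delete_osd_deployment() {
  $KUBECTL -n "$NS" delete deployment "rook-ceph-osd-id-$1"
}

# Step 11: deleting the operator pod makes its Deployment start a fresh one,
# which then goes through OSD preparation again.
restart_operator() {
  $KUBECTL -n "$NS" delete pod -l app=rook-ceph-operator
}

# Steps 12-13: list the prepare pods and OSD pods to follow progress.
show_osd_pods() {
  $KUBECTL -n "$NS" get pod -l 'app in (rook-ceph-osd-prepare, rook-ceph-osd)'
}
```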

Example to troubleshoot with Toolbox

Check running status of Ceph Cluster:

ceph -s
ceph osd tree

Create a volume image (10MB):

rbd create replicapool/test --size 10
rbd info replicapool/test

# Disable the rbd features that are not in the kernel module
rbd feature disable replicapool/test fast-diff deep-flatten object-map

Map the block volume, format it, and mount it:

# Map the rbd device. If the toolbox was started with "hostNetwork: false" this hangs and you have to stop it with Ctrl-C,
# however the command still succeeds; see
rbd map replicapool/test

# Find the device name, such as rbd0
lsblk | grep rbd

# Format the volume (only do this the first time or you will lose data)
mkfs.ext4 -m0 /dev/rbd0

# Mount the block device
mkdir /tmp/rook-volume
mount /dev/rbd0 /tmp/rook-volume

Unmount the volume and unmap the kernel device:

umount /tmp/rook-volume
rbd unmap /dev/rbd0

Shared Filesystem

Mount CephFS:

# Create the directory
mkdir /tmp/registry

# Detect the mon endpoints and the user secret for the connection
mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')

# Mount the file system
mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /tmp/registry

# See your mounted file system
df -h

Test Ceph performance


rados bench can measure the performance of the Ceph cluster.

  1. Create a test pool and drop all cached data:
ceph osd pool create testbench 100 100
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
  2. Execute a write test for 10 seconds to the newly created storage pool:
rados bench -p testbench 10 write --no-cleanup
  3. Execute a sequential read test for 10 seconds to the storage pool:
rados bench -p testbench 10 seq
  4. Execute a random read test for 10 seconds to the storage pool:
rados bench -p testbench 10 rand
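Because the write test runs with --no-cleanup, the benchmark objects stay in the pool after the read tests. A possible cleanup is sketched below; cleanup_bench and the CEPH and RADOS variables are made-up names, and deleting the pool only works if the monitors allow pool deletion (mon_allow_pool_delete), so skip that part if the block device test that follows should reuse the testbench pool.

```shell
# CEPH and RADOS can be overridden to preview or stub the calls.
CEPH="${CEPH:-ceph}"
RADOS="${RADOS:-rados}"

# Usage: cleanup_bench [pool]   (default pool: testbench)
cleanup_bench() {
  pool="${1:-testbench}"
  # Remove the objects left behind by `rados bench ... --no-cleanup`.
  $RADOS -p "$pool" cleanup
  # Drop the test pool itself; the name is given twice on purpose.
  $CEPH osd pool delete "$pool" "$pool" --yes-i-really-really-mean-it
}
```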

Creating a Ceph Block Device

To test actual performance on a block device, use rbd bench-write.

  1. Load the rbd kernel module, if not already loaded:
sudo modprobe rbd
  2. Create a 1 GB rbd image file in the testbench pool:
sudo rbd create image01 --size 1024 --pool testbench
  3. Map the image file to a device file:
sudo rbd map image01 --pool testbench --name client.admin
  4. Create an ext4 file system on the block device:
sudo mkfs.ext4 -m0 /dev/rbd0
  5. Create a new directory to mount it on:
sudo mkdir /tmp/rook-volume
  6. Mount the block device under /tmp/rook-volume:
sudo mount /dev/rbd0 /tmp/rook-volume
  7. Execute the write performance test against the block device:
rbd bench-write image01 --pool=testbench
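On newer Ceph releases the same test is usually spelled rbd bench --io-type write, with bench-write kept as an older alias. The sketch below shows that form plus a teardown of the test image; the function names and the RBD variable are made up for illustration, and the device path /dev/rbd0 is assumed from the steps above.

```shell
# RBD can be overridden to preview or stub the calls.
RBD="${RBD:-rbd}"

# Usage: bench_image_write <image> [pool]
bench_image_write() {
  $RBD bench --io-type write "$1" --pool="${2:-testbench}"
}

# Usage: remove_bench_image <image> [pool] [device]
# Unmount the file system first (umount /tmp/rook-volume), otherwise unmap fails.
remove_bench_image() {
  $RBD unmap "${3:-/dev/rbd0}"
  $RBD rm "$1" --pool "${2:-testbench}"
}
```

For the example above: bench_image_write image01, and afterwards remove_bench_image image01.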