Benchmarking Ceph performance

Ceph includes the rados bench command for benchmarking the performance of a RADOS storage cluster. The command runs a write test and two types of read tests, sequential and random. Use the --no-cleanup option when you plan to test both read and write performance, because the read tests reuse the objects that the write test leaves behind.
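
The general syntax is as follows, with placeholders shown in capitals:
rados bench -p POOL_NAME SECONDS write|seq|rand [-b BLOCK_SIZE] [-t CONCURRENT_OPERATIONS] [--no-cleanup] [--run-name LABEL]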

Before you begin

Before you begin, make sure that you have the following prerequisites in place:
  • A running IBM Storage Ceph cluster.

  • Root-level access to the node.

About this task

By default, the rados bench command deletes the objects that it writes to the storage pool. Running the write test with the --no-cleanup option leaves these objects in place, which allows the two read tests to measure sequential and random read performance.

Note: Before running these performance tests, drop all the file system caches by running the following command:
[ceph: root@host01 /]# echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync

Procedure

  1. Create a new storage pool.
    For example,
    [ceph: root@host01 /]# ceph osd pool create testbench 100 100
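    Optionally, verify that the new pool exists before running the benchmark:
    [ceph: root@host01 /]# ceph osd pool ls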
  2. Run a write test for 10 seconds to the storage pool created in the previous step.
    For example,
    [ceph: root@host01 /]# rados bench -p testbench 10 write --no-cleanup
    
    Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
     Object prefix: benchmark_data_cephn1.home.network_10510
       sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
         0       0         0         0         0         0         -         0
         1      16        16         0         0         0         -         0
         2      16        16         0         0         0         -         0
         3      16        16         0         0         0         -         0
         4      16        17         1  0.998879         1   3.19824   3.19824
         5      16        18         2   1.59849         4   4.56163   3.87993
         6      16        18         2   1.33222         0         -   3.87993
         7      16        19         3   1.71239         2   6.90712     4.889
         8      16        25         9   4.49551        24   7.75362   6.71216
         9      16        25         9   3.99636         0         -   6.71216
        10      16        27        11   4.39632         4   9.65085   7.18999
        11      16        27        11   3.99685         0         -   7.18999
        12      16        27        11   3.66397         0         -   7.18999
        13      16        28        12   3.68975   1.33333   12.8124   7.65853
        14      16        28        12   3.42617         0         -   7.65853
        15      16        28        12   3.19785         0         -   7.65853
        16      11        28        17   4.24726   6.66667   12.5302   9.27548
        17      11        28        17   3.99751         0         -   9.27548
        18      11        28        17   3.77546         0         -   9.27548
        19      11        28        17   3.57683         0         -   9.27548
     Total time run:         19.505620
    Total writes made:      28
    Write size:             4194304
    Bandwidth (MB/sec):     5.742
    
    Stddev Bandwidth:       5.4617
    Max bandwidth (MB/sec): 24
    Min bandwidth (MB/sec): 0
    Average Latency:        10.4064
    Stddev Latency:         3.80038
    Max latency:            19.503
    Min latency:            3.19824
  3. Run a sequential read test for 10 seconds to the storage pool.
    For example,
    [ceph: root@host01 /]# rados bench -p testbench 10 seq
    
    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
      0       0         0         0         0         0         -         0
    Total time run:        0.804869
    Total reads made:      28
    Read size:             4194304
    Bandwidth (MB/sec):    139.153
    
    Average Latency:       0.420841
    Max latency:           0.706133
    Min latency:           0.0816332
  4. Run a random read test for 10 seconds to the storage pool.
    For example,
    [ceph: root@host01 /]# rados bench -p testbench 10 rand
    
    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
      0       0         0         0         0         0         -         0
      1      16        46        30   119.801       120  0.440184  0.388125
      2      16        81        65   129.408       140  0.577359  0.417461
      3      16       120       104   138.175       156  0.597435  0.409318
      4      15       157       142   141.485       152  0.683111  0.419964
      5      16       206       190   151.553       192  0.310578  0.408343
      6      16       253       237   157.608       188 0.0745175  0.387207
      7      16       287       271   154.412       136  0.792774   0.39043
      8      16       325       309   154.044       152  0.314254   0.39876
      9      16       362       346   153.245       148  0.355576  0.406032
     10      16       405       389   155.092       172   0.64734  0.398372
    Total time run:        10.302229
    Total reads made:      405
    Read size:             4194304
    Bandwidth (MB/sec):    157.248
    
    Average Latency:       0.405976
    Max latency:           1.00869
    Min latency:           0.0378431
  5. Use additional parameters, as needed.
    • Increase the number of concurrent reads and writes by using the -t parameter.

      The default value is 16 threads.

    • Adjust the size of the object being written by using the -b parameter.

      The default object size is 4 MB, and the maximum object size is 16 MB. A combined sketch using the -t and -b parameters is shown after the --run-name example at the end of this step.

    • Control the names of the objects that get written during the benchmark test by using the --run-name LABEL option.
    Tip: Run multiple copies of these benchmark tests against different pools to view the changes in performance from multiple clients.

    Run multiple rados bench commands simultaneously by changing the --run-name label for each running command instance. Changing the label prevents the potential I/O errors that can occur when multiple clients try to access the same object, and also allows different clients to access different objects.

    Use the --run-name option when you want to simulate a real-world workload.

    [ceph: root@host01 /]# rados bench -p testbench 10 write -t 4 --run-name client1
    
    Maintaining 4 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
     Object prefix: benchmark_data_node1_12631
       sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
         0       0         0         0         0         0         -         0
         1       4         4         0         0         0         -         0
         2       4         6         2   3.99099         4   1.94755   1.93361
         3       4         8         4   5.32498         8     2.978   2.44034
         4       4         8         4   3.99504         0         -   2.44034
         5       4        10         6   4.79504         4   2.92419    2.4629
         6       3        10         7   4.64471         4   3.02498    2.5432
         7       4        12         8   4.55287         4   3.12204   2.61555
         8       4        14        10    4.9821         8   2.55901   2.68396
         9       4        16        12   5.31621         8   2.68769   2.68081
        10       4        17        13   5.18488         4   2.11937   2.63763
        11       4        17        13   4.71431         0         -   2.63763
        12       4        18        14   4.65486         2    2.4836   2.62662
        13       4        18        14   4.29757         0         -   2.62662
    Total time run:         13.123548
    Total writes made:      18
    Write size:             4194304
    Bandwidth (MB/sec):     5.486
    
    Stddev Bandwidth:       3.0991
    Max bandwidth (MB/sec): 8
    Min bandwidth (MB/sec): 0
    Average Latency:        2.91578
    Stddev Latency:         0.956993
    Max latency:            5.72685
    Min latency:            1.91967
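    The -t and -b parameters can be combined in one run. As a minimal sketch, assuming the testbench pool from step 1, the following command writes 8 MB objects (8388608 bytes) with 32 concurrent operations under the hypothetical label client2, keeping the objects for later read tests:
    [ceph: root@host01 /]# rados bench -p testbench 10 write -t 32 -b 8388608 --run-name client2 --no-cleanup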
  6. Remove the data created by the rados bench command.
    For example,
    [ceph: root@host01 /]# rados -p testbench cleanup
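    If you wrote benchmark data under a custom label, the cleanup subcommand accepts the same --run-name option. For example, to remove the data written under the client1 label from step 5:
    [ceph: root@host01 /]# rados -p testbench cleanup --run-name client1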