Date: 2024-12-12

We measured IO performance in a VM over QCOW2 virtual disks with different L2 cache sizes and cluster sizes. Performance can drop significantly if the L2 cache is not large enough. We created five test cases, using FIO to measure IOs per second (IOPS) and average IO latency. The default L2 cache size is used in all five tests:

1. 100 GiB virtual disk with 256 KiB cluster size
2. 100 GiB virtual disk with 512 KiB cluster size
3. 1 TiB virtual disk with 512 KiB cluster size
4. 1 TiB virtual disk with 1 MiB cluster size
5. 1 TiB virtual disk with 2 MiB cluster size

From the table shown in the previous section, we can see that for a cluster size of 256 KiB, the default L2 in-memory cache (2 MiB) can reference up to 64 GiB of virtual disk. The virtual disk in test case 1 is 100 GiB, well beyond 64 GiB. Random read and write performance is therefore low, because the L2 cache thrashes. In QEMU, the L2 cache implementation is coarse-grained: it loads or evicts one L2 cluster at a time. Thrashing is therefore quite costly once it starts.
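The 64 GiB figure follows from the QCOW2 format, in which each L2 table entry occupies 8 bytes and maps one cluster. Below is a minimal Python sketch of that arithmetic; the function name `l2_cache_coverage` is made up for illustration and is not part of QEMU.

```python
# Each QCOW2 L2 table entry is 8 bytes and maps exactly one cluster.
L2_ENTRY_BYTES = 8

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


def l2_cache_coverage(cache_bytes: int, cluster_bytes: int) -> int:
    """Return how many bytes of virtual disk the cached L2 entries can map.

    Hypothetical helper: it only models the entry-count arithmetic,
    not QEMU's actual cache management.
    """
    entries = cache_bytes // L2_ENTRY_BYTES
    return entries * cluster_bytes


# 2 MiB cache with 256 KiB clusters, as in test case 1.
coverage = l2_cache_coverage(2 * MiB, 256 * KiB)
print(f"{coverage // GiB} GiB")  # prints "64 GiB"
```

Because the covered range is the product of entry count and cluster size, doubling the cluster size while keeping the cache at a fixed number of clusters quadruples the coverage.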

If the cluster size is set to 512 KiB instead, the default L2 in-memory cache is 4 MiB (8 clusters of 512 KiB each) and can reference up to 256 GiB. The 100 GiB disk fits within that range, so test case 2 is over 10x faster than test case 1. Similarly, the L2 cache is too small for a 1 TiB virtual disk with a 512 KiB cluster size in test case 3, while it is large enough when the cluster size is set to 1 MiB or 2 MiB in test cases 4 and 5.
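The same reasoning can be run in reverse to estimate how much L2 cache each test disk needs for full coverage. The Python sketch below is an illustration only. It assumes 8-byte L2 entries and that the default cache is 8 clusters for every cluster size. The text states that default only for 512 KiB clusters, and defaults may differ between QEMU versions. The helper names are hypothetical.

```python
L2_ENTRY_BYTES = 8           # one 8-byte L2 entry per cluster
DEFAULT_CACHE_CLUSTERS = 8   # assumption: default cache = 8 clusters

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB
TiB = 1024 * GiB

# (test case number, virtual disk size, cluster size)
TEST_CASES = [
    (1, 100 * GiB, 256 * KiB),
    (2, 100 * GiB, 512 * KiB),
    (3, 1 * TiB, 512 * KiB),
    (4, 1 * TiB, 1 * MiB),
    (5, 1 * TiB, 2 * MiB),
]


def required_l2_cache(disk_bytes: int, cluster_bytes: int) -> int:
    """Bytes of L2 cache needed to hold an entry for every cluster."""
    clusters = -(-disk_bytes // cluster_bytes)  # ceiling division
    return clusters * L2_ENTRY_BYTES


def default_l2_cache(cluster_bytes: int) -> int:
    """Assumed default cache size: a fixed number of clusters."""
    return DEFAULT_CACHE_CLUSTERS * cluster_bytes


def report():
    """Return (case, required bytes, default bytes, fits) per test case."""
    rows = []
    for case, disk, cluster in TEST_CASES:
        need = required_l2_cache(disk, cluster)
        have = default_l2_cache(cluster)
        rows.append((case, need, have, have >= need))
    return rows


for case, need, have, fits in report():
    status = "fits" if fits else "too small"
    print(f"case {case}: need {need / MiB:.2f} MiB, "
          f"default {have / MiB:.0f} MiB -> {status}")
```

Under these assumptions the sketch flags test cases 1 and 3 as too small and the other three as sufficient, which matches the measured results. Test case 4 is sufficient with no headroom: the 8 MiB requirement equals the assumed 8 MiB default.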