Richard W.M. Jones
2021-Jun-20 16:46 UTC
[Libguestfs] [PATCH libnbd 0/2] copy: Set default request-size to 2**18 (262144 bytes)
As Nir has pointed out, our current default for nbdcopy --request-size is far from optimal. In this patch series I have changed the default to something better and provided some benchmark results.

With this simplistic approach it's not possible to choose a default which is best in all situations. That would likely require us to benchmark many machines and work out a formula relating measurable aspects of those machines, such as L3 cache size, to the best request size, but that's a lot more work.

Also, one of the tests implicitly depended on the default size, so I had to adjust the test.

Rich.
Richard W.M. Jones
2021-Jun-20 16:46 UTC
[Libguestfs] [PATCH libnbd 1/2] copy/copy-sparse-no-extents.sh: Set request-size explicitly
This test implicitly depends on the nbdcopy --request-size parameter. As I want to change the default for this parameter, set it explicitly in the test.
---
 copy/copy-sparse-no-extents.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/copy/copy-sparse-no-extents.sh b/copy/copy-sparse-no-extents.sh
index 4dc5c88..ea1b31e 100755
--- a/copy/copy-sparse-no-extents.sh
+++ b/copy/copy-sparse-no-extents.sh
@@ -39,7 +39,7 @@ requires nbdkit eval --version
 out=copy-sparse-no-extents.out
 cleanup_fn rm -f $out
 
-$VG nbdcopy --no-extents -S 0 -- \
+$VG nbdcopy --request-size=33554432 --no-extents -S 0 -- \
     [ nbdkit --exit-with-parent data data='
              1
              @1073741823 1
-- 
2.32.0
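As a quick sanity check on the value pinned in the test, the following sketch shows that 33554432 is 2**25 bytes, which matches the 32M default described in the next patch. It is only illustrative arithmetic and not part of the patch; the variable name old_default is made up for this example.

```shell
# Sketch only: confirm that the value pinned in the test is 2**25 bytes (32 MiB).
# A left shift is used because "**" is a bash extension, not POSIX arithmetic.
old_default=$((1 << 25))
echo "$old_default"
echo "$((old_default / 1024 / 1024)) MiB"
```

Pinning the value this way keeps the test's behaviour unchanged regardless of what the default becomes.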
Richard W.M. Jones
2021-Jun-20 16:46 UTC
[Libguestfs] [PATCH libnbd 2/2] copy: Set default request-size to 2**18 (262144 bytes)
As Nir has often pointed out, our current default request buffer size (32MB) is too large, resulting in nbdcopy being as much as 2½ times slower than it could be.

The optimum buffer size most likely depends on the hardware, and may even vary over time as machines get generally larger caches.

To explore the problem I used this command:

  $ hyperfine -P rs 15 25 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**{rs})) \$uri \$uri"'

On my 2019-era AMD server with 32GB of RAM and 64MB * 4 of L3 cache, 2**18 (262144) was the optimum when I tested all sizes between 2**15 (32K) and 2**25 (32M, the current default).

  Summary
    'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**18)) \$uri \$uri"' ran
      1.03 ± 0.04 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**19)) \$uri \$uri"'
      1.06 ± 0.04 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**17)) \$uri \$uri"'
      1.09 ± 0.03 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**20)) \$uri \$uri"'
      1.23 ± 0.04 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**21)) \$uri \$uri"'
      1.26 ± 0.04 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**16)) \$uri \$uri"'
      1.39 ± 0.04 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**22)) \$uri \$uri"'
      1.45 ± 0.05 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**15)) \$uri \$uri"'
      1.61 ± 0.05 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**23)) \$uri \$uri"'
      1.94 ± 0.05 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**24)) \$uri \$uri"'
      2.47 ± 0.08 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**25)) \$uri \$uri"'

On my 2018-era Intel laptop with a measly 8 MB of L3 cache, the optimum size is one power-of-2 smaller (but 2**18 is still an improvement):

  Summary
    'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**17)) \$uri \$uri"' ran
      1.05 ± 0.19 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**15)) \$uri \$uri"'
      1.06 ± 0.01 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**16)) \$uri \$uri"'
      1.10 ± 0.01 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**18)) \$uri \$uri"'
      1.22 ± 0.01 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**19)) \$uri \$uri"'
      1.29 ± 0.01 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**20)) \$uri \$uri"'
      1.33 ± 0.02 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**21)) \$uri \$uri"'
      1.35 ± 0.01 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**22)) \$uri \$uri"'
      1.38 ± 0.01 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**23)) \$uri \$uri"'
      1.45 ± 0.02 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**24)) \$uri \$uri"'
      1.63 ± 0.03 times faster than 'nbdkit -U - sparse-random size=100G seed=1 --run "nbdcopy --request-size=\$((2**25)) \$uri \$uri"'

To get an idea of the best request size on something rather different, I also ran the benchmark on a Raspberry Pi 4B. I had to reduce the copy size by a factor of 10 (to 10G) to make it run in a reasonable time. 2**18 is about 8% slower than the optimum choice (2**15). It's still significantly better than our current default.
  Summary
    'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**15)) \$uri \$uri"' ran
      1.00 ± 0.04 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**21)) \$uri \$uri"'
      1.03 ± 0.05 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**20)) \$uri \$uri"'
      1.04 ± 0.05 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**22)) \$uri \$uri"'
      1.05 ± 0.08 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**16)) \$uri \$uri"'
      1.05 ± 0.05 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**19)) \$uri \$uri"'
      1.07 ± 0.05 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**17)) \$uri \$uri"'
      1.08 ± 0.05 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**18)) \$uri \$uri"'
      1.15 ± 0.05 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**23)) \$uri \$uri"'
      1.28 ± 0.06 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**24)) \$uri \$uri"'
      1.35 ± 0.06 times faster than 'nbdkit -U - sparse-random size=10G seed=1 --run "nbdcopy --request-size=\$((2**25)) \$uri \$uri"'
---
 copy/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/copy/main.c b/copy/main.c
index 0fddfc3..70534b5 100644
--- a/copy/main.c
+++ b/copy/main.c
@@ -50,7 +50,7 @@ bool flush;                     /* --flush flag */
 unsigned max_requests = 64;     /* --requests */
 bool progress;                  /* -p flag */
 int progress_fd = -1;           /* --progress=FD */
-unsigned request_size = MAX_REQUEST_SIZE;  /* --request-size */
+unsigned request_size = 1<<18;  /* --request-size */
 unsigned sparse_size = 4096;    /* --sparse */
 bool synchronous;               /* --synchronous flag */
 unsigned threads;               /* --threads */
-- 
2.32.0
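
For readers who want to see which byte values the benchmark's exponent range covers, here is a minimal sketch that lists them, including the new default of 1<<18. The helper name request_size_for is hypothetical and exists only for this illustration; it is not part of nbdcopy or of the patch.

```shell
# Hypothetical helper (not part of nbdcopy): convert an exponent to a size in bytes.
# A left shift is used because "**" is a bash extension rather than POSIX arithmetic.
request_size_for() {
    echo $((1 << $1))
}

# Walk the same exponents that "hyperfine -P rs 15 25" scans in the benchmarks.
rs=15
while [ "$rs" -le 25 ]; do
    size=$(request_size_for "$rs")
    echo "2**$rs = $size bytes ($((size / 1024)) KiB)"
    rs=$((rs + 1))
done
```

The new default only applies when the option is omitted; an explicit --request-size, as used in the benchmark commands, still takes precedence.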