Nir Soffer
2021-Nov-14 07:21 UTC
[Libguestfs] [PATCH libnbd 0/6] Enhance synch-parallel test
I'm working on an application using the sync API. In my tests I see the best read throughput with 2 nbd connections, which is unexpected. The best way to investigate this is a benchmark using various combinations of request size and number of connections.

I found that tests/synch-parallel already does most of what I need. This series fixes this test and enhances it to make the request size and number of connections configurable.

Additional features I plan to add:

- Configurable run time - for benchmarking, running for 10 seconds is too short. When running on a laptop, the first benchmark is usually faster since the CPU takes time to heat up, and when it heats up (I see 96°C on my laptop) it slows down. We need to run for 30 or 60 seconds for more consistent results. For CI, 10 seconds is way too long; running for 0.1 seconds is enough to ensure the program still runs.
- Configurable I/O type (read, write, read-write). Manual testing shows higher throughput for a read-only test (3.9g/s) compared with 1.1g/s read and 1.1g/s write when reading and writing.

Issues:

- The test script is much slower now (120 seconds instead of 10). We need to separate the benchmark, running many combinations (synch-parallel-bench.sh), from the test, running one combination (synch-parallel.sh).
- The tests/aio-parallel test seems to have the same issues and needs similar changes.

Nir Soffer (6):
  tests/synch-parallel: Show thread run time
  tests/synch-parallel: Fix request loop time limit
  tests/synch-parallel: Remove unneeded memcpy
  tests/synch-parallel: Show throughput in MiB/s
  tests/synch-parallel: Test multiple request sizes
  tests/synch-parallel: Test multiple number of connections

 tests/synch-parallel.c | 121 +++++++++++++++++++++++++++++-----------
 tests/synch-parallel.sh | 17 ++++--
 2 files changed, 100 insertions(+), 38 deletions(-)

-- 
2.31.1
Nir Soffer
2021-Nov-14 07:21 UTC
[Libguestfs] [PATCH libnbd 1/6] tests/synch-parallel: Show thread run time
When testing throughput, we care about the time spent sending requests. Measure and log the time spent in the request loop. This removes the time to start the threads and connect to the nbd server from the results, and it ensures that all threads run the same amount of time, even if they do not start at exactly the same time.

Running the test multiple times shows that threads run for a random time between 10 and 11 seconds, but we compute the throughput assuming a run time of 10 seconds.

Example runs:

$ ./synch-parallel.sh
thread 5: finished OK in 10.054800 seconds
thread 1: finished OK in 10.054912 seconds
thread 2: finished OK in 10.055090 seconds
thread 0: finished OK in 10.055246 seconds
thread 6: finished OK in 10.055210 seconds
thread 7: finished OK in 10.055042 seconds
thread 3: finished OK in 10.055282 seconds
thread 4: finished OK in 10.055420 seconds
TLS: disabled
bytes sent: 6190678016 (619.068 Mbytes/s)
bytes received: 6183370752 (618.337 Mbytes/s)
I/O requests: 755252 (75525.2 IOPS)

$ ./synch-parallel.sh
thread 0: finished OK in 10.982875 seconds
thread 6: finished OK in 10.982651 seconds
thread 4: finished OK in 10.981958 seconds
thread 5: finished OK in 10.982700 seconds
thread 2: finished OK in 10.982774 seconds
thread 1: finished OK in 10.982976 seconds
thread 3: finished OK in 10.982834 seconds
thread 7: finished OK in 10.982610 seconds
TLS: disabled
bytes sent: 6646644736 (664.664 Mbytes/s)
bytes received: 6632275968 (663.228 Mbytes/s)
I/O requests: 810481 (81048.1 IOPS)

If we used the average thread run time to calculate the stats, the results would be:

Run 1:
run time: 10.054
bytes sent: 6190678016 (615.742 Mbytes/s)
bytes received: 6183370752 (615.015 Mbytes/s)
I/O requests: 755252 (75119.5 IOPS)

Run 2:
run time: 10.982
bytes sent: 6646644736 (605.230 Mbytes/s)
bytes received: 6632275968 (603.922 Mbytes/s)
I/O requests: 810481 (73800.8 IOPS)

The reported run 2 results are off by ~10%.
Signed-off-by: Nir Soffer <nsoffer at redhat.com> --- tests/synch-parallel.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/tests/synch-parallel.c b/tests/synch-parallel.c index d5efa3f..686e406 100644 --- a/tests/synch-parallel.c +++ b/tests/synch-parallel.c @@ -16,73 +16,85 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ /* Test synchronous parallel high level API requests on different * handles. There should be no shared state between the handles so * this should run at full speed (albeit with us only having a single * command per thread in flight). */ #include <config.h> #include <stdio.h> #include <stdlib.h> #include <stdbool.h> #include <stdint.h> #include <inttypes.h> #include <string.h> #include <unistd.h> #include <errno.h> #include <assert.h> +#include <sys/time.h> #include <pthread.h> #include <libnbd.h> #include "byte-swapping.h" /* We keep a shadow of the RAM disk so we can check integrity of the data. */ static char *ramdisk; /* This is also defined in synch-parallel.sh and checked here. */ #define EXPORTSIZE (8*1024*1024) /* How long (seconds) that the test will run for. */ #define RUN_TIME 10 /* Number of threads. */ #define NR_THREADS 8 +#define MICROSECONDS 1000000 + /* Unix socket. */ static const char *unixsocket; struct thread_status { size_t i; /* Thread index, 0 .. NR_THREADS-1 */ time_t end_time; /* Threads run until this end time. */ uint64_t offset, length; /* Area assigned to this thread. */ int status; /* Return status. */ unsigned requests; /* Total number of requests made. */ uint64_t bytes_sent, bytes_received; /* Bytes sent and received by thread. 
*/ }; static void *start_thread (void *arg); +static inline int64_t +microtime (void) +{ + struct timeval tv; + + gettimeofday(&tv, NULL); + return tv.tv_sec * MICROSECONDS + tv.tv_usec; +} + int main (int argc, char *argv[]) { pthread_t threads[NR_THREADS]; struct thread_status status[NR_THREADS]; size_t i; time_t t; int err; unsigned requests, errors; uint64_t bytes_sent, bytes_received; if (argc != 2) { fprintf (stderr, "%s socket\n", argv[0]); exit (EXIT_FAILURE); } unixsocket = argv[1]; /* Get the current time and the end time. */ time (&t); t += RUN_TIME; @@ -156,40 +168,41 @@ main (int argc, char *argv[]) printf ("bytes received: %" PRIu64 " (%g Mbytes/s)\n", bytes_received, (double) bytes_received / RUN_TIME / 1000000); printf ("I/O requests: %u (%g IOPS)\n", requests, (double) requests / RUN_TIME); exit (errors == 0 ? EXIT_SUCCESS : EXIT_FAILURE); } #define BUFFER_SIZE 16384 static void * start_thread (void *arg) { struct thread_status *status = arg; struct nbd_handle *nbd; char *buf; int cmd; uint64_t offset; time_t t; + int64_t start_usec, run_usec; buf = calloc (BUFFER_SIZE, 1); if (buf == NULL) { perror ("calloc"); exit (EXIT_FAILURE); } nbd = nbd_create (); if (nbd == NULL) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } #ifdef TLS /* Require TLS on the handle and fail if not available or if the * handshake fails. */ if (nbd_set_tls (nbd, LIBNBD_TLS_REQUIRE) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); @@ -199,73 +212,78 @@ start_thread (void *arg) fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } if (nbd_set_tls_psk_file (nbd, "keys.psk") == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } #endif /* Connect to nbdkit. 
*/ if (nbd_connect_unix (nbd, unixsocket) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } assert (nbd_get_size (nbd) == EXPORTSIZE); assert (nbd_can_multi_conn (nbd) > 0); assert (nbd_is_read_only (nbd) == 0); + start_usec = microtime (); + /* Issue commands. */ while (1) { /* Run until the timer expires. */ time (&t); if (t > status->end_time) break; /* Issue a synchronous read or write command. */ offset = status->offset + (rand () % (status->length - BUFFER_SIZE)); cmd = rand () & 1; if (cmd == 0) { if (nbd_pwrite (nbd, buf, BUFFER_SIZE, offset, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } status->bytes_sent += BUFFER_SIZE; memcpy (&ramdisk[offset], buf, BUFFER_SIZE); } else { if (nbd_pread (nbd, buf, BUFFER_SIZE, offset, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } status->bytes_received += BUFFER_SIZE; if (memcmp (&ramdisk[offset], buf, BUFFER_SIZE) != 0) { fprintf (stderr, "thread %zu: DATA INTEGRITY ERROR!\n", status->i); goto error; } } status->requests++; } - printf ("thread %zu: finished OK\n", status->i); + run_usec = microtime () - start_usec; + + printf ("thread %zu: finished OK in %.6f seconds\n", + status->i, (double) run_usec / MICROSECONDS); if (nbd_shutdown (nbd, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } nbd_close (nbd); free (buf); status->status = 0; pthread_exit (status); error: free (buf); fprintf (stderr, "thread %zu: failed\n", status->i); status->status = -1; pthread_exit (status); } -- 2.31.1
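The correction discussed in the commit message above is plain arithmetic, and can be checked with a short sketch. The helper names `mbytes_per_sec` and `overestimate` are hypothetical and not part of the patch; the figures used in the usage note are the run 2 numbers quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in units of 10^6 bytes per second for a given elapsed time.
 * Hypothetical helper, not part of tests/synch-parallel.c. */
static double
mbytes_per_sec (uint64_t bytes, double seconds)
{
  assert (seconds > 0);
  return (double) bytes / seconds / 1000000.0;
}

/* Relative overestimate introduced by dividing by the nominal run time
 * when the threads actually ran for measured_sec. */
static double
overestimate (double nominal_sec, double measured_sec)
{
  assert (nominal_sec > 0);
  return measured_sec / nominal_sec - 1.0;
}
```

With the run 2 figures, `mbytes_per_sec (6646644736, 10.0)` gives about 664.66 while `mbytes_per_sec (6646644736, 10.982)` gives about 605.23, and `overestimate (10.0, 10.982)` is about 0.098, which matches the "~10%" in the message.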
Nir Soffer
2021-Nov-14 07:21 UTC
[Libguestfs] [PATCH libnbd 2/6] tests/synch-parallel: Fix request loop time limit
Use microtime() for stopping the request loop. With this change all threads run for about 10.000400 seconds (+-0.000400), so using 10.0 seconds for computing the stats is correct.

Because the run time is practically the same in all runs, we get more stable results across multiple runs.

Here are a few example runs:

$ ./synch-parallel.sh
thread 4: finished OK in 10.000023 seconds
thread 1: finished OK in 10.000090 seconds
thread 0: finished OK in 10.000170 seconds
thread 2: finished OK in 10.000159 seconds
thread 5: finished OK in 10.000124 seconds
thread 3: finished OK in 10.000546 seconds
thread 6: finished OK in 10.000705 seconds
thread 7: finished OK in 10.000203 seconds
TLS: disabled
bytes sent: 6558744576 (655.874 Mbytes/s)
bytes received: 6567395328 (656.74 Mbytes/s)
I/O requests: 801156 (80115.6 IOPS)

$ ./synch-parallel.sh
thread 1: finished OK in 10.000013 seconds
thread 4: finished OK in 10.000083 seconds
thread 2: finished OK in 10.000007 seconds
thread 0: finished OK in 10.000046 seconds
thread 3: finished OK in 10.000080 seconds
thread 6: finished OK in 10.000013 seconds
thread 7: finished OK in 10.000052 seconds
thread 5: finished OK in 10.000266 seconds
TLS: disabled
bytes sent: 6436765696 (643.677 Mbytes/s)
bytes received: 6409781248 (640.978 Mbytes/s)
I/O requests: 784091 (78409.1 IOPS)

$ ./synch-parallel.sh
thread 1: finished OK in 10.000037 seconds
thread 0: finished OK in 10.000030 seconds
thread 7: finished OK in 10.000017 seconds
thread 2: finished OK in 10.000066 seconds
thread 5: finished OK in 10.000168 seconds
thread 4: finished OK in 10.000037 seconds
thread 6: finished OK in 10.000144 seconds
thread 3: finished OK in 10.000806 seconds
TLS: disabled
bytes sent: 6416236544 (641.624 Mbytes/s)
bytes received: 6434586624 (643.459 Mbytes/s)
I/O requests: 784352 (78435.2 IOPS)

Signed-off-by: Nir Soffer <nsoffer at redhat.com> --- tests/synch-parallel.c | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-) diff --git a/tests/synch-parallel.c 
b/tests/synch-parallel.c index 686e406..72402d7 100644 --- a/tests/synch-parallel.c +++ b/tests/synch-parallel.c @@ -43,98 +43,91 @@ /* We keep a shadow of the RAM disk so we can check integrity of the data. */ static char *ramdisk; /* This is also defined in synch-parallel.sh and checked here. */ #define EXPORTSIZE (8*1024*1024) /* How long (seconds) that the test will run for. */ #define RUN_TIME 10 /* Number of threads. */ #define NR_THREADS 8 #define MICROSECONDS 1000000 /* Unix socket. */ static const char *unixsocket; struct thread_status { size_t i; /* Thread index, 0 .. NR_THREADS-1 */ - time_t end_time; /* Threads run until this end time. */ uint64_t offset, length; /* Area assigned to this thread. */ int status; /* Return status. */ unsigned requests; /* Total number of requests made. */ uint64_t bytes_sent, bytes_received; /* Bytes sent and received by thread. */ }; static void *start_thread (void *arg); static inline int64_t microtime (void) { struct timeval tv; gettimeofday(&tv, NULL); return tv.tv_sec * MICROSECONDS + tv.tv_usec; } int main (int argc, char *argv[]) { pthread_t threads[NR_THREADS]; struct thread_status status[NR_THREADS]; size_t i; - time_t t; int err; unsigned requests, errors; uint64_t bytes_sent, bytes_received; if (argc != 2) { fprintf (stderr, "%s socket\n", argv[0]); exit (EXIT_FAILURE); } unixsocket = argv[1]; - /* Get the current time and the end time. */ - time (&t); - t += RUN_TIME; - - srand (t + getpid ()); + srand ((microtime () / MICROSECONDS) + getpid ()); /* Initialize the RAM disk with the initial data from * nbdkit-pattern-filter. */ ramdisk = malloc (EXPORTSIZE); if (ramdisk == NULL) { perror ("calloc"); exit (EXIT_FAILURE); } for (i = 0; i < EXPORTSIZE; i += 8) { uint64_t d = htobe64 (i); memcpy (&ramdisk[i], &d, sizeof d); } /* Start the worker threads. 
*/ for (i = 0; i < NR_THREADS; ++i) { status[i].i = i; - status[i].end_time = t; status[i].offset = i * EXPORTSIZE / NR_THREADS; status[i].length = EXPORTSIZE / NR_THREADS; status[i].status = 0; status[i].requests = 0; status[i].bytes_sent = status[i].bytes_received = 0; err = pthread_create (&threads[i], NULL, start_thread, &status[i]); if (err != 0) { errno = err; perror ("pthread_create"); exit (EXIT_FAILURE); } } /* Wait for the threads to exit. */ errors = 0; requests = 0; bytes_sent = bytes_received = 0; for (i = 0; i < NR_THREADS; ++i) { err = pthread_join (threads[i], NULL); if (err != 0) { @@ -167,42 +160,41 @@ main (int argc, char *argv[]) bytes_sent, (double) bytes_sent / RUN_TIME / 1000000); printf ("bytes received: %" PRIu64 " (%g Mbytes/s)\n", bytes_received, (double) bytes_received / RUN_TIME / 1000000); printf ("I/O requests: %u (%g IOPS)\n", requests, (double) requests / RUN_TIME); exit (errors == 0 ? EXIT_SUCCESS : EXIT_FAILURE); } #define BUFFER_SIZE 16384 static void * start_thread (void *arg) { struct thread_status *status = arg; struct nbd_handle *nbd; char *buf; int cmd; uint64_t offset; - time_t t; - int64_t start_usec, run_usec; + int64_t start_usec, stop_usec, now_usec; buf = calloc (BUFFER_SIZE, 1); if (buf == NULL) { perror ("calloc"); exit (EXIT_FAILURE); } nbd = nbd_create (); if (nbd == NULL) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } #ifdef TLS /* Require TLS on the handle and fail if not available or if the * handshake fails. */ if (nbd_set_tls (nbd, LIBNBD_TLS_REQUIRE) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); @@ -213,77 +205,76 @@ start_thread (void *arg) exit (EXIT_FAILURE); } if (nbd_set_tls_psk_file (nbd, "keys.psk") == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } #endif /* Connect to nbdkit. 
*/ if (nbd_connect_unix (nbd, unixsocket) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } assert (nbd_get_size (nbd) == EXPORTSIZE); assert (nbd_can_multi_conn (nbd) > 0); assert (nbd_is_read_only (nbd) == 0); start_usec = microtime (); + stop_usec = start_usec + RUN_TIME * MICROSECONDS; /* Issue commands. */ while (1) { /* Run until the timer expires. */ - time (&t); - if (t > status->end_time) + now_usec = microtime (); + if (now_usec >= stop_usec) break; /* Issue a synchronous read or write command. */ offset = status->offset + (rand () % (status->length - BUFFER_SIZE)); cmd = rand () & 1; if (cmd == 0) { if (nbd_pwrite (nbd, buf, BUFFER_SIZE, offset, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } status->bytes_sent += BUFFER_SIZE; memcpy (&ramdisk[offset], buf, BUFFER_SIZE); } else { if (nbd_pread (nbd, buf, BUFFER_SIZE, offset, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } status->bytes_received += BUFFER_SIZE; if (memcmp (&ramdisk[offset], buf, BUFFER_SIZE) != 0) { fprintf (stderr, "thread %zu: DATA INTEGRITY ERROR!\n", status->i); goto error; } } status->requests++; } - run_usec = microtime () - start_usec; - printf ("thread %zu: finished OK in %.6f seconds\n", - status->i, (double) run_usec / MICROSECONDS); + status->i, (double) (now_usec - start_usec) / MICROSECONDS); if (nbd_shutdown (nbd, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } nbd_close (nbd); free (buf); status->status = 0; pthread_exit (status); error: free (buf); fprintf (stderr, "thread %zu: failed\n", status->i); status->status = -1; pthread_exit (status); } -- 2.31.1
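The 10 to 11 second spread seen before this patch follows from the one-second resolution of time(): the end time was computed once in main() and each thread stopped only when time() returned a value strictly greater than it, so the overshoot depended on where in the current second the test started. A deadline computed per thread at sub-second resolution avoids that. The sketch below shows the same loop shape as the patch, but it swaps in clock_gettime(CLOCK_MONOTONIC) instead of the gettimeofday() call the patch uses; that substitution, and the names `monotonic_usec`, `run_for` and `nop_request`, are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define MICROSECONDS 1000000

/* Current monotonic time in microseconds (not affected by wall-clock
 * adjustments, unlike gettimeofday). */
static int64_t
monotonic_usec (void)
{
  struct timespec ts;

  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (int64_t) ts.tv_sec * MICROSECONDS + ts.tv_nsec / 1000;
}

/* Stand-in for one synchronous NBD request; does nothing. */
static void
nop_request (void)
{
}

/* Call request() repeatedly until run_usec microseconds have elapsed.
 * Stores the number of calls in *count and returns the elapsed time in
 * microseconds, measured at the same point where the loop decided to stop. */
static int64_t
run_for (int64_t run_usec, void (*request) (void), uint64_t *count)
{
  int64_t start_usec, stop_usec, now_usec;

  assert (run_usec > 0);
  start_usec = monotonic_usec ();
  stop_usec = start_usec + run_usec;
  *count = 0;

  while ((now_usec = monotonic_usec ()) < stop_usec) {
    request ();
    (*count)++;
  }

  return now_usec - start_usec;
}
```

As in the patch, the overshoot past the deadline is bounded by the duration of the last request rather than by the clock resolution.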
Nir Soffer
2021-Nov-14 07:21 UTC
[Libguestfs] [PATCH libnbd 3/6] tests/synch-parallel: Remove unneeded memcpy
The synch-parallel test was writing the thread buffer to the nbd server, and then copying data from the ramdisk to the thread buffer. This looks wrong for two reasons:

- The thread buffer is initialized to zeroes, and after every write, to the contents of the ramdisk, so we always write data which does not match the contents of the ramdisk. I guess this works since the pattern filter drops the written data.
- For every write, we pay for an unneeded memcpy(), adding unwanted noise to the results.

Simplify the code to either copy a block from the ramdisk to the nbd server or copy a block from the nbd server to the thread buffer.

Testing shows no significant difference with or without the unneeded memcpy().

Signed-off-by: Nir Soffer <nsoffer at redhat.com> --- tests/synch-parallel.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tests/synch-parallel.c b/tests/synch-parallel.c index 72402d7..cecfeae 100644 --- a/tests/synch-parallel.c +++ b/tests/synch-parallel.c @@ -218,48 +218,49 @@ start_thread (void *arg) } assert (nbd_get_size (nbd) == EXPORTSIZE); assert (nbd_can_multi_conn (nbd) > 0); assert (nbd_is_read_only (nbd) == 0); start_usec = microtime (); stop_usec = start_usec + RUN_TIME * MICROSECONDS; /* Issue commands. */ while (1) { /* Run until the timer expires. */ now_usec = microtime (); if (now_usec >= stop_usec) break; /* Issue a synchronous read or write command. */ offset = status->offset + (rand () % (status->length - BUFFER_SIZE)); cmd = rand () & 1; if (cmd == 0) { - if (nbd_pwrite (nbd, buf, BUFFER_SIZE, offset, 0) == -1) { + /* Write block from ramdisk to nbd server. */ + if (nbd_pwrite (nbd, &ramdisk[offset], BUFFER_SIZE, offset, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } status->bytes_sent += BUFFER_SIZE; - memcpy (&ramdisk[offset], buf, BUFFER_SIZE); } else { + /* Read block from nbd server to buf. 
*/ if (nbd_pread (nbd, buf, BUFFER_SIZE, offset, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } status->bytes_received += BUFFER_SIZE; if (memcmp (&ramdisk[offset], buf, BUFFER_SIZE) != 0) { fprintf (stderr, "thread %zu: DATA INTEGRITY ERROR!\n", status->i); goto error; } } status->requests++; } printf ("thread %zu: finished OK in %.6f seconds\n", status->i, (double) (now_usec - start_usec) / MICROSECONDS); if (nbd_shutdown (nbd, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } -- 2.31.1
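For readers unfamiliar with the shadow ramdisk that the read path compares against, here is a minimal standalone sketch of the idea, without libnbd. It fills a buffer with the same layout the test builds in main() (each aligned 8-byte word holds its own offset in big-endian order) and checks a block read against it. The function names `fill_pattern` and `check_block` are hypothetical, and the bytes are written by hand instead of using htobe64() so that the sketch does not depend on byte-swapping.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill buf, representing export offsets [base, base+len), so that every
 * aligned 8-byte word contains its own offset as a big-endian integer. */
static void
fill_pattern (unsigned char *buf, uint64_t base, size_t len)
{
  size_t i;
  int b;

  assert (base % 8 == 0 && len % 8 == 0);
  for (i = 0; i < len; i += 8) {
    uint64_t v = base + i;

    for (b = 0; b < 8; ++b)
      buf[i + b] = (unsigned char) (v >> (56 - 8 * b));
  }
}

/* Return 1 if the len bytes in buf match the shadow copy at offset. */
static int
check_block (const unsigned char *shadow, const unsigned char *buf,
             uint64_t offset, size_t len)
{
  return memcmp (&shadow[offset], buf, len) == 0;
}
```

Because the comparison is a plain memcmp() at the same offset, the block read from the server does not need to be 8-byte aligned, which is why the test can pick arbitrary random offsets.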
Nir Soffer
2021-Nov-14 07:21 UTC
[Libguestfs] [PATCH libnbd 4/6] tests/synch-parallel: Show throughput in MiB/s
Using Mbytes/s is less useful, people expect values in MiB. Example output: $ ./synch-parallel.sh ... bytes sent: 12686721024 (1209.9 MiB/s) bytes received: 12734693376 (1214.47 MiB/s) I/O requests: 96975 (9697.5 IOPS) Signed-off-by: Nir Soffer <nsoffer at redhat.com> --- tests/synch-parallel.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/tests/synch-parallel.c b/tests/synch-parallel.c index cecfeae..423e1f0 100644 --- a/tests/synch-parallel.c +++ b/tests/synch-parallel.c @@ -36,40 +36,42 @@ #include <sys/time.h> #include <pthread.h> #include <libnbd.h> #include "byte-swapping.h" /* We keep a shadow of the RAM disk so we can check integrity of the data. */ static char *ramdisk; /* This is also defined in synch-parallel.sh and checked here. */ #define EXPORTSIZE (8*1024*1024) /* How long (seconds) that the test will run for. */ #define RUN_TIME 10 /* Number of threads. */ #define NR_THREADS 8 +#define MiB (1024*1024) + #define MICROSECONDS 1000000 /* Unix socket. */ static const char *unixsocket; struct thread_status { size_t i; /* Thread index, 0 .. NR_THREADS-1 */ uint64_t offset, length; /* Area assigned to this thread. */ int status; /* Return status. */ unsigned requests; /* Total number of requests made. */ uint64_t bytes_sent, bytes_received; /* Bytes sent and received by thread. */ }; static void *start_thread (void *arg); static inline int64_t microtime (void) { struct timeval tv; @@ -139,44 +141,44 @@ main (int argc, char *argv[]) fprintf (stderr, "thread %zu failed with status %d\n", i, status[i].status); errors++; } requests += status[i].requests; bytes_sent += status[i].bytes_sent; bytes_received += status[i].bytes_received; } free (ramdisk); /* Print some stats. 
*/ printf ("TLS: %s\n", #ifdef TLS "enabled" #else "disabled" #endif ); - printf ("bytes sent: %" PRIu64 " (%g Mbytes/s)\n", - bytes_sent, (double) bytes_sent / RUN_TIME / 1000000); - printf ("bytes received: %" PRIu64 " (%g Mbytes/s)\n", - bytes_received, (double) bytes_received / RUN_TIME / 1000000); + printf ("bytes sent: %" PRIu64 " (%g MiB/s)\n", + bytes_sent, (double) bytes_sent / RUN_TIME / MiB); + printf ("bytes received: %" PRIu64 " (%g MiB/s)\n", + bytes_received, (double) bytes_received / RUN_TIME / MiB); printf ("I/O requests: %u (%g IOPS)\n", requests, (double) requests / RUN_TIME); exit (errors == 0 ? EXIT_SUCCESS : EXIT_FAILURE); } #define BUFFER_SIZE 16384 static void * start_thread (void *arg) { struct thread_status *status = arg; struct nbd_handle *nbd; char *buf; int cmd; uint64_t offset; int64_t start_usec, stop_usec, now_usec; buf = calloc (BUFFER_SIZE, 1); -- 2.31.1
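To make the unit change concrete, the following sketch computes both figures for the byte count in the example output above. The helper names `rate_mb` and `rate_mib` are hypothetical; the only thing taken from the patch is the `MiB` definition.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024*1024)

/* Rate in decimal megabytes (10^6 bytes) per second, as printed before
 * this patch. */
static double
rate_mb (uint64_t bytes, double seconds)
{
  assert (seconds > 0);
  return (double) bytes / seconds / 1000000;
}

/* Rate in mebibytes (2^20 bytes) per second, as printed after this patch. */
static double
rate_mib (uint64_t bytes, double seconds)
{
  assert (seconds > 0);
  return (double) bytes / seconds / MiB;
}
```

For 12686721024 bytes sent in 10 seconds, `rate_mib` returns 1209.9, matching the example output, while `rate_mb` returns about 1268.67. The two figures differ by roughly 4.9%, so numbers from before and after this patch are not directly comparable.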
Nir Soffer
2021-Nov-14 07:21 UTC
[Libguestfs] [PATCH libnbd 5/6] tests/synch-parallel: Test multiple request sizes
Rename BUFFER_SIZE to REQUEST_SIZE and get the value from an environment variable. Change the test script to test multiple values. An example run: $ ./synch-parallel.sh Request size: 4096 thread 0: finished OK in 10.000019 seconds thread 4: finished OK in 10.000003 seconds thread 2: finished OK in 10.000015 seconds thread 1: finished OK in 10.000053 seconds thread 6: finished OK in 10.000075 seconds thread 7: finished OK in 10.000012 seconds thread 3: finished OK in 10.000024 seconds thread 5: finished OK in 10.000000 seconds TLS: disabled bytes sent: 2412867584 (230.109 MiB/s) bytes received: 2419851264 (230.775 MiB/s) I/O requests: 1179863 (117986 IOPS) Request size: 262144 thread 2: finished OK in 10.000030 seconds thread 5: finished OK in 10.000415 seconds thread 1: finished OK in 10.000500 seconds thread 3: finished OK in 10.000459 seconds thread 6: finished OK in 10.000275 seconds thread 4: finished OK in 10.000520 seconds thread 7: finished OK in 10.000529 seconds thread 0: finished OK in 10.000768 seconds TLS: disabled bytes sent: 12791840768 (1219.92 MiB/s) bytes received: 12748324864 (1215.78 MiB/s) I/O requests: 97428 (9742.8 IOPS) 4k is a good size to test IOPS, and 256k is a good size for getting maximum throughput. I kept the current default 16k as is. Signed-off-by: Nir Soffer <nsoffer at redhat.com> --- tests/synch-parallel.c | 51 +++++++++++++++++++++++++++++++++-------- tests/synch-parallel.sh | 12 ++++++---- 2 files changed, 49 insertions(+), 14 deletions(-) diff --git a/tests/synch-parallel.c b/tests/synch-parallel.c index 423e1f0..d6ab1df 100644 --- a/tests/synch-parallel.c +++ b/tests/synch-parallel.c @@ -36,82 +36,115 @@ #include <sys/time.h> #include <pthread.h> #include <libnbd.h> #include "byte-swapping.h" /* We keep a shadow of the RAM disk so we can check integrity of the data. */ static char *ramdisk; /* This is also defined in synch-parallel.sh and checked here. 
*/ #define EXPORTSIZE (8*1024*1024) /* How long (seconds) that the test will run for. */ #define RUN_TIME 10 /* Number of threads. */ #define NR_THREADS 8 -#define MiB (1024*1024) +#define KiB 1024 +#define MiB (1024*KiB) #define MICROSECONDS 1000000 /* Unix socket. */ static const char *unixsocket; +static long request_size; + struct thread_status { size_t i; /* Thread index, 0 .. NR_THREADS-1 */ uint64_t offset, length; /* Area assigned to this thread. */ int status; /* Return status. */ unsigned requests; /* Total number of requests made. */ uint64_t bytes_sent, bytes_received; /* Bytes sent and received by thread. */ }; static void *start_thread (void *arg); static inline int64_t microtime (void) { struct timeval tv; gettimeofday(&tv, NULL); return tv.tv_sec * MICROSECONDS + tv.tv_usec; } +static long +getenv_long(const char *name, long defval) +{ + const char *value; + char *end; + long res; + + value = getenv (name); + if (value == NULL) + return defval; + + res = strtol(value, &end, 10); + if (*end != '\0' || end == value) { + fprintf (stderr, "Invalid value for %s: '%s'\n", name, value); + exit (EXIT_FAILURE); + } + + return res; +} + int main (int argc, char *argv[]) { pthread_t threads[NR_THREADS]; struct thread_status status[NR_THREADS]; size_t i; int err; unsigned requests, errors; uint64_t bytes_sent, bytes_received; if (argc != 2) { fprintf (stderr, "%s socket\n", argv[0]); exit (EXIT_FAILURE); } unixsocket = argv[1]; + request_size = getenv_long ("REQUEST_SIZE", 16*KiB); + if (request_size < 4*KiB || + request_size > 512*KiB || + (EXPORTSIZE / NR_THREADS) % request_size != 0) { + fprintf (stderr, + "Invalid REQUEST_SIZE environment variable: %ld\n", + request_size); + exit (EXIT_FAILURE); + } + srand ((microtime () / MICROSECONDS) + getpid ()); /* Initialize the RAM disk with the initial data from * nbdkit-pattern-filter. 
*/ ramdisk = malloc (EXPORTSIZE); if (ramdisk == NULL) { perror ("calloc"); exit (EXIT_FAILURE); } for (i = 0; i < EXPORTSIZE; i += 8) { uint64_t d = htobe64 (i); memcpy (&ramdisk[i], &d, sizeof d); } /* Start the worker threads. */ for (i = 0; i < NR_THREADS; ++i) { status[i].i = i; status[i].offset = i * EXPORTSIZE / NR_THREADS; status[i].length = EXPORTSIZE / NR_THREADS; @@ -152,53 +185,51 @@ main (int argc, char *argv[]) /* Print some stats. */ printf ("TLS: %s\n", #ifdef TLS "enabled" #else "disabled" #endif ); printf ("bytes sent: %" PRIu64 " (%g MiB/s)\n", bytes_sent, (double) bytes_sent / RUN_TIME / MiB); printf ("bytes received: %" PRIu64 " (%g MiB/s)\n", bytes_received, (double) bytes_received / RUN_TIME / MiB); printf ("I/O requests: %u (%g IOPS)\n", requests, (double) requests / RUN_TIME); exit (errors == 0 ? EXIT_SUCCESS : EXIT_FAILURE); } -#define BUFFER_SIZE 16384 - static void * start_thread (void *arg) { struct thread_status *status = arg; struct nbd_handle *nbd; char *buf; int cmd; uint64_t offset; int64_t start_usec, stop_usec, now_usec; - buf = calloc (BUFFER_SIZE, 1); + buf = calloc (request_size, 1); if (buf == NULL) { perror ("calloc"); exit (EXIT_FAILURE); } nbd = nbd_create (); if (nbd == NULL) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } #ifdef TLS /* Require TLS on the handle and fail if not available or if the * handshake fails. */ if (nbd_set_tls (nbd, LIBNBD_TLS_REQUIRE) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } @@ -217,58 +248,58 @@ start_thread (void *arg) if (nbd_connect_unix (nbd, unixsocket) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); exit (EXIT_FAILURE); } assert (nbd_get_size (nbd) == EXPORTSIZE); assert (nbd_can_multi_conn (nbd) > 0); assert (nbd_is_read_only (nbd) == 0); start_usec = microtime (); stop_usec = start_usec + RUN_TIME * MICROSECONDS; /* Issue commands. */ while (1) { /* Run until the timer expires. 
*/ now_usec = microtime (); if (now_usec >= stop_usec) break; /* Issue a synchronous read or write command. */ - offset = status->offset + (rand () % (status->length - BUFFER_SIZE)); + offset = status->offset + (rand () % (status->length - request_size)); cmd = rand () & 1; if (cmd == 0) { /* Write block from ramdisk to nbd server. */ - if (nbd_pwrite (nbd, &ramdisk[offset], BUFFER_SIZE, offset, 0) == -1) { + if (nbd_pwrite (nbd, &ramdisk[offset], request_size, offset, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } - status->bytes_sent += BUFFER_SIZE; + status->bytes_sent += request_size; } else { /* Read block from nbd server to buf. */ - if (nbd_pread (nbd, buf, BUFFER_SIZE, offset, 0) == -1) { + if (nbd_pread (nbd, buf, request_size, offset, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } - status->bytes_received += BUFFER_SIZE; - if (memcmp (&ramdisk[offset], buf, BUFFER_SIZE) != 0) { + status->bytes_received += request_size; + if (memcmp (&ramdisk[offset], buf, request_size) != 0) { fprintf (stderr, "thread %zu: DATA INTEGRITY ERROR!\n", status->i); goto error; } } status->requests++; } printf ("thread %zu: finished OK in %.6f seconds\n", status->i, (double) (now_usec - start_usec) / MICROSECONDS); if (nbd_shutdown (nbd, 0) == -1) { fprintf (stderr, "%s\n", nbd_get_error ()); goto error; } nbd_close (nbd); free (buf); status->status = 0; diff --git a/tests/synch-parallel.sh b/tests/synch-parallel.sh index 84c00d8..0ca9060 100755 --- a/tests/synch-parallel.sh +++ b/tests/synch-parallel.sh @@ -1,24 +1,28 @@ #!/usr/bin/env bash # nbd client library in userspace # Copyright (C) 2019 Red Hat Inc. # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2 of the License, or (at your option) any later version. 
# # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA # Test synchronous parallel high level API requests. -nbdkit -U - \ - --filter=cow \ - pattern size=8M \ - --run '$VG ./synch-parallel $unixsocket' +for request_size in 4096 262144; do + echo "Request size: $request_size" + REQUEST_SIZE=$request_size nbdkit -U - \ + --filter=cow \ + pattern size=8M \ + --run '$VG ./synch-parallel $unixsocket' + echo +done -- 2.31.1
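The getenv_long() helper in this patch rejects empty values and trailing garbage. A slightly stricter variant, sketched below, also rejects values that overflow a long by checking errno after strtol(). This is an illustration only and not a change proposed by the series; `parse_long` and `valid_request_size` are hypothetical names, and the limits in `valid_request_size` simply restate the check the patch performs in main().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define KiB 1024

/* Parse a base-10 long from value. Returns 0 and stores the result in
 * *res on success; returns -1 if value is empty, has trailing characters,
 * or is out of range for a long. */
static int
parse_long (const char *value, long *res)
{
  char *end;
  long v;

  assert (value != NULL && res != NULL);
  errno = 0;
  v = strtol (value, &end, 10);
  if (end == value || *end != '\0' || errno == ERANGE)
    return -1;

  *res = v;
  return 0;
}

/* Same constraints as the patch: between 4 KiB and 512 KiB, and the
 * per-thread area must be a multiple of the request size. */
static int
valid_request_size (long request_size, long area)
{
  return request_size >= 4*KiB &&
         request_size <= 512*KiB &&
         area % request_size == 0;
}
```

With the test's 8 MiB export split across 8 threads, each area is 1 MiB, so both sizes used by the script (4096 and 262144) pass `valid_request_size`, while a value such as `16k` is rejected by `parse_long` before the range check is reached.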
Nir Soffer
2021-Nov-14 07:21 UTC
[Libguestfs] [PATCH libnbd 6/6] tests/synch-parallel: Test multiple number of connections
Rename NR_THREADS to CONNECTIONS and get the value from an environment variable. Change the test script to test multiple connections. An example run: $ ./synch-parallel.sh Connections: 1 Request size: 4096 thread 0: finished OK in 10.000000 seconds TLS: disabled bytes sent: 996638720 (95.0469 MiB/s) bytes received: 995004416 (94.891 MiB/s) I/O requests: 486241 (48624.1 IOPS) Connections: 1 Request size: 262144 thread 0: finished OK in 10.000047 seconds TLS: disabled bytes sent: 9216720896 (878.975 MiB/s) bytes received: 9266528256 (883.725 MiB/s) I/O requests: 70508 (7050.8 IOPS) Connections: 2 Request size: 4096 thread 0: finished OK in 10.000015 seconds thread 1: finished OK in 10.000012 seconds TLS: disabled bytes sent: 1681920000 (160.4 MiB/s) bytes received: 1680896000 (160.303 MiB/s) I/O requests: 821000 (82100 IOPS) Connections: 2 Request size: 262144 thread 0: finished OK in 10.000048 seconds thread 1: finished OK in 10.000060 seconds TLS: disabled bytes sent: 12331515904 (1176.03 MiB/s) bytes received: 12310282240 (1174 MiB/s) I/O requests: 94001 (9400.1 IOPS) Connections: 4 Request size: 4096 thread 3: finished OK in 10.000004 seconds thread 0: finished OK in 10.000029 seconds thread 2: finished OK in 10.000011 seconds thread 1: finished OK in 10.000025 seconds TLS: disabled bytes sent: 2024407040 (193.062 MiB/s) bytes received: 2024652800 (193.086 MiB/s) I/O requests: 988540 (98854 IOPS) Connections: 4 Request size: 262144 thread 2: finished OK in 10.000300 seconds thread 1: finished OK in 10.000360 seconds thread 0: finished OK in 10.000340 seconds thread 3: finished OK in 10.000325 seconds TLS: disabled bytes sent: 12098994176 (1153.85 MiB/s) bytes received: 12016680960 (1146 MiB/s) I/O requests: 91994 (9199.4 IOPS) Connections: 8 Request size: 4096 thread 0: finished OK in 10.000002 seconds thread 1: finished OK in 10.000050 seconds thread 5: finished OK in 10.000098 seconds thread 3: finished OK in 10.000013 seconds thread 6: finished OK in 10.000239 
seconds thread 4: finished OK in 10.000126 seconds thread 2: finished OK in 10.000068 seconds thread 7: finished OK in 10.000215 seconds TLS: disabled bytes sent: 2110287872 (201.253 MiB/s) bytes received: 2105614336 (200.807 MiB/s) I/O requests: 1029273 (102927 IOPS) Connections: 8 Request size: 262144 thread 7: finished OK in 10.000441 seconds thread 6: finished OK in 10.000351 seconds thread 2: finished OK in 10.000572 seconds thread 0: finished OK in 10.000676 seconds thread 3: finished OK in 10.000772 seconds thread 5: finished OK in 10.000839 seconds thread 4: finished OK in 10.000968 seconds thread 1: finished OK in 10.000861 seconds TLS: disabled bytes sent: 11867783168 (1131.8 MiB/s) bytes received: 11875647488 (1132.55 MiB/s) I/O requests: 90574 (9057.4 IOPS) Signed-off-by: Nir Soffer <nsoffer at redhat.com> --- tests/synch-parallel.c | 30 ++++++++++++++++++++---------- tests/synch-parallel.sh | 19 ++++++++++++------- 2 files changed, 32 insertions(+), 17 deletions(-) diff --git a/tests/synch-parallel.c b/tests/synch-parallel.c index d6ab1df..099a906 100644 --- a/tests/synch-parallel.c +++ b/tests/synch-parallel.c @@ -33,154 +33,164 @@ #include <unistd.h> #include <errno.h> #include <assert.h> #include <sys/time.h> #include <pthread.h> #include <libnbd.h> #include "byte-swapping.h" /* We keep a shadow of the RAM disk so we can check integrity of the data. */ static char *ramdisk; /* This is also defined in synch-parallel.sh and checked here. */ #define EXPORTSIZE (8*1024*1024) /* How long (seconds) that the test will run for. */ #define RUN_TIME 10 -/* Number of threads. */ -#define NR_THREADS 8 +#define MAX_CONNECTIONS 8 #define KiB 1024 #define MiB (1024*KiB) #define MICROSECONDS 1000000 /* Unix socket. */ static const char *unixsocket; static long request_size; +static long connections; struct thread_status { - size_t i; /* Thread index, 0 .. NR_THREADS-1 */ + size_t i; /* Thread index, 0 .. 
connections-1 */ uint64_t offset, length; /* Area assigned to this thread. */ int status; /* Return status. */ unsigned requests; /* Total number of requests made. */ uint64_t bytes_sent, bytes_received; /* Bytes sent and received by thread. */ }; static void *start_thread (void *arg); static inline int64_t microtime (void) { struct timeval tv; gettimeofday(&tv, NULL); return tv.tv_sec * MICROSECONDS + tv.tv_usec; } static long getenv_long(const char *name, long defval) { const char *value; char *end; long res; value = getenv (name); if (value == NULL) return defval; res = strtol(value, &end, 10); if (*end != '\0' || end == value) { fprintf (stderr, "Invalid value for %s: '%s'\n", name, value); exit (EXIT_FAILURE); } return res; } int main (int argc, char *argv[]) { - pthread_t threads[NR_THREADS]; - struct thread_status status[NR_THREADS]; + pthread_t threads[MAX_CONNECTIONS]; + struct thread_status status[MAX_CONNECTIONS]; size_t i; int err; unsigned requests, errors; uint64_t bytes_sent, bytes_received; if (argc != 2) { fprintf (stderr, "%s socket\n", argv[0]); exit (EXIT_FAILURE); } unixsocket = argv[1]; + connections = getenv_long ("CONNECTIONS", 8); + if (connections < 1 || + connections > MAX_CONNECTIONS || + EXPORTSIZE % connections != 0) { + fprintf (stderr, + "Invalid CONNECTIONS environment variable: %ld\n", + connections); + exit (EXIT_FAILURE); + } + request_size = getenv_long ("REQUEST_SIZE", 16*KiB); if (request_size < 4*KiB || request_size > 512*KiB || - (EXPORTSIZE / NR_THREADS) % request_size != 0) { + (EXPORTSIZE / connections) % request_size != 0) { fprintf (stderr, "Invalid REQUEST_SIZE environment variable: %ld\n", request_size); exit (EXIT_FAILURE); } srand ((microtime () / MICROSECONDS) + getpid ()); /* Initialize the RAM disk with the initial data from * nbdkit-pattern-filter. 
*/ ramdisk = malloc (EXPORTSIZE); if (ramdisk == NULL) { perror ("calloc"); exit (EXIT_FAILURE); } for (i = 0; i < EXPORTSIZE; i += 8) { uint64_t d = htobe64 (i); memcpy (&ramdisk[i], &d, sizeof d); } /* Start the worker threads. */ - for (i = 0; i < NR_THREADS; ++i) { + for (i = 0; i < connections; ++i) { status[i].i = i; - status[i].offset = i * EXPORTSIZE / NR_THREADS; - status[i].length = EXPORTSIZE / NR_THREADS; + status[i].offset = i * EXPORTSIZE / connections; + status[i].length = EXPORTSIZE / connections; status[i].status = 0; status[i].requests = 0; status[i].bytes_sent = status[i].bytes_received = 0; err = pthread_create (&threads[i], NULL, start_thread, &status[i]); if (err != 0) { errno = err; perror ("pthread_create"); exit (EXIT_FAILURE); } } /* Wait for the threads to exit. */ errors = 0; requests = 0; bytes_sent = bytes_received = 0; - for (i = 0; i < NR_THREADS; ++i) { + for (i = 0; i < connections; ++i) { err = pthread_join (threads[i], NULL); if (err != 0) { errno = err; perror ("pthread_join"); exit (EXIT_FAILURE); } if (status[i].status != 0) { fprintf (stderr, "thread %zu failed with status %d\n", i, status[i].status); errors++; } requests += status[i].requests; bytes_sent += status[i].bytes_sent; bytes_received += status[i].bytes_received; } free (ramdisk); /* Print some stats. */ printf ("TLS: %s\n", diff --git a/tests/synch-parallel.sh b/tests/synch-parallel.sh index 0ca9060..ae35dd1 100755 --- a/tests/synch-parallel.sh +++ b/tests/synch-parallel.sh @@ -1,28 +1,33 @@ #!/usr/bin/env bash # nbd client library in userspace # Copyright (C) 2019 Red Hat Inc. # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2 of the License, or (at your option) any later version. 
# # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA # Test synchronous parallel high level API requests. -for request_size in 4096 262144; do - echo "Request size: $request_size" - REQUEST_SIZE=$request_size nbdkit -U - \ - --filter=cow \ - pattern size=8M \ - --run '$VG ./synch-parallel $unixsocket' - echo +for connections in 1 2 4 8; do + for request_size in 4096 262144; do + echo "Connections: $connections" + echo "Request size: $request_size" + CONNECTIONS=$connections \ + REQUEST_SIZE=$request_size \ + nbdkit -U - \ + --filter=cow \ + pattern size=8M \ + --run '$VG ./synch-parallel $unixsocket' + echo + done done -- 2.31.1
Richard W.M. Jones
2021-Nov-14 18:08 UTC
[Libguestfs] [PATCH libnbd 0/6] Enhance synch-parallel test
I'm lukewarm on this series, but I've got a few comments on it as
well as some questions below.

- In nbdkit we have a function tvdiff_usec (see
  common/include/tvdiff.h).  Maybe we should use that instead of
  microtime?

- Rather than having a single test that does multiple runs (and so
  runs for a very long time), it's better to have multiple tests
  because they can run in parallel.  This is how it could be done
  (but see also my comment about benchmarks below).

  * Copy original tests/synch-parallel.sh to
    tests/synch-parallel-conn-1-request-4096.sh
    tests/synch-parallel-conn-1-request-262144.sh
    etc.  (Or choose better names)

  * Each test does:
    CONNECTIONS=1 REQUEST_SIZE=4096 ./synch-parallel.sh
    (etc)

  * Original synch-parallel.sh is unchanged, but remove it from TESTS

  * Add the new scripts to TESTS

On Sun, Nov 14, 2021 at 09:21:39AM +0200, Nir Soffer wrote:
> I'm working on an application using the sync API. In my tests I see best read
> throughput with 2 nbd connections which is unexpected.

I'm interested in why you're using the synchronous API.  I think
it'll inevitably be slower than using the asynch API because you can
never have multiple requests in flight on a single TCP connection.

...
> - The test scripts is much slower now (120 seconds instead of
>   10). We need to to separate the benchmark, running many
>   combinations (synch-parallel-bench.sh) and the test, running one
>   combination (sync-parallel.sh).

This is too long for tests.  I think there are really two conflicting
requirements: you want to benchmark the synchronous API, which is a
worthwhile goal but a separate one from testing that things are
working.

Should we consider having benchmarks in a separate directory or even
a separate git repo?  (Separate repo is what I did for libguestfs).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW