Richard W.M. Jones
2021-Feb-23 17:28 UTC
[Libguestfs] [PATCH libnbd 0/2] copy: Preserve the host page cache when reading local files.
In nbdcopy we can preserve the page cache while reading from local
files.  This means (unlike O_DIRECT) using the page cache to our
advantage when a file is already present in memory, but also not
increasing the amount of the file which is cached as we read it, or at
least not by very much.

These two patches are an evolution of this patch:
https://listman.redhat.com/archives/libguestfs/2021-February/thread.html#00036

Although the code is heavily conditional and so will only work on
64 bit Linux systems, I didn't bother adding a command line flag
because I feel the way this is written (modulo bugs) it should almost
always be advantageous.

Writes next -- but that's much more difficult.

Rich.
Richard W.M. Jones
2021-Feb-23 17:28 UTC
[Libguestfs] [PATCH libnbd 1/2] copy: Set POSIX_FADV_SEQUENTIAL flag on files.
On Linux this doubles the readahead.  Even though we are not strictly
speaking going to read or write the file sequentially (only mostly
sequentially) it may provide some minor benefit.
---
 configure.ac    | 3 +++
 copy/file-ops.c | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/configure.ac b/configure.ac
index 5723912..fb72953 100644
--- a/configure.ac
+++ b/configure.ac
@@ -98,6 +98,9 @@ AC_CHECK_HEADERS([\
 
 AC_CHECK_HEADERS([linux/vm_sockets.h], [], [], [#include <sys/socket.h>])
 
+dnl posix_fadvise helps to optimise linear reads and writes (optional).
+AC_CHECK_FUNCS([posix_fadvise])
+
 dnl Check for strerrordesc_np (optional, glibc only).
 dnl Prefer this over sys_errlist.
 dnl https://lists.fedoraproject.org/archives/list/glibc@lists.fedoraproject.org/thread/WJHGG2OO7ABNAYICGA5WQZ2Q34Q2FEHU/
diff --git a/copy/file-ops.c b/copy/file-ops.c
index 1d7e6a6..73cbdcb 100644
--- a/copy/file-ops.c
+++ b/copy/file-ops.c
@@ -116,6 +116,13 @@ file_create (const char *name, int fd, off_t st_size, bool is_block)
     rwf->can_fallocate = true;
   }
 
+  /* Set the POSIX_FADV_SEQUENTIAL flag on the file descriptor, but
+   * don't fail.
+   */
+#if defined (HAVE_POSIX_FADVISE) && defined (POSIX_FADV_SEQUENTIAL)
+  posix_fadvise (fd, 0, 0, POSIX_FADV_SEQUENTIAL);
+#endif
+
   return &rwf->rw;
 }
 
-- 
2.29.0.rc2
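For readers who want to try the hint outside nbdcopy, here is a minimal standalone sketch of the same call. The helper names `advise_sequential` and `advise_sequential_or_warn` are hypothetical and not part of the patch; the only fact relied on is that `posix_fadvise` returns an error number directly instead of setting `errno`.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in the patch): hint that fd will be read
 * mostly sequentially.  Returns 0 on success or an errno value;
 * posix_fadvise reports errors through its return value.
 */
static int
advise_sequential (int fd)
{
#ifdef POSIX_FADV_SEQUENTIAL
  /* offset 0 with len 0 means "from the start to the end of the file". */
  return posix_fadvise (fd, 0, 0, POSIX_FADV_SEQUENTIAL);
#else
  (void) fd;                    /* No such hint here: silently do nothing. */
  return 0;
#endif
}

/* Same "but don't fail" policy as the patch, except that this
 * illustration prints a warning instead of staying silent.
 */
static void
advise_sequential_or_warn (int fd, const char *name)
{
  int err;

  assert (name != NULL);
  err = advise_sequential (fd);
  if (err != 0)
    fprintf (stderr, "%s: posix_fadvise: %s (ignored)\n",
             name, strerror (err));
}
```

A caller would open the file as usual and then call `advise_sequential_or_warn (fd, filename)` before the first read. The hint is advisory, so ignoring a failure is a reasonable choice.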
Richard W.M. Jones
2021-Feb-23 17:28 UTC
[Libguestfs] [PATCH libnbd 2/2] copy: Preserve the host page cache when reading from local files.
When reading from a local file we can take advantage of the page cache
(i.e. not having to read the file from disk if a copy is present in
memory), while at the same time not disturbing the state of the page
cache.  Disturbing the page cache can have bad consequences for other
processes running on the host, since they will have their working set
evicted, so it is something we should generally avoid.

This requires Linux APIs, using the technique described here:
https://insights.oetiker.ch/linux/fadvise/

This change only affects reads, since doing the same for writes is
even more complicated.

You can see the effect using the tools from
https://github.com/Feh/nocache

Before this change:

  $ cachestats /var/tmp/random
  pages in cache: 680768/8388608 (8.1%)  [filesize=33554432.0K, pagesize=4K]
  $ ./run time nbdcopy /var/tmp/random null:
  2.18user 27.00system 0:21.22elapsed 137%CPU (0avgtext+0avgdata 135276maxresident)k
  61663216inputs+1800outputs (6major+8422398minor)pagefaults 0swaps
  $ cachestats /var/tmp/random
  pages in cache: 4892325/8388608 (58.3%)  [filesize=33554432.0K, pagesize=4K]

Notice that a large part of the file has been loaded into the page
cache after the run.

After this change:

  $ cachestats /var/tmp/random
  pages in cache: 611006/8388608 (7.3%)  [filesize=33554432.0K, pagesize=4K]
  $ ./run time nbdcopy /var/tmp/random null:
  1.77user 31.49system 0:20.79elapsed 159%CPU (0avgtext+0avgdata 144404maxresident)k
  62751760inputs+0outputs (0major+8394000minor)pagefaults 0swaps
  $ cachestats /var/tmp/random
  pages in cache: 680768/8388608 (8.1%)  [filesize=33554432.0K, pagesize=4K]

Notice there is only a small increase in the amount of the file which
is cached.  The elapsed time is about the same, but there is an
increase in %CPU and system time (presumably the overhead of
POSIX_FADV_DONTNEED).
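The first half of the technique, recording which pages are already resident before any reading starts, can be sketched on its own as follows. This is an illustration rather than the patch's code: `pages_spanned` and `snapshot_residency` are hypothetical names, a plain `malloc` buffer stands in for nbdcopy's vector type, and it assumes a 64 bit Linux system (the whole file is mapped and `mincore` is used).

```c
#define _DEFAULT_SOURCE         /* for mincore */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Number of pages needed to cover len bytes (hypothetical helper). */
static size_t
pages_spanned (uint64_t len, long page_size)
{
  assert (page_size > 0);
  return (size_t) ((len + (uint64_t) page_size - 1) / (uint64_t) page_size);
}

/* Hypothetical helper: return a malloc'd vector with one byte per page
 * of the file, where bit 0 is set if the page is resident in memory.
 * Returns NULL (and sets *npages to 0) if the file is empty or anything
 * fails, so the caller can fall back to doing nothing.
 */
static unsigned char *
snapshot_residency (int fd, off_t filelen, size_t *npages)
{
  const long page_size = sysconf (_SC_PAGESIZE);
  unsigned char *vec;
  void *ptr;
  size_t n;

  *npages = 0;
  if (filelen <= 0 || page_size <= 0)
    return NULL;

  n = pages_spanned ((uint64_t) filelen, page_size);
  vec = malloc (n);
  if (vec == NULL)
    return NULL;

  /* Mapping the file does not read it; it only gives mincore an
   * address range to inspect.
   */
  ptr = mmap (NULL, (size_t) filelen, PROT_READ, MAP_PRIVATE, fd, 0);
  if (ptr == MAP_FAILED) {
    free (vec);
    return NULL;
  }

  if (mincore (ptr, (size_t) filelen, vec) == -1) {
    free (vec);
    vec = NULL;
  }
  else
    *npages = n;

  munmap (ptr, (size_t) filelen);
  return vec;
}
```

The caller owns the returned vector and frees it with `free`. Taking the snapshot once, before the copy starts, is what allows later reads to distinguish pages that were already cached from pages that nbdcopy itself pulled in.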
---
 copy/file-ops.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++-
 copy/main.c     |   2 +-
 copy/nbdcopy.h  |   2 +-
 3 files changed, 116 insertions(+), 4 deletions(-)

diff --git a/copy/file-ops.c b/copy/file-ops.c
index 73cbdcb..13c8c19 100644
--- a/copy/file-ops.c
+++ b/copy/file-ops.c
@@ -24,8 +24,10 @@
 #include <fcntl.h>
 #include <unistd.h>
 #include <errno.h>
+#include <limits.h>
 #include <sys/ioctl.h>
 #include <sys/types.h>
+#include <sys/mman.h>
 
 #include <pthread.h>
 
@@ -34,8 +36,36 @@
 #endif
 
 #include "isaligned.h"
+#include "rounding.h"
+
 #include "nbdcopy.h"
 
+/* If we are going to attempt page cache mapping which tries not to
+ * disturb the page cache when reading a file.  Only do this on Linux
+ * systems where we understand how the page cache behaves.  Since we
+ * need to mmap the whole file, also restrict this to 64 bit systems.
+ */
+#ifdef __linux__
+#ifdef __SIZEOF_POINTER__
+#if __SIZEOF_POINTER__ == 8
+#define PAGE_CACHE_MAPPING 1
+#endif
+#endif
+#endif
+
+#ifdef PAGE_CACHE_MAPPING
+DEFINE_VECTOR_TYPE (byte_vector, uint8_t)
+
+static long page_size;
+
+static void page_size_init (void) __attribute__((constructor));
+static void
+page_size_init (void)
+{
+  page_size = sysconf (_SC_PAGE_SIZE);
+}
+#endif
+
 static struct rw_ops file_ops;
 
 struct rw_file {
@@ -50,6 +80,10 @@ struct rw_file {
    * the working method.
    */
   bool can_punch_hole, can_zero_range, can_fallocate, can_zeroout;
+
+#ifdef PAGE_CACHE_MAPPING
+  byte_vector cached_pages;
+#endif
 };
 
 static bool
@@ -64,7 +98,8 @@ seek_hole_supported (int fd)
 }
 
 struct rw *
-file_create (const char *name, int fd, off_t st_size, bool is_block)
+file_create (const char *name, int fd,
+             off_t st_size, bool is_block, direction d)
 {
   struct rw_file *rwf = calloc (1, sizeof *rwf);
   if (rwf == NULL) { perror ("calloc"); exit (EXIT_FAILURE); }
@@ -123,6 +158,28 @@ file_create (const char *name, int fd, off_t st_size, bool is_block)
   posix_fadvise (fd, 0, 0, POSIX_FADV_SEQUENTIAL);
 #endif
 
+#if PAGE_CACHE_MAPPING
+  /* If reading, map which pages of the input file are currently
+   * stored in the page cache.
+   */
+  if (d == READING) {
+    const int64_t filelen = rwf->rw.size;
+
+    if (filelen > 0) {
+      void *ptr = mmap (NULL, filelen, PROT_READ, MAP_PRIVATE, fd, 0);
+      if (ptr != (void *)-1) {
+        const size_t veclen = ROUND_UP (filelen, page_size) / page_size;
+
+        if (byte_vector_reserve (&rwf->cached_pages, veclen) != -1) {
+          if (mincore (ptr, filelen, rwf->cached_pages.ptr) != -1)
+            rwf->cached_pages.size = veclen;
+        }
+        munmap (ptr, filelen);
+      }
+    }
+  }
+#endif
+
   return &rwf->rw;
 }
 
@@ -135,6 +192,11 @@ file_close (struct rw *rw)
     fprintf (stderr, "%s: close: %m\n", rw->name);
     exit (EXIT_FAILURE);
   }
+
+#ifdef PAGE_CACHE_MAPPING
+  byte_vector_reset (&rwf->cached_pages);
+#endif
+
   free (rw);
 }
 
@@ -206,16 +268,38 @@ file_start_multi_conn (struct rw *rw)
    */
 }
 
+#ifdef PAGE_CACHE_MAPPING
+/* Test if a single page of the file was cached before nbdcopy ran. */
+static inline bool
+page_was_cached (struct rw_file *rwf, uint64_t offset)
+{
+  uint64_t page = offset / page_size;
+  if (page < rwf->cached_pages.size)
+    return (rwf->cached_pages.ptr[page] & 1) != 0;
+  else
+    /* This path is taken if we didn't manage to map the input file
+     * for any reason.  In this case assume that pages were mapped so
+     * we will not evict them: essentially fall back to doing nothing.
+     */
+    return true;
+}
+#endif
+
 static size_t
 file_synch_read (struct rw *rw,
                  void *data, size_t len, uint64_t offset)
 {
   struct rw_file *rwf = (struct rw_file *)rw;
+#ifdef PAGE_CACHE_MAPPING
+  const uint64_t orig_offset = offset;
+  const size_t orig_len = len;
+#endif
+  const int fd = rwf->fd;
   size_t n = 0;
   ssize_t r;
 
   while (len > 0) {
-    r = pread (rwf->fd, data, len, offset);
+    r = pread (fd, data, len, offset);
     if (r == -1) {
       perror (rw->name);
       exit (EXIT_FAILURE);
@@ -229,6 +313,34 @@ file_synch_read (struct rw *rw,
     n += r;
   }
 
+#if PAGE_CACHE_MAPPING
+  /* Evict file contents from the page cache if they were not present
+   * in the page cache before.
+   */
+  if (rwf->cached_pages.size > 0) {
+    /* Only bother with whole pages. */
+    offset = ROUND_UP (orig_offset, page_size);
+    len = orig_len;
+    len -= offset - orig_offset;
+    len = ROUND_DOWN (len, page_size);
+
+    while (len > 0) {
+      if (! page_was_cached (rwf, offset)) {
+        /* Try to evict runs of pages in one go. */
+        uint64_t n = page_size;
+
+        while (len-n > 0 && ! page_was_cached (rwf, offset + n))
+          n += page_size;
+
+        posix_fadvise (fd, offset, n, POSIX_FADV_DONTNEED);
+      }
+
+      offset += n;
+      len -= n;
+    }
+  }
+#endif
+
   return n;
 }
 
diff --git a/copy/main.c b/copy/main.c
index 3c574df..55c2b53 100644
--- a/copy/main.c
+++ b/copy/main.c
@@ -461,7 +461,7 @@ open_local (const char *filename, direction d)
     exit (EXIT_FAILURE);
   }
   if (S_ISBLK (stat.st_mode) || S_ISREG (stat.st_mode))
-    return file_create (filename, fd, stat.st_size, S_ISBLK (stat.st_mode));
+    return file_create (filename, fd, stat.st_size, S_ISBLK (stat.st_mode), d);
   else {
     /* Probably stdin/stdout, a pipe or a socket. */
     synchronous = true;        /* Force synchronous mode for pipes. */
diff --git a/copy/nbdcopy.h b/copy/nbdcopy.h
index 4496722..e4c3d4e 100644
--- a/copy/nbdcopy.h
+++ b/copy/nbdcopy.h
@@ -52,7 +52,7 @@ typedef enum { READING, WRITING } direction;
 
 /* Create subtypes. */
 extern struct rw *file_create (const char *name, int fd,
-                               off_t st_size, bool is_block);
+                               off_t st_size, bool is_block, direction d);
 extern struct rw *nbd_rw_create_uri (const char *name,
                                      const char *uri, direction d);
 extern struct rw *nbd_rw_create_subprocess (const char **argv, size_t argc,
-- 
2.29.0.rc2
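One thing a reviewer might raise about the eviction loop in file_synch_read: the inner `uint64_t n` is declared inside the `if` block, so the `offset += n` and `len -= n` that follow appear to use the outer byte counter instead, and `len-n > 0` is an unsigned comparison. The sketch below shows one way the run-finding logic could be written with a single `run` variable that always advances by at least one page. It is an illustration, not the author's fix: all function names and signatures are hypothetical, the residency vector is passed explicitly, and a negative `fd` is an invented dry-run mode so that the logic can be exercised without touching a real file.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Shrink [*offset, *offset + *len) to the whole pages it fully contains. */
static void
whole_pages (uint64_t *offset, uint64_t *len, uint64_t page_size)
{
  const uint64_t start = (*offset + page_size - 1) / page_size * page_size;
  const uint64_t end = (*offset + *len) / page_size * page_size;

  *offset = start;
  *len = end > start ? end - start : 0;
}

/* Was the page containing offset resident before we started?  Pages
 * outside the vector are treated as cached, so they are never evicted.
 */
static bool
page_was_cached (const unsigned char *vec, size_t npages,
                 uint64_t offset, uint64_t page_size)
{
  const uint64_t page = offset / page_size;

  return page < npages ? (vec[page] & 1) != 0 : true;
}

/* Evict runs of previously uncached pages from the range just read.
 * Returns the number of runs found.  fd < 0 is a dry run (hypothetical).
 */
static size_t
evict_uncached (int fd, const unsigned char *vec, size_t npages,
                uint64_t offset, uint64_t len, uint64_t page_size)
{
  size_t runs = 0;

  assert (page_size > 0);
  whole_pages (&offset, &len, page_size);

  while (len > 0) {
    uint64_t run = page_size;   /* Always advance by at least one page. */

    if (!page_was_cached (vec, npages, offset, page_size)) {
      /* Extend the run while the following pages were also uncached. */
      while (run < len &&
             !page_was_cached (vec, npages, offset + run, page_size))
        run += page_size;
#ifdef POSIX_FADV_DONTNEED
      if (fd >= 0)
        posix_fadvise (fd, (off_t) offset, (off_t) run, POSIX_FADV_DONTNEED);
#endif
      runs++;
    }

    offset += run;
    len -= run;
  }

  return runs;
}
```

With a residency vector of `{1, 0, 0, 1, 0}` and a read covering all five pages, this finds two runs: pages 1 to 2 and page 4. Coalescing runs keeps the number of `posix_fadvise` calls down, which may matter given the extra system time seen in the measurements.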