Richard W.M. Jones
2021-Feb-23 22:51 UTC
[Libguestfs] [PATCH libnbd] copy: Preserve the host page cache when writing to local files.
This is the equivalent patch for writing, using Linus's recommended technique. Code-wise it is not as bad as I thought it would be, but there is still a significant performance penalty (about 25%), which might be addressed by using more than two write windows.

Rich.
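As a rough illustration of what "more than two write windows" could look like, here is a sketch that keeps a small per-thread ring of windows whose writeback has already been started, and only waits for (and evicts) the oldest one once the ring is full. It is not part of the patch and has not been benchmarked. `NR_WINDOWS`, `struct window`, `queue_window_for_eviction` and `flush_windows` are hypothetical names, and the sketch assumes each thread writes to a single file descriptor.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for sync_file_range */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

#define NR_WINDOWS 8            /* hypothetical tuning knob */

struct window {
  uint64_t offset;
  size_t len;
};

/* Per-thread ring of windows for which writeback has been started. */
static __thread struct window ring[NR_WINDOWS];
static __thread unsigned ring_head;   /* next slot to fill */
static __thread unsigned ring_count;  /* number of windows in flight */

/* Wait for writeback of one window to finish, then drop its pages. */
static int
evict_window (int fd, const struct window *w)
{
  if (sync_file_range (fd, w->offset, w->len,
                       SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|
                       SYNC_FILE_RANGE_WAIT_AFTER) == -1)
    return -1;
  /* posix_fadvise returns an error number rather than setting errno. */
  if (posix_fadvise (fd, w->offset, w->len, POSIX_FADV_DONTNEED) != 0)
    return -1;
  return 0;
}

/* Call after each completed write of [offset, offset+len). */
static int
queue_window_for_eviction (int fd, uint64_t offset, size_t len)
{
  /* When the ring is full, the slot at ring_head holds the oldest window. */
  if (ring_count == NR_WINDOWS) {
    if (evict_window (fd, &ring[ring_head]) == -1)
      return -1;
    ring_count--;
  }

  /* Start asynchronous writeback of the new window. */
  if (sync_file_range (fd, offset, len, SYNC_FILE_RANGE_WRITE) == -1)
    return -1;

  ring[ring_head].offset = offset;
  ring[ring_head].len = len;
  ring_head = (ring_head + 1) % NR_WINDOWS;
  ring_count++;
  return 0;
}

/* Evict everything still in flight, oldest first (e.g. before close). */
static int
flush_windows (int fd)
{
  while (ring_count > 0) {
    unsigned oldest = (ring_head + NR_WINDOWS - ring_count) % NR_WINDOWS;
    if (evict_window (fd, &ring[oldest]) == -1)
      return -1;
    ring_count--;
  }
  return 0;
}
```

With `NR_WINDOWS` set to 1 this reduces to the current/previous scheme. Whether a deeper ring recovers any of the 25% would need measuring.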
Richard W.M. Jones
2021-Feb-23 22:51 UTC
[Libguestfs] [PATCH libnbd] copy: Preserve the host page cache when writing to local files.
This uses Linus's technique described here:
https://stackoverflow.com/a/3756466

Before this commit:

  $ rm /var/tmp/pattern
  $ time ./run nbdcopy [ nbdkit pattern 32G ] /var/tmp/pattern
  real    0m32.148s
  user    0m18.015s
  sys     0m33.216s
  $ cachestats /var/tmp/pattern
  pages in cache: 7066239/8388608 (84.2%)  [filesize=33554432.0K, pagesize=4K]

Notice that the newly written file ends up in the cache, thus
trashing the page cache on the host.

After this commit:

  $ rm /var/tmp/pattern
  $ time ./run nbdcopy [ nbdkit pattern 32G ] /var/tmp/pattern
  real    0m39.961s
  user    0m19.117s
  sys     0m41.238s
  $ cachestats /var/tmp/pattern
  pages in cache: 8192/8388608 (0.1%)  [filesize=33554432.0K, pagesize=4K]

The newly written file does not disturb the page cache.  However
there is about a 25% slow down.
---
 copy/file-ops.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/copy/file-ops.c b/copy/file-ops.c
index 888a388..d285cb9 100644
--- a/copy/file-ops.c
+++ b/copy/file-ops.c
@@ -159,6 +159,36 @@ page_cache_evict (struct rw_file *rwf, uint64_t orig_offset, size_t orig_len)
     len -= n;
   }
 }
+
+/* Prepare to evict file contents from the page cache when writing.
+ * We cannot do this directly (as for reads above) because we have to
+ * wait for Linux to finish writing the pages to disk.  Therefore the
+ * strategy is to (1) tell Linux to begin writing asynchronously, (2)
+ * evict the previous pages, which have hopefully been written already
+ * by the time we get here.  We have to maintain a window per thread.
+ *
+ * For more information see https://stackoverflow.com/a/3756466 and
+ * the links to Linus's advice from that entry.
+ */
+static inline void
+page_cache_evict_for_writes (struct rw_file *rwf, uint64_t offset, size_t len)
+{
+  static __thread uint64_t prev_offset;
+  static __thread size_t prev_len;
+
+  /* Evict the previous window. */
+  if (prev_len > 0) {
+    sync_file_range (rwf->fd, prev_offset, prev_len,
+                     SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|
+                     SYNC_FILE_RANGE_WAIT_AFTER);
+    posix_fadvise (rwf->fd, prev_offset, prev_len, POSIX_FADV_DONTNEED);
+  }
+
+  /* Set up the current window. */
+  sync_file_range (rwf->fd, offset, len, SYNC_FILE_RANGE_WRITE);
+  prev_offset = offset;
+  prev_len = len;
+}
 #endif

 static bool
@@ -365,6 +395,10 @@ file_synch_write (struct rw *rw,
                   const void *data, size_t len, uint64_t offset)
 {
   struct rw_file *rwf = (struct rw_file *)rw;
+#ifdef PAGE_CACHE_MAPPING
+  const uint64_t orig_offset = offset;
+  const size_t orig_len = len;
+#endif
   ssize_t r;

   while (len > 0) {
@@ -377,6 +411,10 @@ file_synch_write (struct rw *rw,
     offset += r;
     len -= r;
   }
+
+#if PAGE_CACHE_MAPPING
+  page_cache_evict_for_writes (rwf, orig_offset, orig_len);
+#endif
 }

 static inline bool
--
2.29.0.rc2
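The "pages in cache" figures in the commit message come from cachestats. For readers without that tool, a residency count of the same kind can be obtained on Linux with mmap(2) and mincore(2). The sketch below is only an assumption about how such a measurement can be made, not a description of how cachestats itself works; `count_cached_pages` is a hypothetical name. It maps the whole file at once, which is simple but may not suit very large files on 32-bit hosts.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* ensures mincore is declared */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how many pages of the file behind fd are resident in the page
 * cache.  Stores the total number of pages in *total_pages and returns
 * the number of resident pages, or -1 on error.
 */
static long
count_cached_pages (int fd, size_t *total_pages)
{
  struct stat st;
  size_t pagesize = (size_t) sysconf (_SC_PAGESIZE);
  size_t size, npages, i;
  unsigned char *vec;
  void *map;
  long cached = -1;

  if (fstat (fd, &st) == -1)
    return -1;

  size = (size_t) st.st_size;
  npages = (size + pagesize - 1) / pagesize;
  *total_pages = npages;
  if (npages == 0)
    return 0;

  /* PROT_NONE is enough: the mapping is only inspected, never touched,
   * so creating it does not itself pull pages into the cache.
   */
  map = mmap (NULL, size, PROT_NONE, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED)
    return -1;

  vec = malloc (npages);
  if (vec != NULL && mincore (map, size, vec) == 0) {
    cached = 0;
    for (i = 0; i < npages; i++)
      if (vec[i] & 1)           /* low bit set means the page is resident */
        cached++;
  }

  free (vec);
  munmap (map, size);
  return cached;
}
```

Called on /var/tmp/pattern before and after the copy, a function like this would give the numerator and denominator of the percentages quoted above.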