Nir Soffer
2020-Aug-07 22:24 UTC
[Libguestfs] [PATCH nbdkit] plugins: file: More standard cache mode names
The new cache=none mode is misleading since it does not avoid usage of the page cache. When using shared storage, we may get stale data from the page cache. When writing, we flush after every write which is inefficient and unneeded. Rename the cache modes to: - writeback - write complete when the system call returned, and the data was copied to the page cache. - writethrough - write completes when the data reach storage. read marks pages as not needed to minimize use of the page cache. These terms are more aligned with other software like qemu or LIO and common caching concepts. [1] https://www.linuxjournal.com/article/7105 [2] https://www.qemu.org/docs/master/system/invocation.html (see cache=cache) Signed-off-by: Nir Soffer <nsoffer@redhat.com> --- plugins/file/file.c | 22 +++++++++++----------- plugins/file/nbdkit-file-plugin.pod | 19 +++++++++++++------ tests/test-gzip.c | 2 +- 3 files changed, 25 insertions(+), 18 deletions(-) diff --git a/plugins/file/file.c b/plugins/file/file.c index f6f91955..deac9b7f 100644 --- a/plugins/file/file.c +++ b/plugins/file/file.c @@ -76,7 +76,7 @@ static int fadvise_mode ; /* cache mode */ -static enum { cache_default, cache_none } cache_mode = cache_default; +static enum { cache_writeback, cache_writethrough } cache_mode = cache_writeback; /* Any callbacks using lseek must be protected by this lock. */ static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER; @@ -140,10 +140,10 @@ file_config (const char *key, const char *value) } } else if (strcmp (key, "cache") == 0) { - if (strcmp (value, "default") == 0) - cache_mode = cache_default; - else if (strcmp (value, "none") == 0) - cache_mode = cache_none; + if (strcmp (value, "writeback") == 0) + cache_mode = cache_writeback; + else if (strcmp (value, "writethrough") == 0) + cache_mode = cache_writethrough; else { nbdkit_error ("unknown cache mode: %s", value); return -1; @@ -423,7 +423,7 @@ file_pread (void *handle, void *buf, uint32_t count, uint64_t offset, #ifdef HAVE_POSIX_FADVISE /* On Linux this will evict the pages we just read from the page cache. */ - if (cache_mode == cache_none) + if (cache_mode == cache_writethrough) posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED); #endif @@ -441,11 +441,11 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, uint32_t orig_count = count; uint64_t orig_offset = offset; - /* If cache=none we want to force pages we have just written to the - * file to be flushed to disk so we can immediately evict them from - * the page cache. + /* If cache=writethrough we want to force pages we have just written + * to the file to be flushed to disk so we can immediately evict them + * from the page cache. */ - if (cache_mode == cache_none) flags |= NBDKIT_FLAG_FUA; + if (cache_mode == cache_writethrough) flags |= NBDKIT_FLAG_FUA; #endif while (count > 0) { @@ -464,7 +464,7 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, #ifdef HAVE_POSIX_FADVISE /* On Linux this will evict the pages we just wrote from the page cache. */ - if (cache_mode == cache_none) + if (cache_mode == cache_writethrough) posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED); #endif diff --git a/plugins/file/nbdkit-file-plugin.pod b/plugins/file/nbdkit-file-plugin.pod index e2ae0b1b..6a5bd088 100644 --- a/plugins/file/nbdkit-file-plugin.pod +++ b/plugins/file/nbdkit-file-plugin.pod @@ -5,7 +5,7 @@ nbdkit-file-plugin - nbdkit file plugin =head1 SYNOPSIS nbdkit file [file=]FILENAME - [cache=default|none] [fadvise=normal|random|sequential] + [cache=writeback|writethrough] [fadvise=normal|random|sequential] =head1 DESCRIPTION @@ -18,12 +18,19 @@ It serves the named C<FILENAME> over NBD. Local block devices =over 4 -=item B<cache=default> +=item B<cache=writeback> -=item B<cache=none> +=item B<cache=writethrough> -Using C<cache=none> tries to prevent the kernel from keeping parts of -the file that have already been read or written in the page cache. +Using C<cache=writeback>, write will be considered as completed as soon +as the the data is store in the kernel page cache, before it reached the +actual stoarge. + +Using C<cache=writethrough>, write will be considered as completed only +when the data reach actual storage, minimizing use of kernel page cache. +reading will try to minimize use of the page cache. + +The default is C<writeback>. =item B<fadvise=normal> @@ -73,7 +80,7 @@ the file sequentially one time (eg for making a single copy or backup) then this will stop other processes from being evicted from the page cache: - nbdkit file disk.img fadvise=sequential cache=none + nbdkit file disk.img fadvise=sequential cache=writethrough =head2 Files on tmpfs diff --git a/tests/test-gzip.c b/tests/test-gzip.c index 8f81c5b7..b1685098 100644 --- a/tests/test-gzip.c +++ b/tests/test-gzip.c @@ -62,7 +62,7 @@ main (int argc, char *argv[]) /* Test the new filter. */ if (test_start_nbdkit ("file", "--filter=gzip", disk, - "fadvise=sequential", "cache=none", + "fadvise=sequential", "cache=writethrough", NULL) == -1) exit (EXIT_FAILURE); do_test (); -- 2.25.4
Richard W.M. Jones
2020-Aug-08 21:28 UTC
Re: [Libguestfs] [PATCH nbdkit] plugins: file: More standard cache mode names
On Sat, Aug 08, 2020 at 01:24:02AM +0300, Nir Soffer wrote:> The new cache=none mode is misleading since it does not avoid usage of > the page cache. When using shared storage, we may get stale data from > the page cache. When writing, we flush after every write which is > inefficient and unneeded. >Hmm. This isn't actually what the cache=none parameter is currently doing. cache=none evicts pages from the cache "soon" after they are last needed by the plugin, whether that is for read or write. It's true that it does not completely avoid the page cache (so I agree it's not precisely like O_DIRECT), but it tries to minimize the time that pages spend there.> Rename the cache modes to: > - writeback - write complete when the system call returned, and the data > was copied to the page cache. > - writethrough - write completes when the data reach storage. read marks > pages as not needed to minimize use of the page cache.I'm not convinced that writethrough is the same as cache=none. In qemu it only applies to writes, and says nothing about reads AFAIK. Where do you get the information that writethrough (in qemu, or anywhere else) marks pages as not needed if they are only read and never written? I'm not convinced by this change. Rich.> These terms are more aligned with other software like qemu or LIO and > common caching concepts. > > [1] https://www.linuxjournal.com/article/7105 > [2] https://www.qemu.org/docs/master/system/invocation.html > (see cache=cache) > > Signed-off-by: Nir Soffer <nsoffer@redhat.com> > --- > plugins/file/file.c | 22 +++++++++++----------- > plugins/file/nbdkit-file-plugin.pod | 19 +++++++++++++------ > tests/test-gzip.c | 2 +- > 3 files changed, 25 insertions(+), 18 deletions(-) > > diff --git a/plugins/file/file.c b/plugins/file/file.c > index f6f91955..deac9b7f 100644 > --- a/plugins/file/file.c > +++ b/plugins/file/file.c > @@ -76,7 +76,7 @@ static int fadvise_mode > ; > > /* cache mode */ > -static enum { cache_default, cache_none } cache_mode = cache_default; > +static enum { cache_writeback, cache_writethrough } cache_mode = cache_writeback; > > /* Any callbacks using lseek must be protected by this lock. */ > static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER; > @@ -140,10 +140,10 @@ file_config (const char *key, const char *value) > } > } > else if (strcmp (key, "cache") == 0) { > - if (strcmp (value, "default") == 0) > - cache_mode = cache_default; > - else if (strcmp (value, "none") == 0) > - cache_mode = cache_none; > + if (strcmp (value, "writeback") == 0) > + cache_mode = cache_writeback; > + else if (strcmp (value, "writethrough") == 0) > + cache_mode = cache_writethrough; > else { > nbdkit_error ("unknown cache mode: %s", value); > return -1; > @@ -423,7 +423,7 @@ file_pread (void *handle, void *buf, uint32_t count, uint64_t offset, > > #ifdef HAVE_POSIX_FADVISE > /* On Linux this will evict the pages we just read from the page cache. */ > - if (cache_mode == cache_none) > + if (cache_mode == cache_writethrough) > posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED); > #endif > > @@ -441,11 +441,11 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, > uint32_t orig_count = count; > uint64_t orig_offset = offset; > > - /* If cache=none we want to force pages we have just written to the > - * file to be flushed to disk so we can immediately evict them from > - * the page cache. > + /* If cache=writethrough we want to force pages we have just written > + * to the file to be flushed to disk so we can immediately evict them > + * from the page cache. > */ > - if (cache_mode == cache_none) flags |= NBDKIT_FLAG_FUA; > + if (cache_mode == cache_writethrough) flags |= NBDKIT_FLAG_FUA; > #endif > > while (count > 0) { > @@ -464,7 +464,7 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, > > #ifdef HAVE_POSIX_FADVISE > /* On Linux this will evict the pages we just wrote from the page cache. */ > - if (cache_mode == cache_none) > + if (cache_mode == cache_writethrough) > posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED); > #endif > > diff --git a/plugins/file/nbdkit-file-plugin.pod b/plugins/file/nbdkit-file-plugin.pod > index e2ae0b1b..6a5bd088 100644 > --- a/plugins/file/nbdkit-file-plugin.pod > +++ b/plugins/file/nbdkit-file-plugin.pod > @@ -5,7 +5,7 @@ nbdkit-file-plugin - nbdkit file plugin > =head1 SYNOPSIS > > nbdkit file [file=]FILENAME > - [cache=default|none] [fadvise=normal|random|sequential] > + [cache=writeback|writethrough] [fadvise=normal|random|sequential] > > =head1 DESCRIPTION > > @@ -18,12 +18,19 @@ It serves the named C<FILENAME> over NBD. Local block devices > > =over 4 > > -=item B<cache=default> > +=item B<cache=writeback> > > -=item B<cache=none> > +=item B<cache=writethrough> > > -Using C<cache=none> tries to prevent the kernel from keeping parts of > -the file that have already been read or written in the page cache. > +Using C<cache=writeback>, write will be considered as completed as soon > +as the the data is store in the kernel page cache, before it reached the > +actual stoarge. > + > +Using C<cache=writethrough>, write will be considered as completed only > +when the data reach actual storage, minimizing use of kernel page cache. > +reading will try to minimize use of the page cache. > + > +The default is C<writeback>. > > =item B<fadvise=normal> > > @@ -73,7 +80,7 @@ the file sequentially one time (eg for making a single copy or backup) > then this will stop other processes from being evicted from the page > cache: > > - nbdkit file disk.img fadvise=sequential cache=none > + nbdkit file disk.img fadvise=sequential cache=writethrough > > =head2 Files on tmpfs > > diff --git a/tests/test-gzip.c b/tests/test-gzip.c > index 8f81c5b7..b1685098 100644 > --- a/tests/test-gzip.c > +++ b/tests/test-gzip.c > @@ -62,7 +62,7 @@ main (int argc, char *argv[]) > > /* Test the new filter. */ > if (test_start_nbdkit ("file", "--filter=gzip", disk, > - "fadvise=sequential", "cache=none", > + "fadvise=sequential", "cache=writethrough", > NULL) == -1) > exit (EXIT_FAILURE); > do_test (); > -- > 2.25.4-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Nir Soffer
2020-Aug-08 22:28 UTC
Re: [Libguestfs] [PATCH nbdkit] plugins: file: More standard cache mode names
On Sun, Aug 9, 2020 at 12:28 AM Richard W.M. Jones <rjones@redhat.com> wrote:> > On Sat, Aug 08, 2020 at 01:24:02AM +0300, Nir Soffer wrote: > > The new cache=none mode is misleading since it does not avoid usage of > > the page cache. When using shared storage, we may get stale data from > > the page cache. When writing, we flush after every write which is > > inefficient and unneeded. > > > > Hmm. This isn't actually what the cache=none parameter is currently > doing. > > cache=none evicts pages from the cache "soon" after they are last > needed by the plugin, whether that is for read or write. It's true > that it does not completely avoid the page cache (so I agree it's not > precisely like O_DIRECT), but it tries to minimize the time that pages > spend there.Yes, for this we flush after every write, and the common name for this is writethrough.> > Rename the cache modes to: > > - writeback - write complete when the system call returned, and the data > > was copied to the page cache. > > - writethrough - write completes when the data reach storage. read marks > > pages as not needed to minimize use of the page cache. > > I'm not convinced that writethrough is the same as cache=none. In > qemu it only applies to writes, and says nothing about reads AFAIK. > Where do you get the information that writethrough (in qemu, or > anywhere else) marks pages as not needed if they are only read and > never written?qemu does not use DONTNEED, but otherwise current cache=none is what qemu and LIO call writethrough. Using a cache and evicting the pages does not make much sense, since when reading the pages are not in the cache. So why do you use the cache? I think we are mixing different things in the current code. Flushing after every write is almost never needed, but if needed it should be possible to use it without fadvice=DONTNEED to keep a useful cache when reading. I think the DONTNEED flag should be separate from cache=writesthrough. In dd this can be done using iflag=nocache or oflag=nocache, and syncing after every write can be done using oflag=dsync/sync. If you want both you can use oflag=nocache,dsync.> I'm not convinced by this change.cache=none should mean no cache, and it should not require syncing after every write. What we have now is not useful. It could be useful if the kernel was doing smarter flushing when using DONTNEED, for example flushing the data asynchronically every few seconds, instead of writing gigabytes of data at once.> > Rich. > > > These terms are more aligned with other software like qemu or LIO and > > common caching concepts. > > > > [1] https://www.linuxjournal.com/article/7105 > > [2] https://www.qemu.org/docs/master/system/invocation.html > > (see cache=cache) > > > > Signed-off-by: Nir Soffer <nsoffer@redhat.com> > > --- > > plugins/file/file.c | 22 +++++++++++----------- > > plugins/file/nbdkit-file-plugin.pod | 19 +++++++++++++------ > > tests/test-gzip.c | 2 +- > > 3 files changed, 25 insertions(+), 18 deletions(-) > > > > diff --git a/plugins/file/file.c b/plugins/file/file.c > > index f6f91955..deac9b7f 100644 > > --- a/plugins/file/file.c > > +++ b/plugins/file/file.c > > @@ -76,7 +76,7 @@ static int fadvise_mode > > ; > > > > /* cache mode */ > > -static enum { cache_default, cache_none } cache_mode = cache_default; > > +static enum { cache_writeback, cache_writethrough } cache_mode = cache_writeback; > > > > /* Any callbacks using lseek must be protected by this lock. */ > > static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER; > > @@ -140,10 +140,10 @@ file_config (const char *key, const char *value) > > } > > } > > else if (strcmp (key, "cache") == 0) { > > - if (strcmp (value, "default") == 0) > > - cache_mode = cache_default; > > - else if (strcmp (value, "none") == 0) > > - cache_mode = cache_none; > > + if (strcmp (value, "writeback") == 0) > > + cache_mode = cache_writeback; > > + else if (strcmp (value, "writethrough") == 0) > > + cache_mode = cache_writethrough; > > else { > > nbdkit_error ("unknown cache mode: %s", value); > > return -1; > > @@ -423,7 +423,7 @@ file_pread (void *handle, void *buf, uint32_t count, uint64_t offset, > > > > #ifdef HAVE_POSIX_FADVISE > > /* On Linux this will evict the pages we just read from the page cache. */ > > - if (cache_mode == cache_none) > > + if (cache_mode == cache_writethrough) > > posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED); > > #endif > > > > @@ -441,11 +441,11 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, > > uint32_t orig_count = count; > > uint64_t orig_offset = offset; > > > > - /* If cache=none we want to force pages we have just written to the > > - * file to be flushed to disk so we can immediately evict them from > > - * the page cache. > > + /* If cache=writethrough we want to force pages we have just written > > + * to the file to be flushed to disk so we can immediately evict them > > + * from the page cache. > > */ > > - if (cache_mode == cache_none) flags |= NBDKIT_FLAG_FUA; > > + if (cache_mode == cache_writethrough) flags |= NBDKIT_FLAG_FUA; > > #endif > > > > while (count > 0) { > > @@ -464,7 +464,7 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, > > > > #ifdef HAVE_POSIX_FADVISE > > /* On Linux this will evict the pages we just wrote from the page cache. */ > > - if (cache_mode == cache_none) > > + if (cache_mode == cache_writethrough) > > posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED); > > #endif > > > > diff --git a/plugins/file/nbdkit-file-plugin.pod b/plugins/file/nbdkit-file-plugin.pod > > index e2ae0b1b..6a5bd088 100644 > > --- a/plugins/file/nbdkit-file-plugin.pod > > +++ b/plugins/file/nbdkit-file-plugin.pod > > @@ -5,7 +5,7 @@ nbdkit-file-plugin - nbdkit file plugin > > =head1 SYNOPSIS > > > > nbdkit file [file=]FILENAME > > - [cache=default|none] [fadvise=normal|random|sequential] > > + [cache=writeback|writethrough] [fadvise=normal|random|sequential] > > > > =head1 DESCRIPTION > > > > @@ -18,12 +18,19 @@ It serves the named C<FILENAME> over NBD. Local block devices > > > > =over 4 > > > > -=item B<cache=default> > > +=item B<cache=writeback> > > > > -=item B<cache=none> > > +=item B<cache=writethrough> > > > > -Using C<cache=none> tries to prevent the kernel from keeping parts of > > -the file that have already been read or written in the page cache. > > +Using C<cache=writeback>, write will be considered as completed as soon > > +as the the data is store in the kernel page cache, before it reached the > > +actual stoarge. > > + > > +Using C<cache=writethrough>, write will be considered as completed only > > +when the data reach actual storage, minimizing use of kernel page cache. > > +reading will try to minimize use of the page cache. > > + > > +The default is C<writeback>. > > > > =item B<fadvise=normal> > > > > @@ -73,7 +80,7 @@ the file sequentially one time (eg for making a single copy or backup) > > then this will stop other processes from being evicted from the page > > cache: > > > > - nbdkit file disk.img fadvise=sequential cache=none > > + nbdkit file disk.img fadvise=sequential cache=writethrough > > > > =head2 Files on tmpfs > > > > diff --git a/tests/test-gzip.c b/tests/test-gzip.c > > index 8f81c5b7..b1685098 100644 > > --- a/tests/test-gzip.c > > +++ b/tests/test-gzip.c > > @@ -62,7 +62,7 @@ main (int argc, char *argv[]) > > > > /* Test the new filter. */ > > if (test_start_nbdkit ("file", "--filter=gzip", disk, > > - "fadvise=sequential", "cache=none", > > + "fadvise=sequential", "cache=writethrough", > > NULL) == -1) > > exit (EXIT_FAILURE); > > do_test (); > > -- > > 2.25.4 > > -- > Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones > Read my programming and virtualization blog: http://rwmj.wordpress.com > libguestfs lets you edit virtual machines. Supports shell scripting, > bindings from many languages. http://libguestfs.org >
Reasonably Related Threads
- Re: [PATCH nbdkit] plugins: file: More standard cache mode names
- Re: [PATCH nbdkit] plugins: file: More standard cache mode names
- [PATCH nbdkit] file: Implement cache=none and fadvise=normal|random|sequential.
- Re: [PATCH nbdkit] file: Implement cache=none and fadvise=normal|random|sequential.
- Re: [PATCH nbdkit] file: Implement cache=none and fadvise=normal|random|sequential.