Eric Blake
2021-Feb-19 19:58 UTC
[Libguestfs] [PATCH libnbd] copy: Set POSIX_FADV_SEQUENTIAL and POSIX_FADV_DONTNEED.
On 2/19/21 1:41 PM, Richard W.M. Jones wrote:>>> +#if defined (HAVE_POSIX_FADVISE) && defined (POSIX_FADV_DONTNEED) >>> + /* On Linux this will evict the pages we just read from the page cache. */ >>> + posix_fadvise (fd, offset, r, POSIX_FADV_DONTNEED); >> >> I don't think this is a good idea, since this affects the current page >> cache, for >> the entire host. >> >> So if the host is having an image in cache for good reason, running nbdcopy >> will drop the cache since nbdcopy does not need it, but maybe the host will >> need that cache after running nbdcopy. >> >> The right way to avoid polluting the page cache is to bypass the cache using >> O_DIRECT, so nbdcopy is not using or affecting the page cache. >> >> This can be useful if the user can enable this with a flag. > > The trouble with O_DIRECT is it's a pain to use correctly, and I guess > doesn't use readahead or the existing cache (kind of the opposite > problem). > > However I take your point that we probably ought not to remove files > from the cache that are already there. I somehow thought that > POSIX_FADV_DONTNEED only affected the current 'fd', but looking at the > Linux impl I see it affects the whole system. > > This argues for having a flag to enable this.If I understand correctly, unconditionally attempting FADV_SEQUENTIAL should always should be fine both for reads (from existing file) and writes (when copying to a new file), so its only FADV_DONTNEED for clearing out the cache that needs a flag? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2021-Feb-19 21:36 UTC
[Libguestfs] [PATCH libnbd] copy: Set POSIX_FADV_SEQUENTIAL and POSIX_FADV_DONTNEED.
On Fri, Feb 19, 2021 at 01:58:04PM -0600, Eric Blake wrote:> On 2/19/21 1:41 PM, Richard W.M. Jones wrote: > > >>> +#if defined (HAVE_POSIX_FADVISE) && defined (POSIX_FADV_DONTNEED) > >>> + /* On Linux this will evict the pages we just read from the page cache. */ > >>> + posix_fadvise (fd, offset, r, POSIX_FADV_DONTNEED); > >> > >> I don't think this is a good idea, since this affects the current > >> page cache, for the entire host. > >> > >> So if the host is having an image in cache for good reason, > >> running nbdcopy will drop the cache since nbdcopy does not need > >> it, but maybe the host will need that cache after running > >> nbdcopy. > >> > >> The right way to avoid polluting the page cache is to bypass the > >> cache using O_DIRECT, so nbdcopy is not using or affecting the > >> page cache. This can be useful if the user can enable this with > >> a flag.After thinking about this a bit more I'm not too convinced this would affect many people. It would require a situation where multiple processes on the host are all reading the same file that is also being processed by nbdcopy. It's possible, but for RHV/OpenStack/etc there are lots of hypervisors running but generally confined to their own sets of disks.> > The trouble with O_DIRECT is it's a pain to use correctly, and I guess > > doesn't use readahead or the existing cache (kind of the opposite > > problem). > > > > However I take your point that we probably ought not to remove files > > from the cache that are already there. I somehow thought that > > POSIX_FADV_DONTNEED only affected the current 'fd', but looking at the > > Linux impl I see it affects the whole system. > > > > This argues for having a flag to enable this.BTW it's extremely hairy, racy and probably slow, but it seems possible using mmap + mincore to determine which pages are resident, and so in theory avoid using POSIX_FADV_DONTNEED for those pages.> If I understand correctly, unconditionally attempting FADV_SEQUENTIAL > should always should be fine both for reads (from existing file) and > writes (when copying to a new file), so its only FADV_DONTNEED for > clearing out the cache that needs a flag?Looking at the code, FADV_SEQUENTIAL just doubles the size of readahead: https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84c004b/mm/fadvise.c#L91 I assume it's only used for reads and would not affect writes, but it's hard to follow the code. I'm sure it should be safe though. So yes FADV_DONTNEED seems like it's the only one that would need a command line option to enable or disable. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org