I've recently been trying to track down the root cause of my server's persistent issue of thrashing horribly after being left inactive. The likely culprit seems to be my nightly backup (using rsync), which traverses my entire 50GB home directory. I was surprised to find that rsync does not use fadvise to notify the kernel of its use-once data access pattern.

It looks like a patch[1] was written that adds fadvise support, although it seems it was never merged. I found its implementation rather odd: it uses mincore() and FADV_DONTNEED to kick out only the regions brought in by rsync. It seemed to me that the simpler and more appropriate solution would be to flag every touched file with FADV_NOREUSE and let the kernel expel used pages automatically. After looking deeper into the kernel implementation[2] of fadvise(), the reason for using DONTNEED became more apparent: the kernel implements NOREUSE as a no-op. A little googling revealed[3] that I am not the first person to encounter this limitation.

It looks like a few folks[4] have discussed addressing the issue in the past, but nothing has happened as of 2.6.36. Are there plans to implement this functionality in the near future? The utility of fadvise seems severely limited by the lack of support for NOREUSE.

Cheers,

- Ben

[1] http://insights.oetiker.ch/linux/fadvise.html
[2] http://lxr.free-electrons.com/source/mm/fadvise.c?a=avr32
[3] https://issues.apache.org/jira/browse/CASSANDRA-1470
    http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html
[4] http://www.mail-archive.com/linux-kernel at vger.kernel.org/msg179576.html
    http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0442.html
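For illustration, the mincore()-plus-FADV_DONTNEED technique described above can be sketched roughly as follows. This is not the code from the patch[1]; it is a minimal sketch assuming Linux with glibc, and the helper names snapshot_residency() and evict_new_pages() are hypothetical. The idea is to record which pages of a file are already cached before reading it, then advise away only the pages that were not.

```c
#define _DEFAULT_SOURCE            /* exposes mincore() in glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: record which pages of fd are currently in the
 * page cache.  Returns a malloc'd vector with one byte per page (bit 0
 * set = resident) and stores the page count in *npages.  Returns NULL
 * on error or for an empty file. */
static unsigned char *snapshot_residency(int fd, size_t *npages)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return NULL;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    *npages = (len + page - 1) / page;

    /* Mapping the file does not read it; mincore() only reports residency. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return NULL;

    unsigned char *vec = malloc(*npages);
    if (vec != NULL && mincore(map, len, vec) != 0) {
        free(vec);
        vec = NULL;
    }
    munmap(map, len);
    return vec;
}

/* Hypothetical helper: after the file has been read, advise away only the
 * pages that were NOT resident in the snapshot, so data cached earlier by
 * other programs is left alone.  Runs of adjacent pages are coalesced into
 * one posix_fadvise() call.  Returns the number of pages advised, or -1. */
static long evict_new_pages(int fd, const unsigned char *before, size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    long advised = 0;
    size_t i = 0;

    while (i < npages) {
        if (before[i] & 1) {            /* cached before we started: keep */
            i++;
            continue;
        }
        size_t start = i;
        while (i < npages && !(before[i] & 1))
            i++;
        if (posix_fadvise(fd, (off_t)(start * page),
                          (off_t)((i - start) * page), POSIX_FADV_DONTNEED) != 0)
            return -1;
        advised += (long)(i - start);
    }
    return advised;
}
```

A caller would take the snapshot right after open(), read the file as usual, and call evict_new_pages() before close(). The residency vector is only a point-in-time snapshot, so pages that another process faults in while the file is being read would still be dropped.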
On Wed, Nov 3, 2010 at 10:58 PM, Ben Gamari <bgamari.foss at gmail.com> wrote:
> It looks like a few folks have discussed addressing the issue in the past,
> but nothing has happened as of 2.6.36.

Yeah, the Linux code for this has long been buggy and near useless. What is really needed is a way to mark some file access as generating low-priority data in the filesystem cache. Such a flag should apply only to newly cached data, so that copying a file that was already cached by some other program would not lower its cache priority (nor kick it out of the cache). If another process later reads that low-priority cached data with a normal-priority read, it should be upgraded to normal priority. Something like that seems pretty simple and useful.

As for rsync, all current implementations of cache dropping are way too klugey to go into rsync. I'd personally suggest that someone create a Linux-specific preload library that overrides the read() and write() calls, and use that when running rsync (or whatever else) to implement the extreme weirdness that is currently needed for cache twiddling.

..wayne..
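A minimal sketch of the kind of preload library suggested above, assuming glibc on Linux; the file name dropbehind.c and library name libdropbehind.so are made up. It interposes only read(): after each successful read from a seekable descriptor, it advises the kernel to drop the byte range just read. write() is left out because FADV_DONTNEED generally does not discard pages that are still dirty, so dropping written data would need an extra writeback step. Unlike the low-priority flag described above, this drops pages even if another program had them cached.

```c
#define _GNU_SOURCE                /* RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* The real read() from the next object in symbol lookup order (libc). */
static ssize_t (*real_read)(int, void *, size_t);

/* Interposed read(): behaves like read(), then advises the kernel to drop
 * the byte range just read from a seekable file. */
ssize_t read(int fd, void *buf, size_t count)
{
    if (real_read == NULL) {
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");
        if (real_read == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }

    /* Note where this read starts; lseek() fails (ESPIPE) on pipes and
     * sockets, in which case no advice is given.  errno is restored so
     * callers only ever see errors from the read itself. */
    int saved_errno = errno;
    off_t start = lseek(fd, 0, SEEK_CUR);
    errno = saved_errno;

    ssize_t n = real_read(fd, buf, count);
    if (n > 0 && start != (off_t)-1) {
        /* posix_fadvise() returns an error number rather than setting
         * errno; failures are ignored since the hint is purely advisory. */
        (void)posix_fadvise(fd, start, (off_t)n, POSIX_FADV_DONTNEED);
    }
    return n;
}
```

Something like `gcc -shared -fPIC -o libdropbehind.so dropbehind.c -ldl` followed by `LD_PRELOAD=./libdropbehind.so rsync -a src/ dst/` would load it (`-ldl` is unnecessary on glibc 2.34 and later). Calls that do not go through the public read() symbol, such as pread(), readv(), mmap() or glibc's internal stdio reads, bypass the wrapper.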
On Tue, 9 Nov 2010 16:28:02 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro at jp.fujitsu.com> wrote:
> So, I don't think application developers will use fadvise() aggressively
> because we don't have a cross platform agreement of a fadvice behavior.
>
I strongly disagree. For a long time I have been trying to resolve interactivity issues caused by my rsync-based backup script. Many kernel developers have said that there is nothing the kernel can do without more information from user space (e.g. cgroups, madvise). While cgroups help, that fix is roundabout at best and requires configuration where really none should be necessary. The easiest solution for everyone involved would be for rsync to use FADV_DONTNEED. The behavior doesn't need to be perfectly consistent between platforms for the flag to be useful, so long as each implementation does something sane to help use-once access patterns.

People frequently mention that there are no users of FADV_DONTNEED and that we therefore don't need to implement it. This ignores an obvious catch-22: rsync currently has no fadvise support at all, since using[1] the implemented hints to get the desired effect is far too complicated (read: hacky) to be considered merge-worthy. Considering the number of Google hits returned for fadvise, I wouldn't be surprised if countless other projects had this same difficulty. We want to be able to tell the kernel about our usage patterns, but the kernel won't listen.

Cheers,

- Ben

[1] http://insights.oetiker.ch/linux/fadvise.html
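The simplest form of "rsync using FADV_DONTNEED" is a drop-behind loop: read the file sequentially and periodically advise away what has already been read. The sketch below assumes POSIX posix_fadvise() on Linux; read_use_once(), the 1 MiB CHUNK interval and the consume callback are hypothetical choices for illustration. It also passes POSIX_FADV_NOREUSE, which expresses the intent but, as noted at the start of the thread, was a no-op on 2.6.36-era kernels.

```c
#define _POSIX_C_SOURCE 200809L    /* expose posix_fadvise() */
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (1L << 20)   /* advise away every 1 MiB read (arbitrary) */

/* Hypothetical helper: read the whole file at `path` exactly once, handing
 * each block to `consume` (may be NULL), while telling the kernel the data
 * will not be needed again.  Returns the number of bytes read, or -1. */
static long long read_use_once(const char *path,
                               void (*consume)(const char *, size_t))
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Declare the access pattern up front (len 0 means "to end of file").
     * NOREUSE states the intent, but was a no-op on 2.6.36-era kernels. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);

    static char buf[64 * 1024];
    long long total = 0;     /* bytes read so far */
    long long dropped = 0;   /* everything below this offset already advised */
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (consume != NULL)
            consume(buf, (size_t)n);
        total += n;
        if (total - dropped >= CHUNK) {
            posix_fadvise(fd, (off_t)dropped, (off_t)(total - dropped),
                          POSIX_FADV_DONTNEED);
            dropped = total;
        }
    }
    if (n < 0) {
        close(fd);
        return -1;
    }

    /* Drop whatever remains between the last advised offset and EOF. */
    posix_fadvise(fd, (off_t)dropped, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return total;
}
```

Because this loop advises away every page it reads, it also evicts data that other programs had cached before the backup started, which is the problem the mincore()-based patch works around.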
On Sun, 14 Nov 2010 14:09:29 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro at jp.fujitsu.com> wrote:
> Because we have an alternative solution already. please try memcgroup :)
>
Alright, fair enough. There still seem to be many cases where fadvise is more appropriate, but memcg should at least satisfy my personal needs, so I'll shut up now.

Thanks!

- Ben
On Mon, 15 Nov 2010 16:28:32 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro at jp.fujitsu.com> wrote:
> Who can make rsync like io pattern test suite? a code change is easy. but
> to comfirm justification is more harder work.
>
I'm afraid I don't have time to work up any code, but I would be happy to try the patch with my backup use case. I'll just have to think of an objective way of measuring the result.

- Ben
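One objective measure for the backup use case is how much of a given file stays in the page cache: after a backup run, files that rsync only read should be mostly non-resident, while files from the interactive working set should still be resident. The sketch below counts resident pages with mmap() and mincore(), similar in spirit to util-linux's fincore(1); resident_pages() is a hypothetical name and the code assumes Linux with glibc.

```c
#define _DEFAULT_SOURCE            /* exposes mincore() in glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages of the file at `path` are
 * currently in the page cache and store the file's page count in *total.
 * Returns the resident count, or -1 on error.  Empty files report 0 of 0. */
static long resident_pages(const char *path, long *total)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *total = 0;
    if (st.st_size == 0) {
        close(fd);
        return 0;
    }

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                 /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* bit 0 set = page is resident */
    }
    free(vec);
    munmap(map, len);
    *total = (long)npages;
    return resident;
}
```

Running it over a handful of working-set files before and after the nightly backup, with and without the patch, would give comparable numbers. mincore() reports only whether a page is resident, not where it sits on the LRU lists.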