Dan Magenheimer
2010-Aug-20 15:14 UTC
[Ocfs2-devel] cleancache followup from LSF10/MM summit
Hi Christophe (and others interested in cleancache progress) --

Thanks for taking some time to talk with me about cleancache
at LSF summit! You had some interesting thoughts and suggestions
that I said I would investigate. They are:

1) use inode kva as key instead of i_ino
2) eliminate cleancache shim and call zcache directly
3) fs's requiring key > inode_t (e.g. 64-bit-fs on 32-bit-kernel)
4) eliminate fs-specific code entirely (e.g. "opt-in")
5) eliminate global variable

Here's my conclusions:

1) You suggested using the inode kva as a "key" for cleancache.
I think your goal was to make it more fs-independent and also
to eliminate the need for using a per-fs enabler and "pool id".
I looked at this but it will not work because cleancache
retains page cache data pages persistently even when the
inode has been pruned from the inode_unused_list and only
flushes the data pages if the file gets removed/truncated. If
cleancache used the inode kva, there would be coherency issues
when the inode kva is reused. Alternately, if cleancache
flushed the pages when the inode kva was freed, much of
the value of cleancache would be lost because the cache
of pages in cleancache is potentially much larger than
the page cache and is most useful if the pages survive
inode cache removal.

If I misunderstood your proposal or if you disagree, please
let me know.

2) You suggested eliminating the cleancache shim layer and just
directly calling zcache, effectively eliminating Xen as
a user. During and after LSF summit, I talked to developers
from Google who are interested in investigating the cleancache
interface for use with cgroups, an IBM developer who was
interested in cleancache for optimizing NUMA, and soon I
will be talking to HP Labs about using it as an interface
for "memory blades". I also think Rik van Riel and Mel Gorman
were intrigued about its use for collecting better memory
utilization statistics to drive guest/host memory "rightsizing".
While it is true that none of these are current users yet, even
if you prefer to ignore Xen tmem as a user, it seems silly to
throw away the cleanly-layered generic cleancache interface now,
only to add it back later when more users are added.

3) You re-emphasized the problem where cleancache's use of
the inode number as a key will cause problems on many 64-bit
filesystems especially running on a 32-bit kernel. With
help from Andreas Dilger, I'm trying to work out a generic
solution for this using s_export_op->encode_fh which would
be used for any fs that provides it to guarantee a unique
multi-word key for a file, while preserving the
shorter i_ino as a key for fs's for which i_ino is unique.

4) Though you were out of the room during the cleancache
lightning talk, other filesystem developers seemed OK
with the "opt-in" approach (as documented in lwn.net)...
one even asked "can't you just add a bit to the superblock?"
to which I answered "that's essentially what the one
line opt-in addition does". Not sure if you are still
objecting to that, but especially given that the 64-bit-fs-on
32-bit-kernel issue above only affects some filesystems,
I'm still thinking it is necessary.

5) You commented (before LSF) that the global variable should
be avoided which is certainly valid, and I will try Nitin's
suggestion to add a registration interface.

Did I miss anything?

I plan to submit a V4 for cleancache soon, and hope you will
be inclined to ack this time.

Thanks,
Dan
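The coherency argument in point 1 can be illustrated with a small userspace sketch. This is not kernel code and not the cleancache implementation: the names `toy_key`, `toy_put`, `toy_get` and `toy_lookup_after_reuse`, the fixed-size table, and the address and inode numbers are all made up for illustration. It assumes only what the message states: cached pages outlive the in-memory inode, so a key built from the inode's address can later match a different file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 16

/* Hypothetical key: which file, and which page within that file. */
struct toy_key {
    uint64_t file_id;   /* either an inode address or an inode number */
    uint64_t index;     /* page offset within the file */
};

struct toy_slot {
    int used;
    struct toy_key key;
    char data[8];       /* stand-in for one page of file data */
};

static struct toy_slot toy_cache[TOY_SLOTS];

static struct toy_slot *toy_find(struct toy_key k)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (toy_cache[i].used &&
            toy_cache[i].key.file_id == k.file_id &&
            toy_cache[i].key.index == k.index)
            return &toy_cache[i];
    return NULL;
}

/* Store a copy of a clean page; returns 0 on success, -1 if the table is full. */
static int toy_put(struct toy_key k, const char *data)
{
    struct toy_slot *s = toy_find(k);

    for (int i = 0; !s && i < TOY_SLOTS; i++)
        if (!toy_cache[i].used)
            s = &toy_cache[i];
    if (!s)
        return -1;
    s->used = 1;
    s->key = k;
    strncpy(s->data, data, sizeof(s->data) - 1);
    s->data[sizeof(s->data) - 1] = '\0';
    return 0;
}

/* Returns the cached data, or NULL on a miss. */
static const char *toy_get(struct toy_key k)
{
    struct toy_slot *s = toy_find(k);
    return s ? s->data : NULL;
}

/*
 * Scenario: a page of file A (inode number 100) is cached, A's in-memory
 * inode is pruned, and the same address is then reused for file B (inode
 * number 200). Returns what a lookup for page 0 of file B finds.
 */
static const char *toy_lookup_after_reuse(int key_by_address)
{
    const uint64_t reused_addr = 0x1000;  /* made-up address */
    struct toy_key a = { key_by_address ? reused_addr : 100, 0 };
    struct toy_key b = { key_by_address ? reused_addr : 200, 0 };

    memset(toy_cache, 0, sizeof(toy_cache));
    assert(toy_put(a, "A-data") == 0);
    return toy_get(b);
}
```

With `key_by_address` set, the lookup for file B returns file A's stale data; with inode-number keys it is a clean miss. A real key would also carry the per-filesystem pool id mentioned in point 1, which this sketch leaves out.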
* dan.magenheimer at oracle.com <dan.magenheimer at oracle.com> [2010-08-20 08:14:59]:

> Hi Christophe (and others interested in cleancache progress) --
>
> Thanks for taking some time to talk with me about cleancache
> at LSF summit! You had some interesting thoughts and suggestions
> that I said I would investigate. They are:
>
> 1) use inode kva as key instead of i_ino
> 2) eliminate cleancache shim and call zcache directly
> 3) fs's requiring key > inode_t (e.g. 64-bit-fs on 32-bit-kernel)
> 4) eliminate fs-specific code entirely (e.g. "opt-in")
> 5) eliminate global variable
>
> Here's my conclusions:
>
> 1) You suggested using the inode kva as a "key" for cleancache.
> I think your goal was to make it more fs-independent and also
> to eliminate the need for using a per-fs enabler and "pool id".
> I looked at this but it will not work because cleancache
> retains page cache data pages persistently even when the
> inode has been pruned from the inode_unused_list and only
> flushes the data pages if the file gets removed/truncated. If
> cleancache used the inode kva, there would be coherency issues
> when the inode kva is reused. Alternately, if cleancache
> flushed the pages when the inode kva was freed, much of
> the value of cleancache would be lost because the cache
> of pages in cleancache is potentially much larger than
> the page cache and is most useful if the pages survive
> inode cache removal.
>
> If I misunderstood your proposal or if you disagree, please
> let me know.
>
> 2) You suggested eliminating the cleancache shim layer and just
> directly calling zcache, effectively eliminating Xen as
> a user. During and after LSF summit, I talked to developers
> from Google who are interested in investigating the cleancache
> interface for use with cgroups, an IBM developer who was
> interested in cleancache for optimizing NUMA, and soon I
> will be talking to HP Labs about using it as an interface
> for "memory blades". I also think Rik van Riel and Mel Gorman
> were intrigued about its use for collecting better memory
> utilization statistics to drive guest/host memory "rightsizing".
> While it is true that none of these are current users yet, even
> if you prefer to ignore Xen tmem as a user, it seems silly to
> throw away the cleanly-layered generic cleancache interface now,
> only to add it back later when more users are added.
>
> 3) You re-emphasized the problem where cleancache's use of
> the inode number as a key will cause problems on many 64-bit
> filesystems especially running on a 32-bit kernel. With
> help from Andreas Dilger, I'm trying to work out a generic
> solution for this using s_export_op->encode_fh which would
> be used for any fs that provides it to guarantee a unique
> multi-word key for a file, while preserving the
> shorter i_ino as a key for fs's for which i_ino is unique.
>
> 4) Though you were out of the room during the cleancache
> lightning talk, other filesystem developers seemed OK
> with the "opt-in" approach (as documented in lwn.net)...
> one even asked "can't you just add a bit to the superblock?"
> to which I answered "that's essentially what the one
> line opt-in addition does". Not sure if you are still
> objecting to that, but especially given that the 64-bit-fs-on
> 32-bit-kernel issue above only affects some filesystems,
> I'm still thinking it is necessary.
>
> 5) You commented (before LSF) that the global variable should
> be avoided which is certainly valid, and I will try Nitin's
> suggestion to add a registration interface.
>
> Did I miss anything?
>
> I plan to submit a V4 for cleancache soon, and hope you will
> be inclined to ack this time.
>

Hi, Dan,

Sorry for commenting on your post so late. I've had some time
to read through your approach and compare it to my approach
(linuxsymposium.org/2010/view_abstract.php?content_key=32)
and I had a few quick questions

1. Can't this be done at the MM layer - why the filesystem hooks? Is
it to enable faster block devices in the reclaim hierarchy?
2. I don't see a mention of slabcache in your approach, reclaim free
pages or freeing potentially free slab pages.

--
Three Cheers,
Balbir
Dan Magenheimer
2010-Aug-24 20:42 UTC
[Ocfs2-devel] cleancache followup from LSF10/MM summit
Hi Balbir --

Thanks for reviewing!

> 1. Can't this be done at the MM layer - why the filesystem hooks? Is
> it to enable faster block devices in the reclaim hierarchy?

This is explained in FAQ #2 in: lkml.org/lkml/2010/6/21/411

If I misunderstood your question or the FAQ doesn't answer it,
please let me know.

> 2. I don't see a mention of slabcache in your approach, reclaim free
> pages or freeing potentially free slab pages.

Cleancache works on clean mapped pages that are reclaimed ("evicted")
due to (guest) memory pressure but later would result in a refault.
The decision of which pages to reclaim is left entirely to the (guest)
kernel, and the "backend" (zcache or Xen tmem) dynamically decides
how many clean evicted pages to retain based on dynamic factors that
are unknowable to the (guest) kernel (such as compression ratios for
zcache and available fallow memory for Xen tmem).

I'm not sure I see how this could apply to slabcache (and I couldn't
find anything in your OLS paper that refers to it), but if you have
some ideas, let's discuss (offlist?).

Thanks,
Dan
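The division of labor described in the reply (the kernel chooses what to evict, the backend chooses what to retain) can be sketched in userspace C as follows. Everything here is hypothetical and is not the actual cleancache, zcache, or Xen tmem interface: `demo_backend_ops`, `demo_register_backend`, `demo_evict_clean_page`, `demo_refault`, and the `demo_capacity` knob are invented for illustration, and a single integer stands in for a page of data. The registration call is only one guess at what a registration interface replacing a global variable might look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend operations; not the real cleancache API. */
struct demo_backend_ops {
    int (*put)(unsigned long key, int value);   /* 0 = retained, -1 = declined */
    int (*get)(unsigned long key, int *value);  /* 0 = hit, -1 = miss */
};

static const struct demo_backend_ops *demo_backend;  /* NULL until registered */

static void demo_register_backend(const struct demo_backend_ops *ops)
{
    demo_backend = ops;
}

/* Called when the kernel evicts a clean page; the result is only advisory. */
static int demo_evict_clean_page(unsigned long key, int value)
{
    return demo_backend ? demo_backend->put(key, value) : -1;
}

/* Called on a refault; on a miss the caller reads from disk as usual. */
static int demo_refault(unsigned long key, int *value)
{
    return demo_backend ? demo_backend->get(key, value) : -1;
}

/* A toy backend that retains at most demo_capacity pages. */
#define DEMO_MAX 4
static unsigned long demo_keys[DEMO_MAX];
static int demo_vals[DEMO_MAX];
static int demo_count;
static int demo_capacity = 2;  /* stand-in for compression ratio or fallow memory */

static int demo_put(unsigned long key, int value)
{
    if (demo_count >= demo_capacity || demo_count >= DEMO_MAX)
        return -1;             /* backend declines; the page is simply dropped */
    demo_keys[demo_count] = key;
    demo_vals[demo_count] = value;
    demo_count++;
    return 0;
}

static int demo_get(unsigned long key, int *value)
{
    assert(value != NULL);
    for (int i = 0; i < demo_count; i++) {
        if (demo_keys[i] == key) {
            *value = demo_vals[i];
            return 0;
        }
    }
    return -1;
}

static const struct demo_backend_ops demo_ops = { demo_put, demo_get };
```

After `demo_register_backend(&demo_ops)`, the first two evicted pages are retained and a third is declined, so a later `demo_refault` on the third key misses and the caller would fall back to reading the page from disk. Because only clean pages are offered, a declined put or a miss costs a disk read but never loses data.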