Dan Magenheimer
2011-Jun-01 22:45 UTC
[Ocfs2-devel] bug in cleancache ocfs2 hook, anybody want to try cleancache?
As Steven Whitehouse points out in this lkml thread:
https://lkml.org/lkml/2011/5/27/221
there is a bug in the ocfs2 hook to cleancache.
The fix is fairly trivial, but I wonder if anyone
in the ocfs2 community might be interested in trying
out cleancache to author and test the fix?

Currently, the only implementation that benefits from
the sharing functionality is on Xen.

So if you know how to (or are interested in learning
how to) bring up multiple ocfs2 cluster nodes sharing
an ocfs2 filesystem on top of Xen, and you are interested
in giving cleancache a spin, please let me know. Else
I will probably push the fix myself.

Dan

P.S. Links to cleancache info:
http://www.phoronix.com/scan.php?page=news_item&px=OTQ5Mw
http://lwn.net/Articles/386090/
http://blogs.oracle.com/wim/entry/another_feature_hit_mainline_linux

---
Thanks... for the memory!
I really could use more / my throughput's on the floor
The balloon is flat / my swap disk's fat / I've OOM's in store
Overcommitted so much
(with apologies to Bob Hope)
Steven Whitehouse
2011-Jun-02 08:45 UTC
[Ocfs2-devel] bug in cleancache ocfs2 hook, anybody want to try cleancache?
Hi,

On Wed, 2011-06-01 at 15:45 -0700, Dan Magenheimer wrote:
> As Steven Whitehouse points out in this lkml thread:
> https://lkml.org/lkml/2011/5/27/221
> there is a bug in the ocfs2 hook to cleancache.
> The fix is fairly trivial, but I wonder if anyone
> in the ocfs2 community might be interested in trying
> out cleancache to author and test the fix?
>
> Currently, the only implementation that benefits from
> the sharing functionality is on Xen.
>
> So if you know how to (or are interested in learning
> how to) bring up multiple ocfs2 cluster nodes sharing
> an ocfs2 filesystem on top of Xen and you are interested
> in giving cleancache a spin, please let me know. Else
> I will probably push the fix myself.
>
> Dan

Having started looking at the cleancache code in a bit more detail, I
have another question... what is the intended mechanism for selecting a
cleancache backend? The registration code looks like this:

struct cleancache_ops cleancache_register_ops(struct cleancache_ops *ops)
{
        struct cleancache_ops old = cleancache_ops;

        cleancache_ops = *ops;
        cleancache_enabled = 1;
        return old;
}
EXPORT_SYMBOL(cleancache_register_ops);

but I wonder what the intent was here. It looks racy to me, and what
prevents the backend module from unloading while it is in use? Neither
of the two in-tree callers seems to do anything with the returned
structure beyond printing a warning if another backend has already
registered itself. Also, why return the structure and not a pointer to
it? The ops structure pointer passed in should also be const, I think.

From the code I assume that it is only valid to load the module for a
single cleancache backend at a time, though nothing appears to enforce
that.

Also, as regards your earlier question wrt a kvm backend, I may be
tempted to have a go at writing one, but I'd like to figure out what I'm
letting myself in for before making any commitment to that,

Steve.
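[A minimal sketch of the kind of registration interface this critique
points toward might look like the following. This is illustrative only,
not code from any tree: the owner field on struct cleancache_ops is
assumed here (the structure quoted above has no such field), and the
single-backend restriction is enforced explicitly rather than by
convention.]

#include <linux/cleancache.h>
#include <linux/errno.h>
#include <linux/module.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(cleancache_register_lock);
static const struct cleancache_ops *cleancache_ops; /* NULL until a backend loads */

int cleancache_register_ops(const struct cleancache_ops *ops)
{
        int ret = 0;

        mutex_lock(&cleancache_register_lock);
        if (cleancache_ops)
                ret = -EBUSY;                 /* one backend at a time, enforced */
        else if (!try_module_get(ops->owner)) /* pin the backend module while registered */
                ret = -ENODEV;
        else
                cleancache_ops = ops;         /* publish only after refcount is held */
        mutex_unlock(&cleancache_register_lock);
        return ret;
}
EXPORT_SYMBOL(cleancache_register_ops);

[Taking a const pointer (rather than copying the struct) and returning an
error code instead of the old ops also sidesteps the questions above
about what callers should do with the returned structure.]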
Dan Magenheimer
2011-Jun-02 18:26 UTC
[Ocfs2-devel] bug in cleancache ocfs2 hook, anybody want to try cleancache?
> Having started looking at the cleancache code in a bit more detail, I
> have another question... what is the intended mechanism for selecting a
> cleancache backend? The registration code looks like this:
>
> struct cleancache_ops cleancache_register_ops(struct cleancache_ops *ops)
> {
>         struct cleancache_ops old = cleancache_ops;
>
>         cleancache_ops = *ops;
>         cleancache_enabled = 1;
>         return old;
> }
> EXPORT_SYMBOL(cleancache_register_ops);
>
> but I wonder what the intent was here. It looks racy to me, and what
> prevents the backend module from unloading while it is in use? Neither
> of the two in-tree callers seems to do anything with the returned
> structure beyond printing a warning if another backend has already
> registered itself. Also, why return the structure and not a pointer to
> it? The ops structure pointer passed in should also be const, I think.
>
> From the code I assume that it is only valid to load the module for a
> single cleancache backend at a time, though nothing appears to enforce
> that.

Hi Steven --

The intent was to allow backends to be "chained", but this is not used
yet and not really very well thought through yet either (e.g. possible
coherency issues of chaining). So, yes, currently only one cleancache
backend can be loaded at a time.

There's another initialization issue... if mounts are done before a
backend registers, those mounts are not enabled for cleancache. As a
result, cleancache backends generally need to be built-in, not loaded
separately as a module. I've had ideas on how to fix this for some time
(basically recording calls to cleancache_init_fs that occur when no
backend is registered, then calling the backend lazily after
registration occurs).

> Also, as regards your earlier question wrt a kvm backend, I may be
> tempted to have a go at writing one, but I'd like to figure out what
> I'm letting myself in for before making any commitment to that,

I think the hardest part is updating the tmem.c module in zcache to
support multiple "clients". When I ported it from Xen, I tore all that
out. Fortunately, I've put it back in during RAMster development, but
those changes haven't yet seen the light of day (though I can share
them offlist).

The next issue is the guest->host interface. Is there the equivalent of
a hypercall in KVM? If so, a shim like drivers/xen/tmem.c is needed in
the guest, plus some shim that interfaces the host side of the hypercall
to tmem.c (and presumably zcache). That may be enough for a
proof-of-concept, though Xen has a bunch of tools and stuff for which
KVM would probably want some equivalent.

If you are at all interested, let's take the details offlist. It would
be great to have a proof-of-concept by KVM Forum!

Thanks,
Dan
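[The lazy-registration idea Dan describes could be sketched roughly as
follows: record cleancache_init_fs() calls that arrive while no backend
is registered, then replay them at registration time. This is a
hypothetical sketch, not mainline code; fs_poolid_map, the FS_* markers,
and replay_deferred_mounts() are invented names, while
sb->cleancache_poolid and ops->init_fs() are the real hooks from the
2.6.39-era cleancache code.]

#include <linux/cleancache.h>
#include <linux/fs.h>
#include <linux/mm.h>

#define MAX_INITIALIZABLE_FS    32
#define FS_UNKNOWN              (-1)    /* slot unused */
#define FS_NO_BACKEND           (-2)    /* mount seen, no backend yet */

static struct cleancache_ops *backend_ops;      /* NULL until registration */
static int fs_poolid_map[MAX_INITIALIZABLE_FS] = {
        [0 ... MAX_INITIALIZABLE_FS - 1] = FS_UNKNOWN
};

void cleancache_init_fs(struct super_block *sb)
{
        int i;

        for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
                if (fs_poolid_map[i] == FS_UNKNOWN) {
                        sb->cleancache_poolid = i;      /* index, not raw pool id */
                        if (backend_ops)                /* backend already present */
                                fs_poolid_map[i] = backend_ops->init_fs(PAGE_SIZE);
                        else                            /* remember for lazy replay */
                                fs_poolid_map[i] = FS_NO_BACKEND;
                        return;
                }
        }
}

/* Called from cleancache_register_ops(): replay mounts recorded above. */
static void replay_deferred_mounts(struct cleancache_ops *ops)
{
        int i;

        for (i = 0; i < MAX_INITIALIZABLE_FS; i++)
                if (fs_poolid_map[i] == FS_NO_BACKEND)
                        fs_poolid_map[i] = ops->init_fs(PAGE_SIZE);
}

[With this indirection, a filesystem's cleancache_poolid stays valid
across backend registration, so backends could be loadable modules
rather than built-ins.]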
Dan Magenheimer
2011-Jun-03 15:03 UTC
[Ocfs2-devel] bug in cleancache ocfs2 hook, anybody want to try cleancache?
> > There's another initialization issue... if mounts are done
> > before a backend registers, those mounts are not enabled
> > for cleancache. As a result, cleancache backends generally
> > need to be built-in, not loaded separately as a module.
> > I've had ideas on how to fix this for some time (basically
> > recording calls to cleancache_init_fs that occur when no
> > backend is registered, then calling the backend lazily after
> > registration occurs).
>
> Ok... but if cleancache_init_fs were to take (for example) an argument
> specifying the back end to use (I'm thinking here of say a
> cleancache=tmem mount argument for filesystems or something similar)
> then the backend module could be automatically loaded if required. It
> would also allow, by design, multiple backends to be used without
> interfering with each other.

That's an interesting approach. What use model do you have in mind for
this?

I can see a disadvantage of having one fs use one cleancache backend
while another fs uses another independent cleancache backend: it might
be much more difficult to do accounting and things like deduplication
across multiple backends. Also, statistically, managing multiple LRU
queues (e.g. to ensure ephemeral pages are evicted in LRU order) is
less efficient than managing a single one. But I may not understand
what you have in mind.

> > The intent was to allow backends to be "chained", but this is
> > not used yet and not really very well thought through yet either
> > (e.g. possible coherency issues of chaining).
> > So, yes, currently only one cleancache backend can be loaded
> > at a time.
>
> I don't understand the intent behind chaining of the backends. Did you
> mean that pages would migrate from one backend to another down the
> stack as each one discards pages, and that pages would migrate back up
> the stack again when pulled back in from the filesystem? I'm not sure I
> can see any application for such a scheme, unless I'm missing something.

Each put can be rejected by a cleancache backend, so I was thinking that
chaining could be used, for example, as follows:

1) zcache registers and discovers that another backend (Xen tmem) had
   previously registered, so saves the ops
2) kernel puts lots of pages to cleancache
3) eventually zcache "fills up" and would normally have to reject the
   put, but...
4) instead zcache attempts to put the page to Xen tmem using the saved
   ops
5) if Xen tmem accepts the page, success is returned; if not, zcache
   returns failure
6) caveat: once zcache has put a page to Xen tmem, zcache needs to
   always "get" from the chained backend if a local get fails, and must
   always flush both places

I thought I might use this for RAMster (to put/get to a different
physical machine), but instead have hard-coded a modified zcache
version.

> I'd like to try and understand the design of the existing code before I
> consider anything more advanced such as writing a kvm backend,

OK. I'd be happy to answer any questions on the design at any time.

Thanks,
Dan
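[A sketch of how the chained put/get/flush flow described in steps 1-6
might look inside zcache. This is illustrative only: chained_ops stands
for the ops structure returned by cleancache_register_ops() at step 1,
and zcache_compress_and_store(), zcache_load_and_decompress(), and
zcache_flush_local() are invented stand-ins for zcache's local page
store. The ops signatures match the 2.6.39-era struct cleancache_ops.]

#include <linux/cleancache.h>
#include <linux/mm.h>

/* Invented local-store helpers; real zcache has its own equivalents. */
int zcache_compress_and_store(int, struct cleancache_filekey, pgoff_t, struct page *);
int zcache_load_and_decompress(int, struct cleancache_filekey, pgoff_t, struct page *);
void zcache_flush_local(int, struct cleancache_filekey, pgoff_t);

static struct cleancache_ops chained_ops;       /* saved at registration (step 1) */
static bool have_chained_backend;

static void zcache_put_page(int pool, struct cleancache_filekey key,
                            pgoff_t index, struct page *page)
{
        if (zcache_compress_and_store(pool, key, index, page) == 0)
                return;                 /* stored locally (step 2) */
        if (have_chained_backend)       /* zcache full: fall through (steps 3-4) */
                chained_ops.put_page(pool, key, index, page);
}

static int zcache_get_page(int pool, struct cleancache_filekey key,
                           pgoff_t index, struct page *page)
{
        if (zcache_load_and_decompress(pool, key, index, page) == 0)
                return 0;               /* local hit */
        if (have_chained_backend)       /* local miss: try chained backend (step 6) */
                return chained_ops.get_page(pool, key, index, page);
        return -1;
}

static void zcache_flush_page(int pool, struct cleancache_filekey key,
                              pgoff_t index)
{
        zcache_flush_local(pool, key, index);
        if (have_chained_backend)       /* step 6: must always flush both places */
                chained_ops.flush_page(pool, key, index);
}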
Dan Magenheimer
2011-Jul-26 14:54 UTC
[Ocfs2-devel] bug in cleancache ocfs2 hook, anybody want to try cleancache?
> > > Also, as regards your earlier question wrt a kvm backend, I may be
> > > tempted to have a go at writing one, but I'd like to figure out
> > > what I'm letting myself in for before making any commitment to that,
> >
> > I think the hardest part is updating the tmem.c module in zcache
> > to support multiple "clients". When I ported it from Xen, I tore
> > all that out. Fortunately, I've put it back in during RAMster
> > development but those changes haven't yet seen the light of day
> > (though I can share them offlist).
>
> I'd like to try and understand the design of the existing code before I
> consider anything more advanced such as writing a kvm backend,

FYI, the "hardest part" (support for multiple clients) is now in
Linus's tree as of today. With a little hypercall and glue code, KVM
might just work now with tmem/cleancache/frontswap.

Dan