Hi. I just noticed that we have a bug fix in FreeBSD that I never reported back. The thing is that ZFS creates dnode cache zone: dnode_cache = kmem_cache_create("dnode_t", sizeof (dnode_t), 0, dnode_cons, dnode_dest, NULL, NULL, NULL, 0); And declares dnode_cons() as constructor callback. But in the code we can find that dnode_cons() is called directly for the second time after allocation: dnode_t *dn = kmem_cache_alloc(dnode_cache, KM_SLEEP); (void) dnode_cons(dn, NULL, 0); /* XXX */ This way we initialize everything in dnode_cons() twice, which might not be critial on OpenSolaris, but it is critical on FreeBSD. And it is a waste of time for both systems. This simple patch works on FreeBSD: --- dnode.c.orig 2009-09-05 14:01:09.971645686 +0200 +++ dnode.c 2009-09-05 14:01:28.208648713 +0200 @@ -276,7 +276,6 @@ uint64_t object) { dnode_t *dn = kmem_cache_alloc(dnode_cache, KM_SLEEP); - (void) dnode_cons(dn, NULL, 0); /* XXX */ dn->dn_objset = os; dn->dn_object = object; -- Pawel Jakub Dawidek http://www.wheel.pl pjd at FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090905/48510214/attachment.bin>
Hi Pawel, Unfortunately your fix will not work correctly, because the DMU code expects all of the dnode_t fields to be initialized to 0 after allocation, which will not happen if you eliminate that call. In fact we had the same "fix" in the Lustre tree for a while, but it led to assertion failures during testing due to that problem. I am attaching a patch that contains a more thorough fix, however please note that this hasn''t gone through a complete code review, not from the Lustre team nor from the ZFS team. It seems to be working reasonably for us in our very limited testing, but use at your own risk :-) I am planning to do a code review of this patch and integrate it into Nevada some time in the future, along with some other minor bug fixes. Best regards, Ricardo On S?b, 2009-09-05 at 14:16 +0200, Pawel Jakub Dawidek wrote:> Hi. > > I just noticed that we have a bug fix in FreeBSD that I never reported > back. The thing is that ZFS creates dnode cache zone: > > dnode_cache = kmem_cache_create("dnode_t", > sizeof (dnode_t), > 0, dnode_cons, dnode_dest, NULL, NULL, NULL, 0); > > And declares dnode_cons() as constructor callback. But in the code we > can find that dnode_cons() is called directly for the second time after > allocation: > > dnode_t *dn = kmem_cache_alloc(dnode_cache, KM_SLEEP); > (void) dnode_cons(dn, NULL, 0); /* XXX */ > > This way we initialize everything in dnode_cons() twice, which might not > be critial on OpenSolaris, but it is critical on FreeBSD. And it is a > waste of time for both systems. > > This simple patch works on FreeBSD: > > --- dnode.c.orig 2009-09-05 14:01:09.971645686 +0200 > +++ dnode.c 2009-09-05 14:01:28.208648713 +0200 > @@ -276,7 +276,6 @@ > uint64_t object) > { > dnode_t *dn = kmem_cache_alloc(dnode_cache, KM_SLEEP); > - (void) dnode_cons(dn, NULL, 0); /* XXX */ > > dn->dn_objset = os; > dn->dn_object = object; > > _______________________________________________ > zfs-code mailing list > zfs-code at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-code-------------- next part -------------- An embedded message was scrubbed... From: Brian Behlendorf <behlendorf1 at llnl.gov> Subject: [PATCH] fix dnode constructor Date: Sat, 05 Sep 2009 14:47:10 +0200 Size: 8335 URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090905/b896f4cd/attachment.nws> -------------- next part -------------- An embedded message was scrubbed... From: Brian Behlendorf <behlendorf1 at llnl.gov> Subject: [PATCH] fix dnode constructor Date: Sat, 05 Sep 2009 14:55:10 +0200 Size: 8336 URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090905/b896f4cd/attachment-0001.nws>
On Sat, Sep 05, 2009 at 02:56:21PM +0200, Ricardo M. Correia wrote:> Hi Pawel, > > Unfortunately your fix will not work correctly, because the DMU code > expects all of the dnode_t fields to be initialized to 0 after > allocation, which will not happen if you eliminate that call.And this is not what constructor is doing? Could you explain what doesn''t work here exactly? From my understanding (at least that''s the case for FreeBSD''s zone allocator) kmem_cache will call constructor after allocating the memory. If it isn''t called after allocation then when?> In fact we had the same "fix" in the Lustre tree for a while, but it led > to assertion failures during testing due to that problem. > > I am attaching a patch that contains a more thorough fix, however please > note that this hasn''t gone through a complete code review, not from the > Lustre team nor from the ZFS team. It seems to be working reasonably for > us in our very limited testing, but use at your own risk :-) > > I am planning to do a code review of this patch and integrate it into > Nevada some time in the future, along with some other minor bug fixes.-- Pawel Jakub Dawidek http://www.wheel.pl pjd at FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090905/923e24f7/attachment.bin>
On Sat, 2009-09-05 at 15:08 +0200, Pawel Jakub Dawidek wrote:> > Unfortunately your fix will not work correctly, because the DMU code > > expects all of the dnode_t fields to be initialized to 0 after > > allocation, which will not happen if you eliminate that call. > > And this is not what constructor is doing? Could you explain what > doesn''t work here exactly? From my understanding (at least that''s the > case for FreeBSD''s zone allocator) kmem_cache will call constructor > after allocating the memory. If it isn''t called after allocation then > when?Well, in Linux and Solaris, the constructor is called only when an entire slab is allocated internally by the kmem code, not when the one of the objects in the slab is allocated (by calling kmem_cache_alloc()). In fact, the only thing kmem_cache_alloc() usually does is to just retrieve a pointer from a per-CPU array of pointers and decrement a counter (except if there are no more pointers available, in which case the cache is refilled). But if in FreeBSD kmem_cache_alloc() calls the constructor, then I guess you can live with your simple fix instead, no need for the complicated fix. Best regards, Ricardo
On Sat, Sep 05, 2009 at 03:22:12PM +0200, Ricardo M. Correia wrote:> On Sat, 2009-09-05 at 15:08 +0200, Pawel Jakub Dawidek wrote: > > > Unfortunately your fix will not work correctly, because the DMU code > > > expects all of the dnode_t fields to be initialized to 0 after > > > allocation, which will not happen if you eliminate that call. > > > > And this is not what constructor is doing? Could you explain what > > doesn''t work here exactly? From my understanding (at least that''s the > > case for FreeBSD''s zone allocator) kmem_cache will call constructor > > after allocating the memory. If it isn''t called after allocation then > > when? > > Well, in Linux and Solaris, the constructor is called only when an > entire slab is allocated internally by the kmem code, not when the one > of the objects in the slab is allocated (by calling kmem_cache_alloc()). > > In fact, the only thing kmem_cache_alloc() usually does is to just > retrieve a pointer from a per-CPU array of pointers and decrement a > counter (except if there are no more pointers available, in which case > the cache is refilled). > > But if in FreeBSD kmem_cache_alloc() calls the constructor, then I guess > you can live with your simple fix instead, no need for the complicated > fix.I see, thanks for explanation. Yes, FreeBSD always calls the constructor, even if the memory is taken from cache. -- Pawel Jakub Dawidek http://www.wheel.pl pjd at FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090905/c88f5f63/attachment.bin>
Hi, On Sep,Saturday 5 2009, at 3:33 PM, Pawel Jakub Dawidek wrote:> On Sat, Sep 05, 2009 at 03:22:12PM +0200, Ricardo M. Correia wrote: >> On Sat, 2009-09-05 at 15:08 +0200, Pawel Jakub Dawidek wrote: >>>> Unfortunately your fix will not work correctly, because the DMU >>>> code >>>> expects all of the dnode_t fields to be initialized to 0 after >>>> allocation, which will not happen if you eliminate that call. >>> >>> And this is not what constructor is doing? Could you explain what >>> doesn''t work here exactly? From my understanding (at least that''s >>> the >>> case for FreeBSD''s zone allocator) kmem_cache will call constructor >>> after allocating the memory. If it isn''t called after allocation >>> then >>> when? >> >> Well, in Linux and Solaris, the constructor is called only when an >> entire slab is allocated internally by the kmem code, not when the >> one >> of the objects in the slab is allocated (by calling kmem_cache_alloc >> ()). >> >> In fact, the only thing kmem_cache_alloc() usually does is to just >> retrieve a pointer from a per-CPU array of pointers and decrement a >> counter (except if there are no more pointers available, in which >> case >> the cache is refilled). >> >> But if in FreeBSD kmem_cache_alloc() calls the constructor, then I >> guess >> you can live with your simple fix instead, no need for the >> complicated >> fix. > > I see, thanks for explanation. Yes, FreeBSD always calls the > constructor, > even if the memory is taken from cache.Thanks for finding this, this is problem for NetBSD too then. Regards Adam.