Eric Sandeen
2013-Feb-05 22:00 UTC
[PATCH] btrfs: add delayed_iput list head to btrfs inode
Following the lead from Jeff Mahoney's comment in the code:

/* JDM: If this is fs-wide, why can't we add a pointer to
 * btrfs_inode instead and avoid the allocation? */

Remove the NOFAIL kmalloc in btrfs_add_delayed_iput(),
and just use a list head in the btrfs inode.

This does grow the btrfs inode by 16 bytes, but doesn't change
slab cache utilization on my machine. Rearranging the btrfs inode
could get back 8 bytes or so if people are worried about it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
---

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 2a8c242..3024006 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -86,6 +86,8 @@ struct btrfs_inode {
 	 */
 	struct list_head ordered_operations;
 
+	struct list_head delayed_iput;
+
 	/* node for the red-black tree that links inodes in subvolume root */
 	struct rb_node rb_node;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cc93b23..cac7f43 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2119,34 +2119,24 @@ zeroit:
 	return -EIO;
 }
 
-struct delayed_iput {
-	struct list_head list;
-	struct inode *inode;
-};
-
-/* JDM: If this is fs-wide, why can't we add a pointer to
- * btrfs_inode instead and avoid the allocation? */
 void btrfs_add_delayed_iput(struct inode *inode)
 {
-	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
-	struct delayed_iput *delayed;
+	struct btrfs_inode *b_inode = BTRFS_I(inode);
+	struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
 
 	if (atomic_add_unless(&inode->i_count, -1, 1))
 		return;
 
-	delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
-	delayed->inode = inode;
-
 	spin_lock(&fs_info->delayed_iput_lock);
-	list_add_tail(&delayed->list, &fs_info->delayed_iputs);
+	list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
 	spin_unlock(&fs_info->delayed_iput_lock);
 }
 
 void btrfs_run_delayed_iputs(struct btrfs_root *root)
 {
 	LIST_HEAD(list);
+	struct btrfs_inode *b_inode;
 	struct btrfs_fs_info *fs_info = root->fs_info;
-	struct delayed_iput *delayed;
 	int empty;
 
 	spin_lock(&fs_info->delayed_iput_lock);
@@ -2160,10 +2150,9 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
 	spin_unlock(&fs_info->delayed_iput_lock);
 
 	while (!list_empty(&list)) {
-		delayed = list_entry(list.next, struct delayed_iput, list);
-		list_del(&delayed->list);
-		iput(delayed->inode);
-		kfree(delayed);
+		b_inode = list_entry(list.next, struct btrfs_inode, delayed_iput);
+		list_del(&b_inode->delayed_iput);
+		iput(&b_inode->vfs_inode);
 	}
 }
 
@@ -7142,6 +7131,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
 	INIT_LIST_HEAD(&ei->ordered_operations);
+	INIT_LIST_HEAD(&ei->delayed_iput);
 	RB_CLEAR_NODE(&ei->rb_node);
 
 	return inode;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
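The patch relies on the intrusive-list idiom: the list node lives inside the object, and list_entry() recovers the object from the node, so queuing needs no separate allocation. A minimal userspace sketch of that idiom follows. The list helpers are simplified stand-ins for the kernel's, and fake_vfs_inode, fake_btrfs_inode and drain_demo are hypothetical names invented for illustration, not the real btrfs definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's list primitives (simplified). */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	struct list_head *prev = head->prev;

	head->prev = node;
	node->next = head;
	node->prev = prev;
	prev->next = node;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static int list_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical, cut-down inode types; not the real btrfs structures. */
struct fake_vfs_inode { int ino; };
struct fake_btrfs_inode {
	struct list_head delayed_iput;	/* embedded node, as in the patch */
	struct fake_vfs_inode vfs_inode;
};

/* Queue two inodes, then drain; returns the sum of the drained ino values. */
int drain_demo(void)
{
	struct list_head queue;
	struct fake_btrfs_inode a = { .vfs_inode = { .ino = 10 } };
	struct fake_btrfs_inode b = { .vfs_inode = { .ino = 32 } };
	int sum = 0;

	INIT_LIST_HEAD(&queue);
	list_add_tail(&a.delayed_iput, &queue);
	list_add_tail(&b.delayed_iput, &queue);

	while (!list_empty(&queue)) {
		struct fake_btrfs_inode *bi =
			list_entry(queue.next, struct fake_btrfs_inode, delayed_iput);

		list_del(&bi->delayed_iput);
		sum += bi->vfs_inode.ino;	/* stands in for iput(&bi->vfs_inode) */
	}
	return sum;
}
```

Calling drain_demo() from a main() returns 42 (10 + 32): both objects are recovered from their embedded nodes without any per-entry allocation.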
Zach Brown
2013-Feb-05 23:14 UTC
Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
> +	struct btrfs_inode *b_inode = BTRFS_I(inode);
> +	struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
>
>  	if (atomic_add_unless(&inode->i_count, -1, 1))
>  		return;
>
> -	delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
> -	delayed->inode = inode;
> -
>  	spin_lock(&fs_info->delayed_iput_lock);
> -	list_add_tail(&delayed->list, &fs_info->delayed_iputs);
> +	list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
>  	spin_unlock(&fs_info->delayed_iput_lock);
>  }

Hmm. I'm not great with inode life cycles, but isn't this only safe if
someone else can't get an i_count reference while this is in flight? It
looks like the final iput does the unhashing, and so on, so couldn't an
iget/iput race with this and try to add the inode's list_head twice?

- z
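For reference, the atomic_add_unless(&inode->i_count, -1, 1) test quoted above drops a reference unless it is the last one. A userspace sketch of that semantics using C11 atomics follows. add_unless and must_defer_iput are hypothetical names for illustration; this is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace sketch of atomic_add_unless(v, a, u): add a to *v unless
 * *v == u. Returns true if the add happened.
 */
bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/*
 * Mirrors the test at the top of btrfs_add_delayed_iput(): true means
 * the caller holds the last reference, so the iput has to be deferred.
 */
bool must_defer_iput(atomic_int *i_count)
{
	return !add_unless(i_count, -1, 1);
}
```

With a count of 2 the call simply decrements to 1 and nothing is queued; with a count of 1 the count is left alone and the caller is told to defer.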
On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
> > +	struct btrfs_inode *b_inode = BTRFS_I(inode);
> > +	struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
> >
> >  	if (atomic_add_unless(&inode->i_count, -1, 1))
> >  		return;
> >
> > -	delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
> > -	delayed->inode = inode;
> > -
> >  	spin_lock(&fs_info->delayed_iput_lock);
> > -	list_add_tail(&delayed->list, &fs_info->delayed_iputs);
> > +	list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
> >  	spin_unlock(&fs_info->delayed_iput_lock);
> >  }
>
> Hmm. I'm not great with inode life cycles, but isn't this only safe if
> someone else can't get an i_count reference while this is in flight? It
> looks like the final iput does the unhashing, and so on, so couldn't an
> iget/iput race with this and try to add the inode's list_head twice?

Yeah, same concern here. Basically this will result in inodes still being
in use on unmount.

Actually I did a similar one, here is some discussion:

https://patchwork.kernel.org/patch/1824711/

thanks,
liubo
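What a double add would do to the list can be shown in isolation. The sketch below is userspace C with simplified stand-ins for the kernel list helpers (init_list, add_tail and double_add_self_loops are hypothetical names). It does not show that the race is reachable in btrfs, only the consequence if the same embedded node were queued twice.

```c
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

void init_list(struct list_head *h) { h->next = h; h->prev = h; }

/* Same pointer updates as a plain tail insert, with no debug checks. */
void add_tail(struct list_head *node, struct list_head *head)
{
	struct list_head *prev = head->prev;

	head->prev = node;
	node->next = head;
	node->prev = prev;
	prev->next = node;
}

/* Returns 1 if queuing the same node twice leaves it pointing at itself. */
int double_add_self_loops(void)
{
	struct list_head queue, inode_node;

	init_list(&queue);
	init_list(&inode_node);

	add_tail(&inode_node, &queue);	/* first add: list is head <-> node */
	add_tail(&inode_node, &queue);	/* second add: prev is the node itself */

	/* node->next no longer leads back to the head */
	return inode_node.next == &inode_node && queue.next == &inode_node;
}
```

After the second add the node's next pointer refers to itself, so a walk that starts at the head reaches the node and never returns to the head.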
Eric Sandeen
2013-Feb-06 14:14 UTC
Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
On Feb 5, 2013, at 8:11 PM, Liu Bo <bo.li.liu@oracle.com> wrote:

> On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
>>> +	struct btrfs_inode *b_inode = BTRFS_I(inode);
>>> +	struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
>>>
>>>  	if (atomic_add_unless(&inode->i_count, -1, 1))
>>>  		return;
>>>
>>> -	delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
>>> -	delayed->inode = inode;
>>> -
>>>  	spin_lock(&fs_info->delayed_iput_lock);
>>> -	list_add_tail(&delayed->list, &fs_info->delayed_iputs);
>>> +	list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
>>>  	spin_unlock(&fs_info->delayed_iput_lock);
>>>  }
>>
>> Hmm. I'm not great with inode life cycles, but isn't this only safe if
>> someone else can't get an i_count reference while this is in flight? It
>> looks like the final iput does the unhashing, and so on, so couldn't an
>> iget/iput race with this and try to add the inode's list_head twice?
>
> Yeah, same concern here. Basically this will result in inodes still being
> in use on unmount.
>
> Actually I did a similar one, here is some discussion:
>
> https://patchwork.kernel.org/patch/1824711/
>

Ok, thanks all. We should remove Jeff's comment then, it sure sounded
like a good idea...

Eric

> thanks,
> liubo
Eric Sandeen
2013-Feb-06 15:53 UTC
Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
On 2/5/13 8:08 PM, Liu Bo wrote:
> On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
>>> +	struct btrfs_inode *b_inode = BTRFS_I(inode);
>>> +	struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
>>>
>>>  	if (atomic_add_unless(&inode->i_count, -1, 1))
>>>  		return;
>>>
>>> -	delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
>>> -	delayed->inode = inode;
>>> -
>>>  	spin_lock(&fs_info->delayed_iput_lock);
>>> -	list_add_tail(&delayed->list, &fs_info->delayed_iputs);
>>> +	list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
>>>  	spin_unlock(&fs_info->delayed_iput_lock);
>>>  }
>>
>> Hmm. I'm not great with inode life cycles, but isn't this only safe if
>> someone else can't get an i_count reference while this is in flight? It
>> looks like the final iput does the unhashing, and so on, so couldn't an
>> iget/iput race with this and try to add the inode's list_head twice?
>
> Yeah, same concern here. Basically this will result in inodes still being
> in use on unmount.
>
> Actually I did a similar one, here is some discussion:
>
> https://patchwork.kernel.org/patch/1824711/

I read it, thanks. Did you try the counter approach?

-Eric

> thanks,
> liubo
On Wed, Feb 06, 2013 at 09:53:05AM -0600, Eric Sandeen wrote:
> On 2/5/13 8:08 PM, Liu Bo wrote:
> > On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
> >>> +	struct btrfs_inode *b_inode = BTRFS_I(inode);
> >>> +	struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
> >>>
> >>>  	if (atomic_add_unless(&inode->i_count, -1, 1))
> >>>  		return;
> >>>
> >>> -	delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
> >>> -	delayed->inode = inode;
> >>> -
> >>>  	spin_lock(&fs_info->delayed_iput_lock);
> >>> -	list_add_tail(&delayed->list, &fs_info->delayed_iputs);
> >>> +	list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
> >>>  	spin_unlock(&fs_info->delayed_iput_lock);
> >>>  }
> >>
> >> Hmm. I'm not great with inode life cycles, but isn't this only safe if
> >> someone else can't get an i_count reference while this is in flight? It
> >> looks like the final iput does the unhashing, and so on, so couldn't an
> >> iget/iput race with this and try to add the inode's list_head twice?
> >
> > Yeah, same concern here. Basically this will result in inodes still being
> > in use on unmount.
> >
> > Actually I did a similar one, here is some discussion:
> >
> > https://patchwork.kernel.org/patch/1824711/
>
> I read it, thanks. Did you try the counter approach?

Yes, it'll bring a tradeoff situation.

With counter, we need to lock the list all the time instead of doing a
splice on the list and unlocking it. I think splice would be faster so I
didn't go further (I MIGHT be wrong on this).

thanks,
liubo
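The splice-and-unlock pattern mentioned here can be sketched in userspace as follows: the shared queue is moved onto a private list in O(1) under the lock, and the entries are then processed with the lock dropped. The list helpers are simplified stand-ins, the atomic_flag spinlock is a toy substitute for fs_info->delayed_iput_lock, and all function names are hypothetical. The counter-based alternative is not sketched because its details are not given in this thread.

```c
#include <assert.h>
#include <stdatomic.h>

struct list_head { struct list_head *next, *prev; };

void init_list(struct list_head *h) { h->next = h; h->prev = h; }

void add_tail(struct list_head *node, struct list_head *head)
{
	struct list_head *prev = head->prev;

	head->prev = node;
	node->next = head;
	node->prev = prev;
	prev->next = node;
}

void del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Move everything from src onto the front of dst and leave src empty. */
void splice_init(struct list_head *src, struct list_head *dst)
{
	struct list_head *first = src->next, *last = src->prev;

	if (first == src)
		return;
	first->prev = dst;
	last->next = dst->next;
	dst->next->prev = last;
	dst->next = first;
	init_list(src);
}

/* Toy spinlock standing in for the real delayed_iput_lock. */
static atomic_flag queue_lock = ATOMIC_FLAG_INIT;
static void lock_queue(void) { while (atomic_flag_test_and_set(&queue_lock)) { } }
static void unlock_queue(void) { atomic_flag_clear(&queue_lock); }

/* Drain the shared queue; returns how many entries were processed. */
int run_delayed(struct list_head *shared)
{
	struct list_head local;
	int processed = 0;

	init_list(&local);

	lock_queue();
	splice_init(shared, &local);	/* the only work done under the lock */
	unlock_queue();

	while (local.next != &local) {
		del(local.next);	/* stand-in for list_del() + iput() */
		processed++;
	}
	return processed;
}

/* Queue three nodes and drain them; returns 1 on the expected outcome. */
int splice_demo(void)
{
	struct list_head shared, n[3];
	int i;

	init_list(&shared);
	for (i = 0; i < 3; i++)
		add_tail(&n[i], &shared);
	return run_delayed(&shared) == 3 && shared.next == &shared;
}
```

Because producers only contend with the constant-time splice, the lock hold time does not grow with the number of queued entries, which is the property Liu Bo is weighing against the counter approach.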
Jeff Mahoney
2013-Feb-12 07:34 UTC
Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode
On 2/6/13 11:02 AM, Liu Bo wrote:
> On Wed, Feb 06, 2013 at 09:53:05AM -0600, Eric Sandeen wrote:
>> On 2/5/13 8:08 PM, Liu Bo wrote:
>>> On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
>>>>> +	struct btrfs_inode *b_inode = BTRFS_I(inode);
>>>>> +	struct btrfs_fs_info *fs_info = b_inode->root->fs_info;
>>>>>
>>>>>  	if (atomic_add_unless(&inode->i_count, -1, 1))
>>>>>  		return;
>>>>>
>>>>> -	delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
>>>>> -	delayed->inode = inode;
>>>>> -
>>>>>  	spin_lock(&fs_info->delayed_iput_lock);
>>>>> -	list_add_tail(&delayed->list, &fs_info->delayed_iputs);
>>>>> +	list_add_tail(&b_inode->delayed_iput, &fs_info->delayed_iputs);
>>>>>  	spin_unlock(&fs_info->delayed_iput_lock);
>>>>>  }
>>>>
>>>> Hmm. I'm not great with inode life cycles, but isn't this
>>>> only safe if someone else can't get an i_count reference
>>>> while this is in flight? It looks like the final iput does
>>>> the unhashing, and so on, so couldn't an iget/iput race with
>>>> this and try to add the inode's list_head twice?
>>>
>>> Yeah, same concern here. Basically this will result in inodes
>>> still being in use on unmount.
>>>
>>> Actually I did a similar one, here is some discussion:
>>>
>>> https://patchwork.kernel.org/patch/1824711/
>>
>> I read it, thanks. Did you try the counter approach?
>
> Yes, it'll bring a tradeoff situation.
>
> With counter, we need to lock the list all the time instead of
> doing a splice on the list and unlocking it. I think splice would
> be faster so I didn't go further (I MIGHT be wrong on this).

Thanks for looking into this. I left this note to myself during the
development of the error handling patches while on a tangent to try to
eliminate NOFAIL allocs.

It's not the alloc/free that's the issue (though eliminating these can
probably only help), it's that NOFAIL allocs essentially become locks
when memory pressure is high enough that the NOFAIL functionality gets
invoked. OTOH, bailing out of that path when we encounter an allocation
failure is impossible.

-Jeff

--
Jeff Mahoney
SUSE Labs
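The remark that NOFAIL allocations "essentially become locks" can be illustrated with a loose userspace analogy: an allocation that is not allowed to fail has to keep retrying, so its caller is stalled for as long as memory stays unavailable. Everything below is hypothetical (try_alloc, alloc_nofail and the simulated pressure counter); it is not how the kernel allocator is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator that fails while simulated pressure remains. */
static int pressure = 3;

static void *try_alloc(size_t n)
{
	if (pressure > 0) {
		pressure--;
		return NULL;
	}
	return malloc(n);
}

/*
 * Retry until the allocation succeeds. *retries records how many failed
 * attempts the caller had to sit through before making progress.
 */
void *alloc_nofail(size_t n, int *retries)
{
	void *p;

	*retries = 0;
	while ((p = try_alloc(n)) == NULL)
		(*retries)++;	/* a real system would wait for memory here */
	return p;
}
```

The caller cannot proceed or back out until something else frees memory, which is the lock-like behaviour described above. With the simulated pressure of 3, the first call returns after three failed attempts.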