Filipe David Borba Manana
2013-Dec-14 14:43 UTC
[PATCH] Btrfs: wait for ordered extents before removing extent maps
Wang Shilong got into a case where during inode eviction we were removing an extent map while it was pinned. This triggered a warning in remove_extent_mapping() because the extent map had the pinned flag set: [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs] [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs] [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs] [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs] Therefore wait for any pending ordered extents, if any, which will trigger calls to unpin_extent_cache(), before removing the extent maps. Wang''s solution of simply clearing the pinned bit wasn''t enough, as after unpin_extent_cache() will be called and trigger another WARN_ON() because the lookup for the extent map returned NULL. Thanks Wang for finding out this. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> --- fs/btrfs/inode.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e889779..c2933fb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode) ASSERT(inode->i_state & I_FREEING); truncate_inode_pages(&inode->i_data, 0); + /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ + btrfs_wait_ordered_range(inode, 0, (u64)-1); + write_lock(&map_tree->lock); while (!RB_EMPTY_ROOT(&map_tree->map)) { struct extent_map *em; @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode) btrfs_orphan_del(NULL, inode); goto no_delete; } - /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ - btrfs_wait_ordered_range(inode, 0, (u64)-1); if (root->fs_info->log_root_recovering) { BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Shilong Wang
2013-Dec-14 14:56 UTC
Re: [PATCH] Btrfs: wait for ordered extents before removing extent maps
Hello Filipe, 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>:> Wang Shilong got into a case where during inode eviction we were > removing an extent map while it was pinned. This triggered a warning > in remove_extent_mapping() because the extent map had the pinned > flag set: > > [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs] > [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs] > [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 > [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 > [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 > [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs] > [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs] > > Therefore wait for any pending ordered extents, if any, which will > trigger calls to unpin_extent_cache(), before removing the extent maps. > > Wang''s solution of simply clearing the pinned bit wasn''t enough, as after > unpin_extent_cache() will be called and trigger another WARN_ON() because > the lookup for the extent map returned NULL.Why not in evict_inode_truncate_pages() move remove_extent_mapping() after clear_extent_bit()? Thanks, Wang> > Thanks Wang for finding out this. > > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> > --- > fs/btrfs/inode.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index e889779..c2933fb 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode) > ASSERT(inode->i_state & I_FREEING); > truncate_inode_pages(&inode->i_data, 0); > > + /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ > + btrfs_wait_ordered_range(inode, 0, (u64)-1); > + > write_lock(&map_tree->lock); > while (!RB_EMPTY_ROOT(&map_tree->map)) { > struct extent_map *em; > @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode) > btrfs_orphan_del(NULL, inode); > goto no_delete; > } > - /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ > - btrfs_wait_ordered_range(inode, 0, (u64)-1); > > if (root->fs_info->log_root_recovering) { > BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, > -- > 1.7.9.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Filipe David Manana
2013-Dec-14 15:01 UTC
Re: [PATCH] Btrfs: wait for ordered extents before removing extent maps
On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:> Hello Filipe, > > 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>: >> Wang Shilong got into a case where during inode eviction we were >> removing an extent map while it was pinned. This triggered a warning >> in remove_extent_mapping() because the extent map had the pinned >> flag set: >> >> [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs] >> [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs] >> [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 >> [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 >> [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 >> [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs] >> [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs] >> >> Therefore wait for any pending ordered extents, if any, which will >> trigger calls to unpin_extent_cache(), before removing the extent maps. >> >> Wang''s solution of simply clearing the pinned bit wasn''t enough, as after >> unpin_extent_cache() will be called and trigger another WARN_ON() because >> the lookup for the extent map returned NULL. > > Why not in evict_inode_truncate_pages() move remove_extent_mapping() after > clear_extent_bit()?So, if the pinned bit is set, it means some task will clear it later, via unpin_extent_cache(). And if you look at that function, it has this: write_lock(&tree->lock); em = lookup_extent_mapping(tree, start, len); WARN_ON(!em || em->start != start); And remove_extent_mapping() will remove the em from the rbtree, regardless of its reference count value, therefore triggering that warning above. Does it makes sense? thanks> > Thanks, > Wang >> >> Thanks Wang for finding out this. >> >> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> >> --- >> fs/btrfs/inode.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index e889779..c2933fb 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode) >> ASSERT(inode->i_state & I_FREEING); >> truncate_inode_pages(&inode->i_data, 0); >> >> + /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >> + btrfs_wait_ordered_range(inode, 0, (u64)-1); >> + >> write_lock(&map_tree->lock); >> while (!RB_EMPTY_ROOT(&map_tree->map)) { >> struct extent_map *em; >> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode) >> btrfs_orphan_del(NULL, inode); >> goto no_delete; >> } >> - /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >> - btrfs_wait_ordered_range(inode, 0, (u64)-1); >> >> if (root->fs_info->log_root_recovering) { >> BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, >> -- >> 1.7.9.5 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html-- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That''s why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Shilong Wang
2013-Dec-14 15:08 UTC
Re: [PATCH] Btrfs: wait for ordered extents before removing extent maps
2013/12/14 Filipe David Manana <fdmanana@gmail.com>:> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote: >> Hello Filipe, >> >> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>: >>> Wang Shilong got into a case where during inode eviction we were >>> removing an extent map while it was pinned. This triggered a warning >>> in remove_extent_mapping() because the extent map had the pinned >>> flag set: >>> >>> [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs] >>> [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs] >>> [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 >>> [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 >>> [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 >>> [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs] >>> [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs] >>> >>> Therefore wait for any pending ordered extents, if any, which will >>> trigger calls to unpin_extent_cache(), before removing the extent maps. >>> >>> Wang''s solution of simply clearing the pinned bit wasn''t enough, as after >>> unpin_extent_cache() will be called and trigger another WARN_ON() because >>> the lookup for the extent map returned NULL. >> >> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after >> clear_extent_bit()? > > So, if the pinned bit is set, it means some task will clear it later, > via unpin_extent_cache(). And if you look at that function, it has > this: > > write_lock(&tree->lock); > em = lookup_extent_mapping(tree, start, len); > > WARN_ON(!em || em->start != start); > > And remove_extent_mapping() will remove the em from the rbtree, > regardless of its reference count value, therefore triggering that > warning above.Here i mean, in evict_inode_truncate_pages() We change it to: Step1: unpin_extent_cache() Step2: remove it from extent_mapping Dose this cause any problems? i am a little confused, correct me if i am wrong some places^_^.> > Does it makes sense? > > thanks > >> >> Thanks, >> Wang >>> >>> Thanks Wang for finding out this. >>> >>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> >>> --- >>> fs/btrfs/inode.c | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>> index e889779..c2933fb 100644 >>> --- a/fs/btrfs/inode.c >>> +++ b/fs/btrfs/inode.c >>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode) >>> ASSERT(inode->i_state & I_FREEING); >>> truncate_inode_pages(&inode->i_data, 0); >>> >>> + /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>> + btrfs_wait_ordered_range(inode, 0, (u64)-1); >>> + >>> write_lock(&map_tree->lock); >>> while (!RB_EMPTY_ROOT(&map_tree->map)) { >>> struct extent_map *em; >>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode) >>> btrfs_orphan_del(NULL, inode); >>> goto no_delete; >>> } >>> - /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>> - btrfs_wait_ordered_range(inode, 0, (u64)-1); >>> >>> if (root->fs_info->log_root_recovering) { >>> BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, >>> -- >>> 1.7.9.5 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That''s why all progress depends on unreasonable men."-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Filipe David Manana
2013-Dec-14 15:13 UTC
Re: [PATCH] Btrfs: wait for ordered extents before removing extent maps
On Sat, Dec 14, 2013 at 3:08 PM, Shilong Wang <wangshilong1991@gmail.com> wrote:> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>: >> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote: >>> Hello Filipe, >>> >>> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>: >>>> Wang Shilong got into a case where during inode eviction we were >>>> removing an extent map while it was pinned. This triggered a warning >>>> in remove_extent_mapping() because the extent map had the pinned >>>> flag set: >>>> >>>> [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs] >>>> [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs] >>>> [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 >>>> [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 >>>> [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 >>>> [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs] >>>> [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs] >>>> >>>> Therefore wait for any pending ordered extents, if any, which will >>>> trigger calls to unpin_extent_cache(), before removing the extent maps. >>>> >>>> Wang''s solution of simply clearing the pinned bit wasn''t enough, as after >>>> unpin_extent_cache() will be called and trigger another WARN_ON() because >>>> the lookup for the extent map returned NULL. >>> >>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after >>> clear_extent_bit()? >> >> So, if the pinned bit is set, it means some task will clear it later, >> via unpin_extent_cache(). And if you look at that function, it has >> this: >> >> write_lock(&tree->lock); >> em = lookup_extent_mapping(tree, start, len); >> >> WARN_ON(!em || em->start != start); >> >> And remove_extent_mapping() will remove the em from the rbtree, >> regardless of its reference count value, therefore triggering that >> warning above. > > Here i mean, in evict_inode_truncate_pages() > We change it to: > > Step1: unpin_extent_cache() > Step2: remove it from extent_mapping > > Dose this cause any problems? i am a little confused, correct me if i > am wrong some places^_^.It can still lead to the same WARN_ON I think. So when calling unpin_extent_cache(), it can merge the em with its left neighbor, therefore changing its ->start value. So later, if other task (the one which set the pinned flag) calls remove_extent_mapping(), it will get an em with a different ->start (because of the merge), therefore triggering that WARN_ON(). What do you think? thanks> > >> >> Does it makes sense? >> >> thanks >> >>> >>> Thanks, >>> Wang >>>> >>>> Thanks Wang for finding out this. >>>> >>>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> >>>> --- >>>> fs/btrfs/inode.c | 5 +++-- >>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>>> index e889779..c2933fb 100644 >>>> --- a/fs/btrfs/inode.c >>>> +++ b/fs/btrfs/inode.c >>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode) >>>> ASSERT(inode->i_state & I_FREEING); >>>> truncate_inode_pages(&inode->i_data, 0); >>>> >>>> + /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>>> + btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>> + >>>> write_lock(&map_tree->lock); >>>> while (!RB_EMPTY_ROOT(&map_tree->map)) { >>>> struct extent_map *em; >>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode) >>>> btrfs_orphan_del(NULL, inode); >>>> goto no_delete; >>>> } >>>> - /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>>> - btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>> >>>> if (root->fs_info->log_root_recovering) { >>>> BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, >>>> -- >>>> 1.7.9.5 >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> >> -- >> Filipe David Manana, >> >> "Reasonable men adapt themselves to the world. >> Unreasonable men adapt the world to themselves. >> That''s why all progress depends on unreasonable men."-- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That''s why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Filipe David Manana
2013-Dec-14 15:51 UTC
Re: [PATCH] Btrfs: wait for ordered extents before removing extent maps
On Sat, Dec 14, 2013 at 3:13 PM, Filipe David Manana <fdmanana@gmail.com> wrote:> On Sat, Dec 14, 2013 at 3:08 PM, Shilong Wang <wangshilong1991@gmail.com> wrote: >> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>: >>> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote: >>>> Hello Filipe, >>>> >>>> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>: >>>>> Wang Shilong got into a case where during inode eviction we were >>>>> removing an extent map while it was pinned. This triggered a warning >>>>> in remove_extent_mapping() because the extent map had the pinned >>>>> flag set: >>>>> >>>>> [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs] >>>>> [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs] >>>>> [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 >>>>> [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 >>>>> [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 >>>>> [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs] >>>>> [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs] >>>>> >>>>> Therefore wait for any pending ordered extents, if any, which will >>>>> trigger calls to unpin_extent_cache(), before removing the extent maps. >>>>> >>>>> Wang''s solution of simply clearing the pinned bit wasn''t enough, as after >>>>> unpin_extent_cache() will be called and trigger another WARN_ON() because >>>>> the lookup for the extent map returned NULL. >>>> >>>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after >>>> clear_extent_bit()? >>> >>> So, if the pinned bit is set, it means some task will clear it later, >>> via unpin_extent_cache(). And if you look at that function, it has >>> this: >>> >>> write_lock(&tree->lock); >>> em = lookup_extent_mapping(tree, start, len); >>> >>> WARN_ON(!em || em->start != start); >>> >>> And remove_extent_mapping() will remove the em from the rbtree, >>> regardless of its reference count value, therefore triggering that >>> warning above. >> >> Here i mean, in evict_inode_truncate_pages() >> We change it to: >> >> Step1: unpin_extent_cache() >> Step2: remove it from extent_mapping >> >> Dose this cause any problems? i am a little confused, correct me if i >> am wrong some places^_^. > > It can still lead to the same WARN_ON I think. So when calling > unpin_extent_cache(), it can merge the em with its left neighbor, > therefore changing its ->start value. So later, if other task (the one > which set the pinned flag) calls remove_extent_mapping(), it will get > an em with a different ->start (because of the merge), therefore > triggering that WARN_ON().Or because it is not found the second time. On the other hand, you didn''t get such WARN_ON triggered, right? So maybe just clearing the pinned bit is ok. So btrfs_invalidatepage, if it finds an ordered extent, it sets the BTRFS_ORDERED_TRUNCATED flag on it, and then it might call btrfs_finish_ordered_io() against it, which not always unpins the extent when it has the truncated flag set. So this might well be what you ran into. I''m ok with your approach too. thanks> > What do you think? > > thanks > >> >> >>> >>> Does it makes sense? >>> >>> thanks >>> >>>> >>>> Thanks, >>>> Wang >>>>> >>>>> Thanks Wang for finding out this. >>>>> >>>>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> >>>>> --- >>>>> fs/btrfs/inode.c | 5 +++-- >>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>>>> index e889779..c2933fb 100644 >>>>> --- a/fs/btrfs/inode.c >>>>> +++ b/fs/btrfs/inode.c >>>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode) >>>>> ASSERT(inode->i_state & I_FREEING); >>>>> truncate_inode_pages(&inode->i_data, 0); >>>>> >>>>> + /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>>>> + btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>>> + >>>>> write_lock(&map_tree->lock); >>>>> while (!RB_EMPTY_ROOT(&map_tree->map)) { >>>>> struct extent_map *em; >>>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode) >>>>> btrfs_orphan_del(NULL, inode); >>>>> goto no_delete; >>>>> } >>>>> - /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>>>> - btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>>> >>>>> if (root->fs_info->log_root_recovering) { >>>>> BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, >>>>> -- >>>>> 1.7.9.5 >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >>> >>> -- >>> Filipe David Manana, >>> >>> "Reasonable men adapt themselves to the world. >>> Unreasonable men adapt the world to themselves. >>> That''s why all progress depends on unreasonable men." > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That''s why all progress depends on unreasonable men."-- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That''s why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Shilong Wang
2013-Dec-15 03:34 UTC
Re: [PATCH] Btrfs: wait for ordered extents before removing extent maps
2013/12/14 Filipe David Manana <fdmanana@gmail.com>:> On Sat, Dec 14, 2013 at 3:13 PM, Filipe David Manana <fdmanana@gmail.com> wrote: >> On Sat, Dec 14, 2013 at 3:08 PM, Shilong Wang <wangshilong1991@gmail.com> wrote: >>> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>: >>>> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote: >>>>> Hello Filipe, >>>>> >>>>> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>: >>>>>> Wang Shilong got into a case where during inode eviction we were >>>>>> removing an extent map while it was pinned. This triggered a warning >>>>>> in remove_extent_mapping() because the extent map had the pinned >>>>>> flag set: >>>>>> >>>>>> [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs] >>>>>> [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs] >>>>>> [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 >>>>>> [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 >>>>>> [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 >>>>>> [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs] >>>>>> [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs] >>>>>> >>>>>> Therefore wait for any pending ordered extents, if any, which will >>>>>> trigger calls to unpin_extent_cache(), before removing the extent maps. >>>>>> >>>>>> Wang''s solution of simply clearing the pinned bit wasn''t enough, as after >>>>>> unpin_extent_cache() will be called and trigger another WARN_ON() because >>>>>> the lookup for the extent map returned NULL. >>>>> >>>>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after >>>>> clear_extent_bit()? >>>> >>>> So, if the pinned bit is set, it means some task will clear it later, >>>> via unpin_extent_cache(). And if you look at that function, it has >>>> this: >>>> >>>> write_lock(&tree->lock); >>>> em = lookup_extent_mapping(tree, start, len); >>>> >>>> WARN_ON(!em || em->start != start); >>>> >>>> And remove_extent_mapping() will remove the em from the rbtree, >>>> regardless of its reference count value, therefore triggering that >>>> warning above. >>> >>> Here i mean, in evict_inode_truncate_pages() >>> We change it to: >>> >>> Step1: unpin_extent_cache() >>> Step2: remove it from extent_mapping >>> >>> Dose this cause any problems? i am a little confused, correct me if i >>> am wrong some places^_^. >> >> It can still lead to the same WARN_ON I think. So when calling >> unpin_extent_cache(), it can merge the em with its left neighbor, >> therefore changing its ->start value. So later, if other task (the one >> which set the pinned flag) calls remove_extent_mapping(), it will get >> an em with a different ->start (because of the merge), therefore >> triggering that WARN_ON(). > > Or because it is not found the second time. > > On the other hand, you didn''t get such WARN_ON triggered, right?During my test, i did not hit another WARN_ON in fact.> > So maybe just clearing the pinned bit is ok. So btrfs_invalidatepage, > if it finds an ordered extent, it sets the BTRFS_ORDERED_TRUNCATED > flag on it, and then it might call btrfs_finish_ordered_io() against > it, which not always unpins the extent when it has the truncated flag > set. So this might well be what you ran into.Let''s dig more. I take a deep look in btrfs_finish_ordered_io(), it won''t unset PINNED flag For an NOCOW extent. 2579 if (test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags)) { 2580 BUG_ON(!list_empty(&ordered_extent->list)); /* Logic error */ 2581 btrfs_ordered_update_i_size(inode, 0, ordered_extent); 2582 if (nolock) 2583 trans = btrfs_join_transaction_nolock(root); 2584 else 2585 trans = btrfs_join_transaction(root); 2586 if (IS_ERR(trans)) { 2587 ret = PTR_ERR(trans); 2588 trans = NULL; 2589 goto out; 2590 } 2591 trans->block_rsv = &root->fs_info->delalloc_block_rsv; 2592 ret = btrfs_update_inode_fallback(trans, root, inode); 2593 if (ret) /* -ENOMEM or corruption */ 2594 btrfs_abort_transaction(trans, root, ret); 2595 goto out; ------------------------->here we goto out directly, unpin_extent_cache() won''t be called. 2596 } I don''t know why we do something like this... Previously, i unset PINNED flag directly is a lazy approach, that is because i think it dosen''t hurt to do so, because we are going to evict that inode, and after btrfs_invalidatepages() is called, nobody should can still access those pages. Besides, i think we don''t need call btrfs_wait_ordered_extents() in evicting inode here.. (Expecially after your patch that we remove extent map cache).... What do you think ^_^> > I''m ok with your approach too. > > thanks > > >> >> What do you think? >> >> thanks >> >>> >>> >>>> >>>> Does it makes sense? >>>> >>>> thanks >>>> >>>>> >>>>> Thanks, >>>>> Wang >>>>>> >>>>>> Thanks Wang for finding out this. >>>>>> >>>>>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> >>>>>> --- >>>>>> fs/btrfs/inode.c | 5 +++-- >>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>>>>> index e889779..c2933fb 100644 >>>>>> --- a/fs/btrfs/inode.c >>>>>> +++ b/fs/btrfs/inode.c >>>>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode) >>>>>> ASSERT(inode->i_state & I_FREEING); >>>>>> truncate_inode_pages(&inode->i_data, 0); >>>>>> >>>>>> + /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>>>>> + btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>>>> + >>>>>> write_lock(&map_tree->lock); >>>>>> while (!RB_EMPTY_ROOT(&map_tree->map)) { >>>>>> struct extent_map *em; >>>>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode) >>>>>> btrfs_orphan_del(NULL, inode); >>>>>> goto no_delete; >>>>>> } >>>>>> - /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>>>>> - btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>>>> >>>>>> if (root->fs_info->log_root_recovering) { >>>>>> BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, >>>>>> -- >>>>>> 1.7.9.5 >>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> >>>> >>>> -- >>>> Filipe David Manana, >>>> >>>> "Reasonable men adapt themselves to the world. >>>> Unreasonable men adapt the world to themselves. >>>> That''s why all progress depends on unreasonable men." >> >> >> >> -- >> Filipe David Manana, >> >> "Reasonable men adapt themselves to the world. >> Unreasonable men adapt the world to themselves. >> That''s why all progress depends on unreasonable men." > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That''s why all progress depends on unreasonable men."-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Filipe David Manana
2013-Dec-15 12:41 UTC
Re: [PATCH] Btrfs: wait for ordered extents before removing extent maps
On Sun, Dec 15, 2013 at 3:34 AM, Shilong Wang <wangshilong1991@gmail.com> wrote:> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>: >> On Sat, Dec 14, 2013 at 3:13 PM, Filipe David Manana <fdmanana@gmail.com> wrote: >>> On Sat, Dec 14, 2013 at 3:08 PM, Shilong Wang <wangshilong1991@gmail.com> wrote: >>>> 2013/12/14 Filipe David Manana <fdmanana@gmail.com>: >>>>> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <wangshilong1991@gmail.com> wrote: >>>>>> Hello Filipe, >>>>>> >>>>>> 2013/12/14 Filipe David Borba Manana <fdmanana@gmail.com>: >>>>>>> Wang Shilong got into a case where during inode eviction we were >>>>>>> removing an extent map while it was pinned. This triggered a warning >>>>>>> in remove_extent_mapping() because the extent map had the pinned >>>>>>> flag set: >>>>>>> >>>>>>> [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs] >>>>>>> [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs] >>>>>>> [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 >>>>>>> [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 >>>>>>> [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 >>>>>>> [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs] >>>>>>> [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs] >>>>>>> >>>>>>> Therefore wait for any pending ordered extents, if any, which will >>>>>>> trigger calls to unpin_extent_cache(), before removing the extent maps. >>>>>>> >>>>>>> Wang''s solution of simply clearing the pinned bit wasn''t enough, as after >>>>>>> unpin_extent_cache() will be called and trigger another WARN_ON() because >>>>>>> the lookup for the extent map returned NULL. >>>>>> >>>>>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() after >>>>>> clear_extent_bit()? >>>>> >>>>> So, if the pinned bit is set, it means some task will clear it later, >>>>> via unpin_extent_cache(). And if you look at that function, it has >>>>> this: >>>>> >>>>> write_lock(&tree->lock); >>>>> em = lookup_extent_mapping(tree, start, len); >>>>> >>>>> WARN_ON(!em || em->start != start); >>>>> >>>>> And remove_extent_mapping() will remove the em from the rbtree, >>>>> regardless of its reference count value, therefore triggering that >>>>> warning above. >>>> >>>> Here i mean, in evict_inode_truncate_pages() >>>> We change it to: >>>> >>>> Step1: unpin_extent_cache() >>>> Step2: remove it from extent_mapping >>>> >>>> Dose this cause any problems? i am a little confused, correct me if i >>>> am wrong some places^_^. >>> >>> It can still lead to the same WARN_ON I think. So when calling >>> unpin_extent_cache(), it can merge the em with its left neighbor, >>> therefore changing its ->start value. So later, if other task (the one >>> which set the pinned flag) calls remove_extent_mapping(), it will get >>> an em with a different ->start (because of the merge), therefore >>> triggering that WARN_ON(). >> >> Or because it is not found the second time. >> >> On the other hand, you didn''t get such WARN_ON triggered, right? > > During my test, i did not hit another WARN_ON in fact. > >> >> So maybe just clearing the pinned bit is ok. So btrfs_invalidatepage, >> if it finds an ordered extent, it sets the BTRFS_ORDERED_TRUNCATED >> flag on it, and then it might call btrfs_finish_ordered_io() against >> it, which not always unpins the extent when it has the truncated flag >> set. So this might well be what you ran into. > > Let''s dig more. > > I take a deep look in btrfs_finish_ordered_io(), it won''t unset PINNED flag > For an NOCOW extent. > > 2579 if (test_bit(BTRFS_ORDERED_NOCOW, > &ordered_extent->flags)) { > 2580 BUG_ON(!list_empty(&ordered_extent->list)); /* > Logic error */ > 2581 btrfs_ordered_update_i_size(inode, 0, ordered_extent); > 2582 if (nolock) > 2583 trans = btrfs_join_transaction_nolock(root); > 2584 else > 2585 trans = btrfs_join_transaction(root); > 2586 if (IS_ERR(trans)) { > 2587 ret = PTR_ERR(trans); > 2588 trans = NULL; > 2589 goto out; > 2590 } > 2591 trans->block_rsv = &root->fs_info->delalloc_block_rsv; > 2592 ret = btrfs_update_inode_fallback(trans, root, inode); > 2593 if (ret) /* -ENOMEM or corruption */ > 2594 btrfs_abort_transaction(trans, root, ret); > 2595 goto out; > ------------------------->here we goto out directly, > unpin_extent_cache() won''t be called. > 2596 } > I don''t know why we do something like this... > > > Previously, i unset PINNED flag directly is a lazy approach, that is > because i think it > dosen''t hurt to do so, because we are going to evict that inode, and after > btrfs_invalidatepages() is called, nobody should can still access those pages. > > > Besides, i think we don''t need call btrfs_wait_ordered_extents() in > evicting inode here.. > (Expecially after your patch that we remove extent map cache).... > > > What do you think ^_^I think it might be ok. Thanks> > >> >> I''m ok with your approach too. >> >> thanks >> >> >>> >>> What do you think? >>> >>> thanks >>> >>>> >>>> >>>>> >>>>> Does it makes sense? >>>>> >>>>> thanks >>>>> >>>>>> >>>>>> Thanks, >>>>>> Wang >>>>>>> >>>>>>> Thanks Wang for finding out this. >>>>>>> >>>>>>> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> >>>>>>> --- >>>>>>> fs/btrfs/inode.c | 5 +++-- >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>>>>>> index e889779..c2933fb 100644 >>>>>>> --- a/fs/btrfs/inode.c >>>>>>> +++ b/fs/btrfs/inode.c >>>>>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct inode *inode) >>>>>>> ASSERT(inode->i_state & I_FREEING); >>>>>>> truncate_inode_pages(&inode->i_data, 0); >>>>>>> >>>>>>> + /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>>>>>> + btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>>>>> + >>>>>>> write_lock(&map_tree->lock); >>>>>>> while (!RB_EMPTY_ROOT(&map_tree->map)) { >>>>>>> struct extent_map *em; >>>>>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode) >>>>>>> btrfs_orphan_del(NULL, inode); >>>>>>> goto no_delete; >>>>>>> } >>>>>>> - /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */ >>>>>>> - btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>>>>> >>>>>>> if (root->fs_info->log_root_recovering) { >>>>>>> BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, >>>>>>> -- >>>>>>> 1.7.9.5 >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>> >>>>> >>>>> -- >>>>> Filipe David Manana, >>>>> >>>>> "Reasonable men adapt themselves to the world. >>>>> Unreasonable men adapt the world to themselves. >>>>> That''s why all progress depends on unreasonable men." >>> >>> >>> >>> -- >>> Filipe David Manana, >>> >>> "Reasonable men adapt themselves to the world. >>> Unreasonable men adapt the world to themselves. >>> That''s why all progress depends on unreasonable men." >> >> >> >> -- >> Filipe David Manana, >> >> "Reasonable men adapt themselves to the world. >> Unreasonable men adapt the world to themselves. >> That''s why all progress depends on unreasonable men."-- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That''s why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html