invalidate_inode_pages2_range may return -EBUSY occasionally which results Oops. This patch fixes the issue by moving invalidate_inode_pages2_range into a loop and keeping calling it until the return value is not -EBUSY. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> --- diff -urp 1/fs/btrfs/relocation.c 2/fs/btrfs/relocation.c --- 1/fs/btrfs/relocation.c 2009-07-29 10:03:04.367858774 +0800 +++ 2/fs/btrfs/relocation.c 2009-08-07 13:26:43.882147138 +0800 @@ -2553,8 +2553,13 @@ int relocate_inode_pages(struct inode *i last_index = (start + len - 1) >> PAGE_CACHE_SHIFT; /* make sure the dirty trick played by the caller work */ - ret = invalidate_inode_pages2_range(inode->i_mapping, - first_index, last_index); + while (1) { + ret = invalidate_inode_pages2_range(inode->i_mapping, + first_index, last_index); + if (ret != -EBUSY) + break; + cond_resched(); + } if (ret) goto out_unlock; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07 2009, Yan Zheng wrote:> invalidate_inode_pages2_range may return -EBUSY occasionally > which results Oops. This patch fixes the issue by moving > invalidate_inode_pages2_range into a loop and keeping calling > it until the return value is not -EBUSY. > > Signed-off-by: Yan Zheng <zheng.yan@oracle.com> > > --- > diff -urp 1/fs/btrfs/relocation.c 2/fs/btrfs/relocation.c > --- 1/fs/btrfs/relocation.c 2009-07-29 10:03:04.367858774 +0800 > +++ 2/fs/btrfs/relocation.c 2009-08-07 13:26:43.882147138 +0800 > @@ -2553,8 +2553,13 @@ int relocate_inode_pages(struct inode *i > last_index = (start + len - 1) >> PAGE_CACHE_SHIFT; > > /* make sure the dirty trick played by the caller work */ > - ret = invalidate_inode_pages2_range(inode->i_mapping, > - first_index, last_index); > + while (1) { > + ret = invalidate_inode_pages2_range(inode->i_mapping, > + first_index, last_index); > + if (ret != -EBUSY) > + break; > + cond_resched(); > + }If it returns EBUSY, would it not make more sense to call filemap_write_and_wait_range() instead of hammering on invalidate? -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/07/2009 02:50 PM, Jens Axboe wrote:> On Fri, Aug 07 2009, Yan Zheng wrote: >> invalidate_inode_pages2_range may return -EBUSY occasionally >> which results Oops. This patch fixes the issue by moving >> invalidate_inode_pages2_range into a loop and keeping calling >> it until the return value is not -EBUSY. >> >> Signed-off-by: Yan Zheng <zheng.yan@oracle.com> >> >> --- >> diff -urp 1/fs/btrfs/relocation.c 2/fs/btrfs/relocation.c >> --- 1/fs/btrfs/relocation.c 2009-07-29 10:03:04.367858774 +0800 >> +++ 2/fs/btrfs/relocation.c 2009-08-07 13:26:43.882147138 +0800 >> @@ -2553,8 +2553,13 @@ int relocate_inode_pages(struct inode *i >> last_index = (start + len - 1) >> PAGE_CACHE_SHIFT; >> >> /* make sure the dirty trick played by the caller work */ >> - ret = invalidate_inode_pages2_range(inode->i_mapping, >> - first_index, last_index); >> + while (1) { >> + ret = invalidate_inode_pages2_range(inode->i_mapping, >> + first_index, last_index); >> + if (ret != -EBUSY) >> + break; >> + cond_resched(); >> + } > > If it returns EBUSY, would it not make more sense to call > filemap_write_and_wait_range() instead of hammering on invalidate? >The pages to invalidate are not dirty, they are from page read-ahead. Actually I have no idea how invalidate_inode_pages2_range can return -EBUSY here. (the only user of the inode is the balancer, and it does not hold references to the pages) Regards Yan, Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07 2009, Yan Zheng wrote:> On 08/07/2009 02:50 PM, Jens Axboe wrote: > > On Fri, Aug 07 2009, Yan Zheng wrote: > >> invalidate_inode_pages2_range may return -EBUSY occasionally > >> which results Oops. This patch fixes the issue by moving > >> invalidate_inode_pages2_range into a loop and keeping calling > >> it until the return value is not -EBUSY. > >> > >> Signed-off-by: Yan Zheng <zheng.yan@oracle.com> > >> > >> --- > >> diff -urp 1/fs/btrfs/relocation.c 2/fs/btrfs/relocation.c > >> --- 1/fs/btrfs/relocation.c 2009-07-29 10:03:04.367858774 +0800 > >> +++ 2/fs/btrfs/relocation.c 2009-08-07 13:26:43.882147138 +0800 > >> @@ -2553,8 +2553,13 @@ int relocate_inode_pages(struct inode *i > >> last_index = (start + len - 1) >> PAGE_CACHE_SHIFT; > >> > >> /* make sure the dirty trick played by the caller work */ > >> - ret = invalidate_inode_pages2_range(inode->i_mapping, > >> - first_index, last_index); > >> + while (1) { > >> + ret = invalidate_inode_pages2_range(inode->i_mapping, > >> + first_index, last_index); > >> + if (ret != -EBUSY) > >> + break; > >> + cond_resched(); > >> + } > > > > If it returns EBUSY, would it not make more sense to call > > filemap_write_and_wait_range() instead of hammering on invalidate? > > > > The pages to invalidate are not dirty, they are from page read-ahead. > Actually I have no idea how invalidate_inode_pages2_range can return > -EBUSY here. (the only user of the inode is the balancer, and it does > not hold references to the pages)Weird, I looked it up, and it already does a writeback wait. But I guess that''s not your issue. Patch still looks like a hack though, it would be far better to figure out why it returns EBUSY and fix/wait appropriately for that to pass. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/07/2009 03:19 PM, Jens Axboe wrote:> On Fri, Aug 07 2009, Yan Zheng wrote: >> On 08/07/2009 02:50 PM, Jens Axboe wrote: >>> On Fri, Aug 07 2009, Yan Zheng wrote: >>>> invalidate_inode_pages2_range may return -EBUSY occasionally >>>> which results Oops. This patch fixes the issue by moving >>>> invalidate_inode_pages2_range into a loop and keeping calling >>>> it until the return value is not -EBUSY. >>>> >>>> Signed-off-by: Yan Zheng <zheng.yan@oracle.com> >>>> >>>> --- >>>> diff -urp 1/fs/btrfs/relocation.c 2/fs/btrfs/relocation.c >>>> --- 1/fs/btrfs/relocation.c 2009-07-29 10:03:04.367858774 +0800 >>>> +++ 2/fs/btrfs/relocation.c 2009-08-07 13:26:43.882147138 +0800 >>>> @@ -2553,8 +2553,13 @@ int relocate_inode_pages(struct inode *i >>>> last_index = (start + len - 1) >> PAGE_CACHE_SHIFT; >>>> >>>> /* make sure the dirty trick played by the caller work */ >>>> - ret = invalidate_inode_pages2_range(inode->i_mapping, >>>> - first_index, last_index); >>>> + while (1) { >>>> + ret = invalidate_inode_pages2_range(inode->i_mapping, >>>> + first_index, last_index); >>>> + if (ret != -EBUSY) >>>> + break; >>>> + cond_resched(); >>>> + } >>> If it returns EBUSY, would it not make more sense to call >>> filemap_write_and_wait_range() instead of hammering on invalidate? >>> >> The pages to invalidate are not dirty, they are from page read-ahead. >> Actually I have no idea how invalidate_inode_pages2_range can return >> -EBUSY here. (the only user of the inode is the balancer, and it does >> not hold references to the pages) > > Weird, I looked it up, and it already does a writeback wait. But I guess > that''s not your issue. Patch still looks like a hack though, it would be > far better to figure out why it returns EBUSY and fix/wait appropriately > for that to pass. >EBUSY is from the EXTENT_LOCK test in try_release_extent_state. The test can be true is because some codes call lock_extent while corresponding pages are not all locked. (one example is btrfs_finish_ordered_io) Yan, Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 07, 2009 at 05:07:32PM +0800, Yan Zheng wrote:> On 08/07/2009 03:19 PM, Jens Axboe wrote: > > On Fri, Aug 07 2009, Yan Zheng wrote: > >> On 08/07/2009 02:50 PM, Jens Axboe wrote: > >>> On Fri, Aug 07 2009, Yan Zheng wrote: > >>>> invalidate_inode_pages2_range may return -EBUSY occasionally > >>>> which results Oops. This patch fixes the issue by moving > >>>> invalidate_inode_pages2_range into a loop and keeping calling > >>>> it until the return value is not -EBUSY. > >>>> > >>>> Signed-off-by: Yan Zheng <zheng.yan@oracle.com> > >>>> > >>>> --- > >>>> diff -urp 1/fs/btrfs/relocation.c 2/fs/btrfs/relocation.c > >>>> --- 1/fs/btrfs/relocation.c 2009-07-29 10:03:04.367858774 +0800 > >>>> +++ 2/fs/btrfs/relocation.c 2009-08-07 13:26:43.882147138 +0800 > >>>> @@ -2553,8 +2553,13 @@ int relocate_inode_pages(struct inode *i > >>>> last_index = (start + len - 1) >> PAGE_CACHE_SHIFT; > >>>> > >>>> /* make sure the dirty trick played by the caller work */ > >>>> - ret = invalidate_inode_pages2_range(inode->i_mapping, > >>>> - first_index, last_index); > >>>> + while (1) { > >>>> + ret = invalidate_inode_pages2_range(inode->i_mapping, > >>>> + first_index, last_index); > >>>> + if (ret != -EBUSY) > >>>> + break; > >>>> + cond_resched(); > >>>> + } > >>> If it returns EBUSY, would it not make more sense to call > >>> filemap_write_and_wait_range() instead of hammering on invalidate? > >>> > >> The pages to invalidate are not dirty, they are from page read-ahead. > >> Actually I have no idea how invalidate_inode_pages2_range can return > >> -EBUSY here. (the only user of the inode is the balancer, and it does > >> not hold references to the pages) > > > > Weird, I looked it up, and it already does a writeback wait. But I guess > > that''s not your issue. Patch still looks like a hack though, it would be > > far better to figure out why it returns EBUSY and fix/wait appropriately > > for that to pass. > > > > EBUSY is from the EXTENT_LOCK test in try_release_extent_state. The test > can be true is because some codes call lock_extent while corresponding > pages are not all locked. (one example is btrfs_finish_ordered_io)Ok, please use schedule_timeout(HZ/10) instead then. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html