We all keep getting those stupid warnings from use_block_rsv when running stress.sh, and it''s because the delayed insertion stuff is being stupid. It''s not the delayed insertion stuffs fault, it''s all just stupid. When marking an inode dirty for oh say updating the time on it, we just do a btrfs_join_transaction, which doesn''t reserve any space. This is stupid because we''re going to have to have space reserve to make this change, but we do it because it''s fast because chances are we''re going to call it over and over again and it doesn''t matter. Well thanks to the delayed insertion stuff this is mostly the case, so we do actually need to make this reservation. So if trans->bytes_reserved is 0 then try to do a normal reservation. If not return ENOSPC which will make the btrfs_dirty_inode start a proper transaction which will let it do the whole ENOSPC dance and reserve enough space for the delayed insertion to steal the reservation from the transaction. The other stupid thing we do is not reserve space for the inode when writing to the thing. Usually this is ok since we have to update the time so we''d have already done all this work before we get to the endio stuff, so it doesn''t matter. But this is stupid because we could write the data after the transaction commits where we changed the mtime of the inode so we have to cow all the way down to the inode anyway. This used to be masked by the delalloc reservation stuff, but because we delay the update it doesn''t get masked in this case. So again the delayed insertion stuff bites us in the ass. So if our trans->block_rsv is delalloc, just steal the reservation from the delalloc reserve. Hopefully this won''t bite us in the ass, but I''ve said that before. With this patch stress.sh no longer spits out those stupid warnings (famous last words). Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> --- V1->V2: I stupidly thought I could get away with some flushing if we needed space but I was wrong, we could deadlock, so add a btrfs_add_bytes_noflush variant that will not do any flushing and will just return ENOSPC which will let us fallback and do our full flushing. fs/btrfs/ctree.h | 3 +++ fs/btrfs/delayed-inode.c | 36 ++++++++++++++++++++++++++++-------- fs/btrfs/extent-tree.c | 18 ++++++++++++++++++ 3 files changed, 49 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 3b5c04f..f956bbf 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2258,6 +2258,9 @@ void btrfs_free_block_rsv(struct btrfs_root *root, int btrfs_block_rsv_add(struct btrfs_root *root, struct btrfs_block_rsv *block_rsv, u64 num_bytes); +int btrfs_block_rsv_add_noflush(struct btrfs_root *root, + struct btrfs_block_rsv *block_rsv, + u64 num_bytes); int btrfs_block_rsv_check(struct btrfs_root *root, struct btrfs_block_rsv *block_rsv, int min_factor); int btrfs_block_rsv_refill(struct btrfs_root *root, diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index fc4026a..bbe8496 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -624,13 +624,36 @@ static int btrfs_delayed_inode_reserve_metadata( u64 num_bytes; int ret; - if (!trans->bytes_reserved) - return 0; - src_rsv = trans->block_rsv; dst_rsv = &root->fs_info->delayed_block_rsv; num_bytes = btrfs_calc_trans_metadata_size(root, 1); + + /* + * btrfs_dirty_inode will update the inode under btrfs_join_transaction + * which doesn''t reserve space for speed. This is a problem since we + * still need to reserve space for this update, so try to reserve the + * space. + * + * Now if src_rsv == delalloc_block_rsv we''ll let it just steal since + * we''re accounted for. + */ + if (!trans->bytes_reserved && + src_rsv != &root->fs_info->delalloc_block_rsv) { + ret = btrfs_block_rsv_add_noflush(root, dst_rsv, num_bytes); + /* + * Since we''re under a transaction reserve_metadata_bytes could + * try to commit the transaction which will make it return + * EAGAIN to make us stop the transaction we have, so return + * ENOSPC instead so that btrfs_dirty_inode knows what to do. + */ + if (ret == -EAGAIN) + ret = -ENOSPC; + if (!ret) + node->bytes_reserved = num_bytes; + return ret; + } + ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes); if (!ret) node->bytes_reserved = num_bytes; @@ -1686,11 +1709,8 @@ int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans, } ret = btrfs_delayed_inode_reserve_metadata(trans, root, delayed_node); - /* - * we must reserve enough space when we start a new transaction, - * so reserving metadata failure is impossible - */ - BUG_ON(ret); + if (ret) + goto release_node; fill_stack_inode_item(trans, &delayed_node->inode_item, inode); delayed_node->inode_dirty = 1; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 19c2d48..8dfe2e0 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3814,6 +3814,24 @@ int btrfs_block_rsv_add(struct btrfs_root *root, return ret; } +int btrfs_block_rsv_add_noflush(struct btrfs_root *root, + struct btrfs_block_rsv *block_rsv, + u64 num_bytes) +{ + int ret; + + if (num_bytes == 0) + return 0; + + ret = reserve_metadata_bytes(root, block_rsv, num_bytes, 0); + if (!ret) { + block_rsv_add_bytes(block_rsv, num_bytes, 1); + return 0; + } + + return ret; +} + int btrfs_block_rsv_check(struct btrfs_root *root, struct btrfs_block_rsv *block_rsv, int min_factor) { -- 1.7.5.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason
2011-Nov-05 00:47 UTC
Re: [PATCH] Btrfs: fix delayed insertion reservations V2
On Fri, Nov 04, 2011 at 05:18:54PM -0400, Josef Bacik wrote:> V1->V2: I stupidly thought I could get away with some flushing if we needed > space but I was wrong, we could deadlock, so add a btrfs_add_bytes_noflush > variant that will not do any flushing and will just return ENOSPC which will let > us fallback and do our full flushing. > fs/btrfs/ctree.h | 3 +++ > fs/btrfs/delayed-inode.c | 36 ++++++++++++++++++++++++++++-------- > fs/btrfs/extent-tree.c | 18 ++++++++++++++++++ > 3 files changed, 49 insertions(+), 8 deletions(-)This helps but it I''ve got a bunch of these in the log now: btrfs_dirty_inode: 205 callbacks suppressed btrfs: fail to dirty inode 1294 error -28 btrfs: fail to dirty inode 1208 error -28 btrfs: fail to dirty inode 714 error -28 btrfs: fail to dirty inode 1772 error -28 btrfs: fail to dirty inode 1389 error -28 btrfs: fail to dirty inode 770 error -28 btrfs: fail to dirty inode 345 error -28 btrfs: fail to dirty inode 1533 error -28 btrfs: fail to dirty inode 624 error -28 btrfs: fail to dirty inode 1085 error -28 -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason
2011-Nov-05 11:46 UTC
Re: [PATCH] Btrfs: fix delayed insertion reservations V2
On Fri, Nov 04, 2011 at 08:47:56PM -0400, Chris Mason wrote:> On Fri, Nov 04, 2011 at 05:18:54PM -0400, Josef Bacik wrote: > > V1->V2: I stupidly thought I could get away with some flushing if we needed > > space but I was wrong, we could deadlock, so add a btrfs_add_bytes_noflush > > variant that will not do any flushing and will just return ENOSPC which will let > > us fallback and do our full flushing. > > fs/btrfs/ctree.h | 3 +++ > > fs/btrfs/delayed-inode.c | 36 ++++++++++++++++++++++++++++-------- > > fs/btrfs/extent-tree.c | 18 ++++++++++++++++++ > > 3 files changed, 49 insertions(+), 8 deletions(-) > > This helps but it I''ve got a bunch of these in the log now: > > btrfs_dirty_inode: 205 callbacks suppressed > btrfs: fail to dirty inode 1294 error -28 > btrfs: fail to dirty inode 1208 error -28 > btrfs: fail to dirty inode 714 error -28 > btrfs: fail to dirty inode 1772 error -28 > btrfs: fail to dirty inode 1389 error -28 > btrfs: fail to dirty inode 770 error -28 > btrfs: fail to dirty inode 345 error -28 > btrfs: fail to dirty inode 1533 error -28 > btrfs: fail to dirty inode 624 error -28 > btrfs: fail to dirty inode 1085 error -28I should add this only seems to happen during xfstest 083. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mitch Harder
2011-Nov-05 12:23 UTC
Re: [PATCH] Btrfs: fix delayed insertion reservations V2
On Sat, Nov 5, 2011 at 6:46 AM, Chris Mason <chris.mason@oracle.com> wrote:> On Fri, Nov 04, 2011 at 08:47:56PM -0400, Chris Mason wrote: >> On Fri, Nov 04, 2011 at 05:18:54PM -0400, Josef Bacik wrote: >> > V1->V2: I stupidly thought I could get away with some flushing if we needed >> > space but I was wrong, we could deadlock, so add a btrfs_add_bytes_noflush >> > variant that will not do any flushing and will just return ENOSPC which will let >> > us fallback and do our full flushing. >> > fs/btrfs/ctree.h | 3 +++ >> > fs/btrfs/delayed-inode.c | 36 ++++++++++++++++++++++++++++-------- >> > fs/btrfs/extent-tree.c | 18 ++++++++++++++++++ >> > 3 files changed, 49 insertions(+), 8 deletions(-) >> >> This helps but it I''ve got a bunch of these in the log now: >> >> btrfs_dirty_inode: 205 callbacks suppressed >> btrfs: fail to dirty inode 1294 error -28 >> btrfs: fail to dirty inode 1208 error -28 >> btrfs: fail to dirty inode 714 error -28 >> btrfs: fail to dirty inode 1772 error -28 >> btrfs: fail to dirty inode 1389 error -28 >> btrfs: fail to dirty inode 770 error -28 >> btrfs: fail to dirty inode 345 error -28 >> btrfs: fail to dirty inode 1533 error -28 >> btrfs: fail to dirty inode 624 error -28 >> btrfs: fail to dirty inode 1085 error -28 > > I should add this only seems to happen during xfstest 083. >This version of the patch clears the deadlock issue I was seeing with version 1 of this patch. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html