Miao Xie
2012-Jan-17 10:02 UTC
[RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
If there is no free space, the free space allocator will try to get space from the block group with the degenerated profile. For example, if there is no free space in the RAID1 block groups, the allocator will try to allocate space from the DUP block groups. And besides that, the space reservation has the similar behaviour: if there is no enough space in the space cache to reserve, it will reserve the space according to the disk space, and it just take mirror storage into account, no RAID0, RAID1, or RAID10. So we''d better make the behaviour of chunk allocation correspond with space reservation and free space allocation, if there is no enough disk space to allocate RAID(RAID0, RAID1, RAID10) chunks, we degenerate the profile and try to allocate chunks again. Otherwise, enospc will happen though we reserve the space successfully and BUG_ON() will be triggered. Degenerating rule: RAID10 -> RAID1 -> DUP RAID0 -> SINGLE Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/extent-tree.c | 43 +++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 41 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3e68e2b..87cd611 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3065,6 +3065,30 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags) return flags; } +/* + * Degenerate the alloc profile: + * RAID10 -> RAID1 -> DUP + * RAID0 -> SINGLE + * + * This is used when there is no enough disk space to do chunk allocation. + * After degenerating the profile, we will try to allocate new chunks again. + */ +static u64 btrfs_degenerate_alloc_profile(u64 flags) +{ + if (flags & BTRFS_BLOCK_GROUP_RAID10) { + flags &= ~BTRFS_BLOCK_GROUP_RAID10; + flags |= BTRFS_BLOCK_GROUP_RAID1; + } else if (flags & BTRFS_BLOCK_GROUP_RAID1) { + flags &= ~BTRFS_BLOCK_GROUP_RAID1; + flags |= BTRFS_BLOCK_GROUP_DUP; + } else if (flags & BTRFS_BLOCK_GROUP_RAID0) { + flags &= ~BTRFS_BLOCK_GROUP_RAID0; + } else + flags = ULLONG_MAX; + + return flags; +} + static u64 get_alloc_profile(struct btrfs_root *root, u64 flags) { if (flags & BTRFS_BLOCK_GROUP_DATA) @@ -3356,8 +3380,23 @@ again: } ret = btrfs_alloc_chunk(trans, extent_root, flags); - if (ret < 0 && ret != -ENOSPC) - goto out; + if (ret < 0) { + if (ret != -ENOSPC) + goto out; + + /* + * Degenerate the alloc profile: + * RAID10 -> RAID1 -> DUP + * RAID0 -> SINGLE + * then we will try to allocate new chunks again. By this way, + * we can utilize the whole disk spacem and make the behaviour + * of the chunk allocation correspond with the space reservation + * and the free space allocation. + */ + flags = btrfs_degenerate_alloc_profile(flags); + if (flags != ULLONG_MAX) + goto again; + } spin_lock(&space_info->lock); if (ret) -- 1.7.6.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at vger.kernel.org/majordomo-info.html
Chris Mason
2012-Jan-17 20:58 UTC
Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
These two didn''t make my first pull request just because I wanted to get something out the door. I''ll definitely have them in the next pull. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jan-18 10:12 UTC
Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
On 17.01.2012 21:58, Chris Mason wrote:> These two didn''t make my first pull request just because I wanted to get > something out the door. I''ll definitely have them in the next pull.Please, don''t do that! You can''t just degenerate to DUP when RAID1 is out of space, that''s entirely different. It''s debatable whether degeneration from RAID0 to single is acceptable, but that again has different characteristics. ENOSPC is the best choice for both in my opinon. -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at vger.kernel.org/majordomo-info.html
Arne Jansen
2012-Jan-18 10:14 UTC
Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
On 17.01.2012 11:02, Miao Xie wrote:> If there is no free space, the free space allocator will try to get space from > the block group with the degenerated profile. For example, if there is no free > space in the RAID1 block groups, the allocator will try to allocate space from > the DUP block groups. And besides that, the space reservation has the similar > behaviour: if there is no enough space in the space cache to reserve, it will > reserve the space according to the disk space, and it just take mirror storage > into account, no RAID0, RAID1, or RAID10. > > So we''d better make the behaviour of chunk allocation correspond with space > reservation and free space allocation, if there is no enough disk space to > allocate RAID(RAID0, RAID1, RAID10) chunks, we degenerate the profile and try > to allocate chunks again. Otherwise, enospc will happen though we reserve > the space successfully and BUG_ON() will be triggered. > > Degenerating rule: > RAID10 -> RAID1 -> DUP > RAID0 -> SINGLE >Instead of changing the profile, wouldn''t it be easier to just allow RAID10 go down to 2 disks and RAID0 to 1? That would make things easier in many places. What I''m strongly opposed to is changing a RAID1 to DUP, as this loses the protection against a single disk failure. -Arne> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> > --- > fs/btrfs/extent-tree.c | 43 +++++++++++++++++++++++++++++++++++++++++-- > 1 files changed, 41 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 3e68e2b..87cd611 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -3065,6 +3065,30 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags) > return flags; > } > > +/* > + * Degenerate the alloc profile: > + * RAID10 -> RAID1 -> DUP > + * RAID0 -> SINGLE > + * > + * This is used when there is no enough disk space to do chunk allocation. > + * After degenerating the profile, we will try to allocate new chunks again. > + */ > +static u64 btrfs_degenerate_alloc_profile(u64 flags) > +{ > + if (flags & BTRFS_BLOCK_GROUP_RAID10) { > + flags &= ~BTRFS_BLOCK_GROUP_RAID10; > + flags |= BTRFS_BLOCK_GROUP_RAID1; > + } else if (flags & BTRFS_BLOCK_GROUP_RAID1) { > + flags &= ~BTRFS_BLOCK_GROUP_RAID1; > + flags |= BTRFS_BLOCK_GROUP_DUP; > + } else if (flags & BTRFS_BLOCK_GROUP_RAID0) { > + flags &= ~BTRFS_BLOCK_GROUP_RAID0; > + } else > + flags = ULLONG_MAX; > + > + return flags; > +} > + > static u64 get_alloc_profile(struct btrfs_root *root, u64 flags) > { > if (flags & BTRFS_BLOCK_GROUP_DATA) > @@ -3356,8 +3380,23 @@ again: > } > > ret = btrfs_alloc_chunk(trans, extent_root, flags); > - if (ret < 0 && ret != -ENOSPC) > - goto out; > + if (ret < 0) { > + if (ret != -ENOSPC) > + goto out; > + > + /* > + * Degenerate the alloc profile: > + * RAID10 -> RAID1 -> DUP > + * RAID0 -> SINGLE > + * then we will try to allocate new chunks again. By this way, > + * we can utilize the whole disk spacem and make the behaviour > + * of the chunk allocation correspond with the space reservation > + * and the free space allocation. > + */ > + flags = btrfs_degenerate_alloc_profile(flags); > + if (flags != ULLONG_MAX) > + goto again; > + } > > spin_lock(&space_info->lock); > if (ret)-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at vger.kernel.org/majordomo-info.html
David Sterba
2012-Jan-18 12:34 UTC
Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
On Tue, Jan 17, 2012 at 06:02:32PM +0800, Miao Xie wrote:> So we''d better make the behaviour of chunk allocation correspond with space > reservation and free space allocation, if there is no enough disk space to > allocate RAID(RAID0, RAID1, RAID10) chunks, we degenerate the profile and try > to allocate chunks again. Otherwise, enospc will happen though we reserve > the space successfully and BUG_ON() will be triggered. > > Degenerating rule: > RAID10 -> RAID1 -> DUP > RAID0 -> SINGLEI think that silently changing RAID level under unpredictable conditions could be a very unpleasant suprise to the administrator. Even worse when it requires manual intervention to undo the change and without any possibility to prevent it to happen again. In case ENOSPC happens early due to overbooking (like untarring lots of files as reported several times), will probably make it happen sooner than when "there is absolutely no free space available". We will lose automatic raid1-repair and scrub repair capabilities, that''s not good. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at vger.kernel.org/majordomo-info.html
Hugo Mills
2012-Jan-18 12:41 UTC
Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
On Wed, Jan 18, 2012 at 11:12:20AM +0100, Jan Schmidt wrote:> On 17.01.2012 21:58, Chris Mason wrote: > > These two didn''t make my first pull request just because I wanted to get > > something out the door. I''ll definitely have them in the next pull. > > Please, don''t do that! You can''t just degenerate to DUP when RAID1 is > out of space, that''s entirely different.Agreed. This isn''t a good idea.> It''s debatable whether degeneration from RAID0 to single is acceptable, > but that again has different characteristics.RAID-0 to single and RAID-10 to RAID-1 are less controversial, I think, although we need to be clear about what the performance guarantees of the striping are (and how they would be affected by the move to fewer disks). Given that we already degrade the striping of RAID-0 down to "as many devices as we can fit right now", we''re not really providing much in the way of performance guarantees. (I''d definitely support some way of defining a fixed-width stripe, but that''s a whole separate question that we shouldn''t get into now).> ENOSPC is the best choice for both in my opinon. > > -Jan-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk == PGP key: 515C238D from wwwkeys.eu.pgp.net or carfax.org.uk --- What''s a Nazgûl like you doing in a place like this? ---
Roman Kapusta
2012-Jan-18 13:42 UTC
Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
On Wed, Jan 18, 2012 at 13:41, Hugo Mills <hugo@carfax.org.uk> wrote:> On Wed, Jan 18, 2012 at 11:12:20AM +0100, Jan Schmidt wrote: >> On 17.01.2012 21:58, Chris Mason wrote: >> > These two didn''t make my first pull request just because I wanted to get >> > something out the door. I''ll definitely have them in the next pull. >> >> Please, don''t do that! You can''t just degenerate to DUP when RAID1 is >> out of space, that''s entirely different. > > Agreed. This isn''t a good idea. > >> It''s debatable whether degeneration from RAID0 to single is acceptable, >> but that again has different characteristics. > > RAID-0 to single and RAID-10 to RAID-1 are less controversial, I > think, although we need to be clear about what the performance > guarantees of the striping are (and how they would be affected by the > move to fewer disks). Given that we already degrade the striping of > RAID-0 down to "as many devices as we can fit right now", we''re not > really providing much in the way of performance guarantees.Definitely this degeneration is good idea, but it should be only optional and not default behavior of filesystem, default should be ENOSPC.> > (I''d definitely support some way of defining a fixed-width stripe, > but that''s a whole separate question that we shouldn''t get into now). > >> ENOSPC is the best choice for both in my opinon. >> >> -Jan > > -- > === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==> PGP key: 515C238D from wwwkeys.eu.pgp.net or carfax.org.uk > --- What''s a Nazgûl like you doing in a place like this? ----- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at vger.kernel.org/majordomo-info.html
Miao Xie
2012-Jan-19 05:58 UTC
Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
On wed, 18 Jan 2012 11:12:20 +0100, Jan Schmidt wrote:> On 17.01.2012 21:58, Chris Mason wrote: >> These two didn''t make my first pull request just because I wanted to get >> something out the door. I''ll definitely have them in the next pull. > > Please, don''t do that! You can''t just degenerate to DUP when RAID1 is > out of space, that''s entirely different. > > It''s debatable whether degeneration from RAID0 to single is acceptable, > but that again has different characteristics. > > ENOSPC is the best choice for both in my opinon.I understand what you said, but in fact, the free space allocator can degenerate the profile if it doesn''t find enough free space. This patch just follows the rule which exists in the current code. Maybe adding a new mount option is another good option. Thanks Miao> > -Jan > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at vger.kernel.org/majordomo-info.html
Jan Schmidt
2012-Jan-20 10:36 UTC
Re: [RFC][PATCH 1/2] Btrfs: try to allocate new chunks with degenerated profile
On 19.01.2012 06:58, Miao Xie wrote:> On wed, 18 Jan 2012 11:12:20 +0100, Jan Schmidt wrote: >> On 17.01.2012 21:58, Chris Mason wrote: >>> These two didn''t make my first pull request just because I wanted to get >>> something out the door. I''ll definitely have them in the next pull. >> >> Please, don''t do that! You can''t just degenerate to DUP when RAID1 is >> out of space, that''s entirely different. >> >> It''s debatable whether degeneration from RAID0 to single is acceptable, >> but that again has different characteristics. >> >> ENOSPC is the best choice for both in my opinon. > > I understand what you said, but in fact, the free space allocator can degenerate > the profile if it doesn''t find enough free space. This patch just follows the > rule which exists in the current code.I''m not sure what you mean with "free space allocation". In my understanding, new "free" space is made available by allocating a new chunk, and that''s what you''re suggesting to change. What am I missing here? Calculation of free space like for df output is known to be at least unintuitive (I''d say wrong). We''d better fix that. Space reservation may be wrongly assuming it can use the whole disk. If that''s the case, we must fix it.> Maybe adding a new mount option is another good option.I don''t think so. If you want RAID-1, you get RAID-1, i.e. every stripe on two disks. If you''re okay with DUP, use DUP. But even with a mount option, degeneration would still go silent and you''d never know which part of your data would survive a single disk failure. Think of degenration of your metadata profile to DUP: Suddenly, your most recent extent tree wouldn''t survive a single disk failure. In my opinion, we''re responsible not to offer any dangerous (mount) options that can make users lose data eventually. Even if we can point them to the documentation (which they didn''t read) that explains the risks of that option. -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at vger.kernel.org/majordomo-info.html