I've just started playing around with btrfs RAID1, and I've noticed a
couple of what seem to be UI issues.  Suppose I do something like
"mkfs.btrfs -d raid1 -m raid1 dev1 dev2".  I see the following minor
usability problems:

 - Unless I'm missing something, there doesn't seem to be any way later
   on to see that I set the data policy to raid1, except using
   btrfs-dump-tree and checking the flags bits for the appropriate
   group.  This can make things confusing if I have a bunch of btrfs
   filesystems around.

 - The free space reporting doesn't seem to take into account the fact
   that everything is going to be mirrored; "df" et al report the size
   and free space of the new filesystem as size(dev1) + size(dev2).  If
   dev1 and dev2 are the same size, then I would assume it should really
   be just size(dev1) for a fully-RAID1 filesystem.  (I'm not sure what
   we should say in general for a metadata-only mirrored filesystem,
   since we don't know in advance exactly how much space we have.)

I'm happy to help fix these issues up; I just want to make sure I'm not
missing something or doing it wrong.

Thanks,
  Roland
On Mon, Nov 16, 2009 at 10:45:28AM -0800, Roland Dreier wrote:
> I've just started playing around with btrfs RAID1, and I've noticed a
> couple of what seem to be UI issues.  Suppose I do something like
> "mkfs.btrfs -d raid1 -m raid1 dev1 dev2".  I see the following minor
> usability problems:
>
> - Unless I'm missing something, there doesn't seem to be any way later
>   on to see that I set the data policy to raid1, except using
>   btrfs-dump-tree and checking the flags bits for the appropriate
>   group.  This can make things confusing if I have a bunch of btrfs
>   filesystems around.
>

You aren't missing anything; there's just nothing that spits that
information out yet.  btrfs-show would probably be a good place to do
this.

> - The free space reporting doesn't seem to take into account the fact
>   that everything is going to be mirrored; "df" et al report the size
>   and free space of the new filesystem as size(dev1) + size(dev2).  If
>   dev1 and dev2 are the same size, then I would assume it should really
>   be just size(dev1) for a fully-RAID1 filesystem.  (I'm not sure what
>   we should say in general for a metadata-only mirrored filesystem,
>   since we don't know in advance exactly how much space we have.)
>

Yeah, df is just a fun ball of wax in many respects.  We don't take
into account RAID and we don't subtract space that's strictly for
metadata, so there are several things that need to be fixed for df.

Thanks,

Josef
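As a rough illustration of what such reporting could key off, here is a
minimal hand-written sketch (not actual btrfs-progs code) that maps
block-group flag bits to a human-readable profile name.  The
BTRFS_BLOCK_GROUP_* values below are assumed from the on-disk format
headers of this era and should be double-checked against ctree.h:

    /* Illustration only: flag values assumed, verify against ctree.h. */
    #define BTRFS_BLOCK_GROUP_RAID0   (1 << 3)
    #define BTRFS_BLOCK_GROUP_RAID1   (1 << 4)
    #define BTRFS_BLOCK_GROUP_DUP     (1 << 5)
    #define BTRFS_BLOCK_GROUP_RAID10  (1 << 6)

    /* Something a btrfs-show style tool could print next to each
     * filesystem's data and metadata policy. */
    static const char *profile_name(unsigned long long flags)
    {
            if (flags & BTRFS_BLOCK_GROUP_RAID10)
                    return "RAID10";
            if (flags & BTRFS_BLOCK_GROUP_RAID1)
                    return "RAID1";
            if (flags & BTRFS_BLOCK_GROUP_RAID0)
                    return "RAID0";
            if (flags & BTRFS_BLOCK_GROUP_DUP)
                    return "DUP";
            return "single";
    }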
Josef Bacik wrote:
> On Mon, Nov 16, 2009 at 10:45:28AM -0800, Roland Dreier wrote:
>> - The free space reporting doesn't seem to take into account the fact
>>   that everything is going to be mirrored; "df" et al report the size
>>   and free space of the new filesystem as size(dev1) + size(dev2).  If
>>   dev1 and dev2 are the same size, then I would assume it should really
>>   be just size(dev1) for a fully-RAID1 filesystem.  (I'm not sure what
>>   we should say in general for a metadata-only mirrored filesystem,
>>   since we don't know in advance exactly how much space we have.)
>>
>
> Yeah, df is just a fun ball of wax in many respects.  We don't take
> into account RAID and we don't subtract space that's strictly for
> metadata, so there are several things that need to be fixed for df.

But as we have said many times... if we have different raid types
active on different files, any attempt to make df report "raid adjusted
numbers" instead of the current raw total storage numbers is going to
sometimes give wrong answers.

So I think it is dangerous to try.  The current output may be ugly, but
it is always consistent and explainable.

jim
Andrey Kuzmin wrote:
> On Tue, Nov 17, 2009 at 12:48 AM, jim owens <jowens@hp.com> wrote:
>> But as we have said many times... if we have different
>> raid types active on different files, any attempt to make
>
> Late question, but could you please explain it a bit further (or point
> me to the respective discussion archive)?  Did I get it correct that
> btrfs supports per-file raid topology?  Or is it per-(sub)volume?

The design of btrfs actually allows each extent inside a file to have a
different raid type.  This probably will never happen unless a file is
written, we add disks and mount with a new raid type, and then modify
part of the file.  (This may not behave how I think, but I plan to test
it someday soon.)

There is a flag on the file to allow a per-file raid setting via
ioctl/fcntl.  The typical use for this would be to make a file
DUPlicate type on a simple disk.  DUPlicate acts like a raid 1 mirror
on a single drive and is the default raid type for metadata extents.

[disclaimer] btrfs is still in development and Chris might say it does
not (or will not in the future) work like I think.

>> df report "raid adjusted numbers" instead of the current raw
>> total storage numbers is going to sometimes give wrong answers.
>
> I have always thought that space (both physical and logical) used by a
> filesystem could be accounted for correctly whatever topology or
> mixture thereof is in effect, the only point worth discussing being
> accounting overhead.  Free space, under variable topology, of course
> can only be reliably reported as raw (or as an "if you use this
> topology, then you have this logical capacity left" list).

So we know the "raw free blocks", but cannot guarantee "how many raw
blocks per new user write-block" will be consumed, because we do not
know what topology will be in effect for a new write.

We could cheat and use "worst-case topology" numbers, assuming all new
writes use the current default raid.  Of course this ignores DUP unless
it is set on the whole filesystem.

And we also have the problem of metadata, which is dynamic, allocated
in large chunks, and has a DUP type: how do we account for that in
worst-case calculations?

The worst-case number is probably wrong, but it may be more useful to
people who want to know when they will run out of space.  Or at least
it might make some of our ENOSPC complaints go away :)

Only "raw" and "worst-case" can be explained to users, and which we
report is up to Chris.  Today we report "raw".

After spending 10 years on a multi-volume filesystem that had
(unsolvable) confusing df output, I'm just of the opinion that nothing
we do will make everyone happy.

But feel free to run a patch proposal by Chris.

jim
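To make the worst-case idea above concrete, here is a small
self-contained sketch, written by hand for this discussion rather than
taken from btrfs: it simply divides the raw free space by the largest
replication factor that could apply to a new write (2 for RAID1 or DUP,
1 for single or RAID0).

    #include <stdio.h>

    /* Raw bytes consumed per logical byte written, per profile.  RAID1
     * and DUP both keep two copies; single and RAID0 keep one. */
    enum { FACTOR_SINGLE = 1, FACTOR_RAID0 = 1,
           FACTOR_DUP = 2, FACTOR_RAID1 = 2 };

    /* Worst-case estimate: assume every new block gets the costlier of
     * the data and metadata profiles, so we never over-promise space. */
    static unsigned long long worst_case_free(unsigned long long raw_free,
                                              int data_factor, int meta_factor)
    {
            int f = data_factor > meta_factor ? data_factor : meta_factor;
            return raw_free / f;
    }

    int main(void)
    {
            /* Two 1GB devices, data RAID1, metadata DUP: 2GB raw space. */
            unsigned long long raw = 2ULL * 1024 * 1024 * 1024;

            printf("worst-case free: %llu bytes\n",
                   worst_case_free(raw, FACTOR_RAID1, FACTOR_DUP));
            return 0;
    }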
On Tue, Nov 17, 2009 at 6:25 PM, jim owens <jowens@hp.com> wrote:
> <snip>
> So we know the "raw free blocks", but cannot guarantee "how many raw
> blocks per new user write-block" will be consumed, because we do not
> know what topology will be in effect for a new write.
>
> We could cheat and use "worst-case topology" numbers, assuming all new
> writes use the current default raid.  Of course this ignores DUP
> unless it is set on the whole filesystem.
>
> And we also have the problem of metadata, which is dynamic, allocated
> in large chunks, and has a DUP type: how do we account for that in
> worst-case calculations?
>
> The worst-case number is probably wrong, but it may be more useful to
> people who want to know when they will run out of space.  Or at least
> it might make some of our ENOSPC complaints go away :)
>
> Only "raw" and "worst-case" can be explained to users, and which we
> report is up to Chris.  Today we report "raw".
>
> After spending 10 years on a multi-volume filesystem that had
> (unsolvable) confusing df output, I'm just of the opinion that nothing
> we do will make everyone happy.

df is user-centric, and is therefore naturally expected to return the
used/available _logical_ capacity (how this translates to used physical
space is up to filesystem-specific tools to find out and report).
Returning raw numbers is counter-intuitive and causes surprise like
Roland's.

With topology configurable down to the per-file level, the only option
I see for df to return the available logical capacity is to compute it
from the filesystem object for which df is invoked.  For instance,
"df /path/to/some/file" could return the logical capacity for the
mount point where that file resides, computed from the underlying
physical capacity available _and_ the topology for this file.
"df /mount-point" would, under this implementation, return the
available logical capacity assuming the default topology for the
referenced filesystem.

As to used logical space accounting, this is filesystem-specific and
I'm not yet familiar enough with the btrfs code base to argue for any
approach.

Regards,
Andrey

> But feel free to run a patch proposal by Chris.
>
> jim
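For what it's worth, df already asks the question on a per-path basis:
it ends up calling statfs()/statvfs() on whatever path it was given, so
a topology-aware answer is at least expressible through the existing
interface.  A tiny stand-alone illustration (ordinary userspace code,
nothing btrfs-specific):

    #include <stdio.h>
    #include <sys/statvfs.h>

    int main(int argc, char **argv)
    {
            struct statvfs s;
            const char *path = argc > 1 ? argv[1] : "/";

            if (statvfs(path, &s) != 0) {
                    perror("statvfs");
                    return 1;
            }

            /* f_blocks and f_bavail are what df reports; a filesystem
             * that wanted path-dependent answers could fill them in
             * differently depending on the profile for this path. */
            printf("%s: block size %lu, total %llu, available %llu\n",
                   path, (unsigned long)s.f_frsize,
                   (unsigned long long)s.f_blocks,
                   (unsigned long long)s.f_bavail);
            return 0;
    }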
> > - Unless I'm missing something, there doesn't seem to be any way later
> >   on to see that I set the data policy to raid1, except using
> >   btrfs-dump-tree and checking the flags bits for the appropriate
> >   group.  This can make things confusing if I have a bunch of btrfs
> >   filesystems around.

> You aren't missing anything; there's just nothing that spits that
> information out yet.  btrfs-show would probably be a good place to do
> this.

Thanks.  I'll look at adding more information about the RAID policy to
the btrfs-show output.

 - R.
> > Yeah, df is just a fun ball of wax in many respects.  We don't take
> > into account RAID and we don't subtract space that's strictly for
> > metadata, so there are several things that need to be fixed for df.

> But as we have said many times... if we have different raid types
> active on different files, any attempt to make df report "raid
> adjusted numbers" instead of the current raw total storage numbers is
> going to sometimes give wrong answers.
>
> So I think it is dangerous to try.  The current output may be ugly,
> but it is always consistent and explainable.

It does seem like a big problem, especially as we add in other RAID
levels etc.  However, on the flip side, the accounting of the "used"
space does seem off and maybe fixable?

In other words, if I create a btrfs filesystem out of two 1GB devices
with RAID1 for data and metadata, then df shows a total size of 2GB for
the filesystem.  But if I then create a .5GB file on that filesystem,
the used space is shown as only .5GB -- i.e. the accounting of the
total size is at the device/block level, but the accounting of used
space is at the logical/filesystem level.  This leads to very confusing
df output.

I wonder if it's possible to come up with a way to make things at least
consistent, or to figure out a way to report more useful information
about the space left on the filesystem.

 - Roland
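Spelling the mismatch out with the numbers from the example above (my
own arithmetic, not actual df output, and metadata overhead ignored):

    raw view:      total = 2 GB   used = 2 x 0.5 GB = 1 GB   free = 1 GB
    logical view:  total = 1 GB   used = 0.5 GB              free = 0.5 GB
    current df:    total = 2 GB   used = 0.5 GB              (one of each)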
On Wed, Nov 18, 2009 at 09:59:24AM -0800, Roland Dreier wrote:
> > > Yeah, df is just a fun ball of wax in many respects.  We don't take
> > > into account RAID and we don't subtract space that's strictly for
> > > metadata, so there are several things that need to be fixed for df.
>
> > But as we have said many times... if we have different raid types
> > active on different files, any attempt to make df report "raid
> > adjusted numbers" instead of the current raw total storage numbers
> > is going to sometimes give wrong answers.
> >
> > So I think it is dangerous to try.  The current output may be ugly,
> > but it is always consistent and explainable.
>
> It does seem like a big problem, especially as we add in other RAID
> levels etc.  However, on the flip side, the accounting of the "used"
> space does seem off and maybe fixable?
>
> In other words, if I create a btrfs filesystem out of two 1GB devices
> with RAID1 for data and metadata, then df shows a total size of 2GB
> for the filesystem.  But if I then create a .5GB file on that
> filesystem, the used space is shown as only .5GB -- i.e. the
> accounting of the total size is at the device/block level, but the
> accounting of used space is at the logical/filesystem level.  This
> leads to very confusing df output.
>
> I wonder if it's possible to come up with a way to make things at
> least consistent, or to figure out a way to report more useful
> information about the space left on the filesystem.

That part we can at least do.  Since we know the amount of space used
in each block group and the raid level of each block group, we can
figure it out.  It won't be cheap overall, but it is at least possible.

-chris
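A rough, self-contained sketch of the per-block-group adjustment Chris
describes.  The struct and the loop are invented for illustration and
do not correspond to the kernel's actual data structures, and the
BTRFS_BLOCK_GROUP_* values are assumed as in the earlier sketch:

    #include <stddef.h>

    /* Assumed values, as above; verify against ctree.h. */
    #define BTRFS_BLOCK_GROUP_RAID1   (1ULL << 4)
    #define BTRFS_BLOCK_GROUP_DUP     (1ULL << 5)
    #define BTRFS_BLOCK_GROUP_RAID10  (1ULL << 6)

    /* Hypothetical stand-in for the real block group records. */
    struct example_block_group {
            unsigned long long flags;  /* BTRFS_BLOCK_GROUP_* bits */
            unsigned long long total;  /* raw bytes in this block group */
            unsigned long long used;   /* raw bytes allocated from it */
    };

    /* RAID1, RAID10, and DUP store two raw copies of every logical byte. */
    static int copies(unsigned long long flags)
    {
            if (flags & (BTRFS_BLOCK_GROUP_RAID1 |
                         BTRFS_BLOCK_GROUP_RAID10 |
                         BTRFS_BLOCK_GROUP_DUP))
                    return 2;
            return 1;
    }

    /* Walk every block group and divide by its replication factor to
     * get df-style logical totals instead of raw device-level numbers. */
    static void logical_space(const struct example_block_group *bg, size_t n,
                              unsigned long long *total,
                              unsigned long long *used)
    {
            size_t i;

            *total = 0;
            *used = 0;
            for (i = 0; i < n; i++) {
                    int c = copies(bg[i].flags);

                    *total += bg[i].total / c;
                    *used += bg[i].used / c;
            }
    }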