I've just started playing around with btrfs RAID1, and I've noticed a
couple of what seem to be UI issues.  Suppose I do something like
"mkfs.btrfs -d raid1 -m raid1 dev1 dev2".  I see the following minor
usability problems:

 - Unless I'm missing something, there doesn't seem to be any way later
   on to see that I set the data policy to raid1, except using
   btrfs-dump-tree and checking the flags bits for the appropriate
   group.  This can make things confusing if I have a bunch of btrfs
   filesystems around.

 - The free space reporting doesn't seem to take into account the fact
   that everything is going to be mirrored; "df" et al report the size
   and free space of the new filesystem as size(dev1) + size(dev2).  If
   dev1 and dev2 are the same size, then I would assume it should really
   be just size(dev1) for a fully-RAID1 filesystem.  (I'm not sure what
   we should say in general for a metadata-only mirrored filesystem,
   since we don't know in advance exactly how much space we have.)

I'm happy to help fix these issues up; I just want to make sure I'm not
missing something or doing it wrong.

Thanks,
  Roland
On Mon, Nov 16, 2009 at 10:45:28AM -0800, Roland Dreier wrote:
> I've just started playing around with btrfs RAID1, and I've noticed a
> couple of what seem to be UI issues.  Suppose I do something like
> "mkfs.btrfs -d raid1 -m raid1 dev1 dev2".  I see the following minor
> usability problems:
>
> - Unless I'm missing something, there doesn't seem to be any way later
>   on to see that I set the data policy to raid1, except using
>   btrfs-dump-tree and checking the flags bits for the appropriate
>   group.  This can make things confusing if I have a bunch of btrfs
>   filesystems around.
>

You aren't missing anything; there's just nothing that spits that
information out yet.  btrfs-show would probably be a good place to do
this.

> - The free space reporting doesn't seem to take into account the fact
>   that everything is going to be mirrored; "df" et al report the size
>   and free space of the new filesystem as size(dev1) + size(dev2).  If
>   dev1 and dev2 are the same size, then I would assume it should really
>   be just size(dev1) for a fully-RAID1 filesystem.  (I'm not sure what
>   we should say in general for a metadata-only mirrored filesystem,
>   since we don't know in advance exactly how much space we have.)
>

Yeah, df is just a fun ball of wax in many respects.  We don't take
into account RAID and we don't subtract space that's strictly for
metadata, so there are several things that need to be fixed for df.

Thanks,

Josef
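As a rough illustration of what such reporting could key off, here is a
minimal hand-written sketch (not actual btrfs-progs code) that maps
block-group flag bits to a human-readable profile name.  The
BTRFS_BLOCK_GROUP_* values below are assumed from the on-disk format
headers of this era and should be double-checked against ctree.h:

    /* Illustration only: flag values assumed, verify against ctree.h. */
    #define BTRFS_BLOCK_GROUP_RAID0   (1 << 3)
    #define BTRFS_BLOCK_GROUP_RAID1   (1 << 4)
    #define BTRFS_BLOCK_GROUP_DUP     (1 << 5)
    #define BTRFS_BLOCK_GROUP_RAID10  (1 << 6)

    /* Something a btrfs-show style tool could print next to each
     * filesystem's data and metadata policy. */
    static const char *profile_name(unsigned long long flags)
    {
            if (flags & BTRFS_BLOCK_GROUP_RAID10)
                    return "RAID10";
            if (flags & BTRFS_BLOCK_GROUP_RAID1)
                    return "RAID1";
            if (flags & BTRFS_BLOCK_GROUP_RAID0)
                    return "RAID0";
            if (flags & BTRFS_BLOCK_GROUP_DUP)
                    return "DUP";
            return "single";
    }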
Josef Bacik wrote:
> On Mon, Nov 16, 2009 at 10:45:28AM -0800, Roland Dreier wrote:
>> - The free space reporting doesn't seem to take into account the fact
>>   that everything is going to be mirrored; "df" et al report the size
>>   and free space of the new filesystem as size(dev1) + size(dev2).  If
>>   dev1 and dev2 are the same size, then I would assume it should really
>>   be just size(dev1) for a fully-RAID1 filesystem.  (I'm not sure what
>>   we should say in general for a metadata-only mirrored filesystem,
>>   since we don't know in advance exactly how much space we have.)
>>
>
> Yeah, df is just a fun ball of wax in many respects.  We don't take
> into account RAID and we don't subtract space that's strictly for
> metadata, so there are several things that need to be fixed for df.

But as we have said many times... if we have different raid types
active on different files, any attempt to make df report "raid adjusted
numbers" instead of the current raw total storage numbers is going to
sometimes give wrong answers.

So I think it is dangerous to try.  The current output may be ugly, but
it is always consistent and explainable.

jim
Andrey Kuzmin wrote:
> On Tue, Nov 17, 2009 at 12:48 AM, jim owens <jowens@hp.com> wrote:
>> But as we have said many times... if we have different
>> raid types active on different files, any attempt to make
>
> Late question, but could you please explain it a bit further (or point
> me to the respective discussion archive)?  Did I get it correct that
> btrfs supports per-file raid topology?  Or is it per-(sub)volume?

The design of btrfs actually allows each extent inside a file to have a
different raid type.  This probably will never happen unless a file is
written, we add disks and mount with a new raid type, and then modify
part of the file.  (This may not behave how I think, but I plan to test
it someday soon.)

There is a flag on the file to allow a per-file raid setting via
ioctl/fcntl.  The typical use for this would be to make a file
DUPlicate type on a simple disk.  DUPlicate acts like a raid 1 mirror
on a single drive and is the default raid type for metadata extents.

[disclaimer] btrfs is still in development and Chris might say it does
not (or will not in the future) work like I think.

>> df report "raid adjusted numbers" instead of the current raw
>> total storage numbers is going to sometimes give wrong answers.
>
> I have always thought that space (both physical and logical) used by a
> filesystem could be accounted for correctly whatever topology or
> mixture thereof is in effect, the only point worth discussing being
> accounting overhead.  Free space, under variable topology, of course
> can only be reliably reported as raw (or as an "if you use this
> topology, then you have this logical capacity left" list).

So we know the "raw free blocks", but cannot guarantee "how many raw
blocks per new user write-block" will be consumed, because we do not
know what topology will be in effect for a new write.

We could cheat and use "worst-case topology" numbers, assuming all new
writes use the current default raid.  Of course this ignores DUP unless
it is set on the whole filesystem.

And we also have the problem of metadata, which is dynamic, allocated
in large chunks, and has a DUP type: how do we account for that in
worst-case calculations?

The worst-case number is probably wrong, but it may be more useful to
people who want to know when they will run out of space.  Or at least
it might make some of our ENOSPC complaints go away :)

Only "raw" and "worst-case" can be explained to users, and which we
report is up to Chris.  Today we report "raw".

After spending 10 years on a multi-volume filesystem that had
(unsolvable) confusing df output, I'm just of the opinion that nothing
we do will make everyone happy.

But feel free to run a patch proposal by Chris.

jim
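To make the worst-case idea above concrete, here is a small
self-contained sketch, written by hand for this discussion rather than
taken from btrfs: it simply divides the raw free space by the largest
replication factor that could apply to a new write (2 for RAID1 or DUP,
1 for single or RAID0).

    #include <stdio.h>

    /* Raw bytes consumed per logical byte written, per profile.  RAID1
     * and DUP both keep two copies; single and RAID0 keep one. */
    enum { FACTOR_SINGLE = 1, FACTOR_RAID0 = 1,
           FACTOR_DUP = 2, FACTOR_RAID1 = 2 };

    /* Worst-case estimate: assume every new block gets the costlier of
     * the data and metadata profiles, so we never over-promise space. */
    static unsigned long long worst_case_free(unsigned long long raw_free,
                                              int data_factor, int meta_factor)
    {
            int f = data_factor > meta_factor ? data_factor : meta_factor;
            return raw_free / f;
    }

    int main(void)
    {
            /* Two 1GB devices, data RAID1, metadata DUP: 2GB raw space. */
            unsigned long long raw = 2ULL * 1024 * 1024 * 1024;

            printf("worst-case free: %llu bytes\n",
                   worst_case_free(raw, FACTOR_RAID1, FACTOR_DUP));
            return 0;
    }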
On Tue, Nov 17, 2009 at 6:25 PM, jim owens <jowens@hp.com> wrote:
> <snip>
> So we know the "raw free blocks", but cannot guarantee "how many raw
> blocks per new user write-block" will be consumed, because we do not
> know what topology will be in effect for a new write.
>
> We could cheat and use "worst-case topology" numbers, assuming all new
> writes use the current default raid.  Of course this ignores DUP
> unless it is set on the whole filesystem.
>
> And we also have the problem of metadata, which is dynamic, allocated
> in large chunks, and has a DUP type: how do we account for that in
> worst-case calculations?
>
> The worst-case number is probably wrong, but it may be more useful to
> people who want to know when they will run out of space.  Or at least
> it might make some of our ENOSPC complaints go away :)
>
> Only "raw" and "worst-case" can be explained to users, and which we
> report is up to Chris.  Today we report "raw".
>
> After spending 10 years on a multi-volume filesystem that had
> (unsolvable) confusing df output, I'm just of the opinion that nothing
> we do will make everyone happy.

df is user-centric, and is therefore naturally expected to return the
used/available _logical_ capacity (how this translates to used physical
space is up to filesystem-specific tools to find out and report).
Returning raw numbers is counter-intuitive and causes surprise like
Roland's.

With topology configurable down to the per-file level, the only option
I see for df to return the available logical capacity is to compute it
from the filesystem object for which df is invoked.  For instance,
"df /path/to/some/file" could return the logical capacity for the
mount point where that file resides, computed from the underlying
physical capacity available _and_ the topology for this file.
"df /mount-point" would, under this implementation, return the
available logical capacity assuming the default topology for the
referenced filesystem.

As to used logical space accounting, this is filesystem-specific and
I'm not yet familiar enough with the btrfs code base to argue for any
approach.

Regards,
Andrey

> But feel free to run a patch proposal by Chris.
>
> jim
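For what it's worth, df already asks the question on a per-path basis:
it ends up calling statfs()/statvfs() on whatever path it was given, so
a topology-aware answer is at least expressible through the existing
interface.  A tiny stand-alone illustration (ordinary userspace code,
nothing btrfs-specific):

    #include <stdio.h>
    #include <sys/statvfs.h>

    int main(int argc, char **argv)
    {
            struct statvfs s;
            const char *path = argc > 1 ? argv[1] : "/";

            if (statvfs(path, &s) != 0) {
                    perror("statvfs");
                    return 1;
            }

            /* f_blocks and f_bavail are what df reports; a filesystem
             * that wanted path-dependent answers could fill them in
             * differently depending on the profile for this path. */
            printf("%s: block size %lu, total %llu, available %llu\n",
                   path, (unsigned long)s.f_frsize,
                   (unsigned long long)s.f_blocks,
                   (unsigned long long)s.f_bavail);
            return 0;
    }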
> > - Unless I'm missing something, there doesn't seem to be any way later
> >   on to see that I set the data policy to raid1, except using
> >   btrfs-dump-tree and checking the flags bits for the appropriate
> >   group.  This can make things confusing if I have a bunch of btrfs
> >   filesystems around.

> You aren't missing anything; there's just nothing that spits that
> information out yet.  btrfs-show would probably be a good place to do
> this.

Thanks.  I'll look at adding more information about the RAID policy to
the btrfs-show output.

 - R.
> > Yeah, df is just a fun ball of wax in many respects.  We don't take
> > into account RAID and we don't subtract space that's strictly for
> > metadata, so there are several things that need to be fixed for df.

> But as we have said many times... if we have different raid types
> active on different files, any attempt to make df report "raid
> adjusted numbers" instead of the current raw total storage numbers is
> going to sometimes give wrong answers.
>
> So I think it is dangerous to try.  The current output may be ugly,
> but it is always consistent and explainable.

It does seem like a big problem, especially as we add in other RAID
levels etc.  However, on the flip side, the accounting of the "used"
space does seem off and maybe fixable?

In other words, if I create a btrfs filesystem out of two 1GB devices
with RAID1 for data and metadata, then df shows a total size of 2GB for
the filesystem.  But if I then create a .5GB file on that filesystem,
the used space is shown as only .5GB -- i.e. the accounting of the
total size is at the device/block level, but the accounting of used
space is at the logical/filesystem level.  This leads to very confusing
df output.

I wonder if it's possible to come up with a way to make things at least
consistent, or to figure out a way to report more useful information
about the space left on the filesystem.

 - Roland
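Spelling the mismatch out with the numbers from the example above (my
own arithmetic, not actual df output, and metadata overhead ignored):

    raw view:      total = 2 GB   used = 2 x 0.5 GB = 1 GB   free = 1 GB
    logical view:  total = 1 GB   used = 0.5 GB              free = 0.5 GB
    current df:    total = 2 GB   used = 0.5 GB              (one of each)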
On Wed, Nov 18, 2009 at 09:59:24AM -0800, Roland Dreier wrote:
> > > Yeah, df is just a fun ball of wax in many respects.  We don't take
> > > into account RAID and we don't subtract space that's strictly for
> > > metadata, so there are several things that need to be fixed for df.
>
> > But as we have said many times... if we have different raid types
> > active on different files, any attempt to make df report "raid
> > adjusted numbers" instead of the current raw total storage numbers
> > is going to sometimes give wrong answers.
> >
> > So I think it is dangerous to try.  The current output may be ugly,
> > but it is always consistent and explainable.
>
> It does seem like a big problem, especially as we add in other RAID
> levels etc.  However, on the flip side, the accounting of the "used"
> space does seem off and maybe fixable?
>
> In other words, if I create a btrfs filesystem out of two 1GB devices
> with RAID1 for data and metadata, then df shows a total size of 2GB
> for the filesystem.  But if I then create a .5GB file on that
> filesystem, the used space is shown as only .5GB -- i.e. the
> accounting of the total size is at the device/block level, but the
> accounting of used space is at the logical/filesystem level.  This
> leads to very confusing df output.
>
> I wonder if it's possible to come up with a way to make things at
> least consistent, or to figure out a way to report more useful
> information about the space left on the filesystem.

That part we can at least do.  Since we know the amount of space used
in each block group and the raid level of each block group, we can
figure it out.  It won't be cheap overall, but it is at least possible.

-chris
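A rough, self-contained sketch of the per-block-group adjustment Chris
describes.  The struct and the loop are invented for illustration and
do not correspond to the kernel's actual data structures, and the
BTRFS_BLOCK_GROUP_* values are assumed as in the earlier sketch:

    #include <stddef.h>

    /* Assumed values, as above; verify against ctree.h. */
    #define BTRFS_BLOCK_GROUP_RAID1   (1ULL << 4)
    #define BTRFS_BLOCK_GROUP_DUP     (1ULL << 5)
    #define BTRFS_BLOCK_GROUP_RAID10  (1ULL << 6)

    /* Hypothetical stand-in for the real block group records. */
    struct example_block_group {
            unsigned long long flags;  /* BTRFS_BLOCK_GROUP_* bits */
            unsigned long long total;  /* raw bytes in this block group */
            unsigned long long used;   /* raw bytes allocated from it */
    };

    /* RAID1, RAID10, and DUP store two raw copies of every logical byte. */
    static int copies(unsigned long long flags)
    {
            if (flags & (BTRFS_BLOCK_GROUP_RAID1 |
                         BTRFS_BLOCK_GROUP_RAID10 |
                         BTRFS_BLOCK_GROUP_DUP))
                    return 2;
            return 1;
    }

    /* Walk every block group and divide by its replication factor to
     * get df-style logical totals instead of raw device-level numbers. */
    static void logical_space(const struct example_block_group *bg, size_t n,
                              unsigned long long *total,
                              unsigned long long *used)
    {
            size_t i;

            *total = 0;
            *used = 0;
            for (i = 0; i < n; i++) {
                    int c = copies(bg[i].flags);

                    *total += bg[i].total / c;
                    *used += bg[i].used / c;
            }
    }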