thr3ads.net - Btrfs devel - [Discussion] Extent Block Group allocations [Jan 2009]


ashford@whisperpc.com

2009-Jan-20 20:54 UTC

[Discussion] Extent Block Group allocations

Hi all,

I searched the archives, and didn't find any answers to my questions, so I
think it's time to ask.

From:  http://btrfs.wiki.kernel.org/index.php/Btrfs_design#Extent_Block_Groups

        Block groups have a flag that indicate if they are preferred for data
        or metadata allocations, and at mkfs time the disk is broken up into
        alternating metadata (33% of the disk) and data groups (66% of the
        disk). As the disk fills, a group's preference may change back and
        forth, but Btrfs always tries to avoid intermixing data and metadata
        extents in the same group. This substantially improves fsck throughput,
        and reduces seeks during writeback while the FS is mounted. It does
        slightly increase the seeks while reading.

Based on this, it appears that there is a semi-fixed allocation of 33% of the
disk to metadata, but that this allocation can change dynamically as the disk
fills.  It would appear that if the metadata approaches/exceeds its
allocation, a data group will be reallocated to it, and the same with the data
(an extent group would be reallocated).

At present, there is only one logical device per file-system (single,
RAID-0, RAID-1 or RAID-10 - each is one logical device).  Based on the
documentation, there appears to be an intent to support RAID-6 (and optionally
RAID-5 - I believe this would be good) as logical devices.

From what I see in the Multiple Device Support page
(http://btrfs.wiki.kernel.org/index.php/Multiple_Device_Support), it appears
that the intent in the future is to allow a BTRFS file-system to reside on
multiple logical devices.  This is the starting point for my questions.

In an installation where a large number of physical devices are available for
use (something like a Sun Thumper - 48 total disks, or a server connected to a
SAN), the optimum configuration might be to dedicate certain logical devices
(small/fast disks in RAID-1) to metadata, and other devices (large/slow disks
in RAID-5 or RAID-6) to data.  To perform this, the metadata allocation
percentage would need to be tunable (0% for data-only and 100% for
metadata-only), and it would have to be able to be locked, so that the block
group reallocation between metadata and data would be disabled (another option
might be to allow metadata to reallocate data block groups, but not the other
way around).

I believe that a configuration like this would be more flexible than having
the metadata block groups interleaved with the data block groups.  I also
believe that this should be able to provide better overall response and
throughput on a large multi-user server.

Is something like this intended to be possible?

Thank you.

Peter Ashford

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2009-Jan-20 23:07 UTC

Re: [Discussion] Extent Block Group allocations

On Tue, 2009-01-20 at 12:54 -0800, ashford@whisperpc.com wrote:
> Hi all,
>
> I searched the archives, and didn't find any answers to my questions, so I
> think it's time to ask.
>
> From: http://btrfs.wiki.kernel.org/index.php/Btrfs_design#Extent_Block_Groups
>
>         Block groups have a flag that indicate if they are preferred for data
>         or metadata allocations, and at mkfs time the disk is broken up into
>         alternating metadata (33% of the disk) and data groups (66% of the
>         disk). As the disk fills, a group's preference may change back and
>         forth, but Btrfs always tries to avoid intermixing data and metadata
>         extents in the same group. This substantially improves fsck throughput,
>         and reduces seeks during writeback while the FS is mounted. It does
>         slightly increase the seeks while reading.

I missed this when I last updated the design doc.  It is much more
flexible now.  Chunks of storage are allocated from each device for use
as data or metadata as required.

> Based on this, it appears that there is a semi-fixed allocation of 33% of the
> disk to metadata, but that this allocation can change dynamically as the disk
> fills.  It would appear that if the metadata approaches/exceeds its
> allocation, a data group will be reallocated to it, and the same with the data
> (an extent group would be reallocated).
>
> At the present, there is only one logical device per file-system (single,
> RAID-0, RAID-1 or RAID-10 - each is one logical device).  Based on the
> documentation, there appears to be an intent to support RAID-6 (and optionally
> RAID-5 - I believe this would be good) as logical devices.

There is one logical address space per FS right now.  Each device in the
FS can contribute to the logical address space.
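
As an illustration of the two replies above (chunks carved out of each device as needed, all feeding one logical address space), here is a minimal sketch in C. The struct layout, the names `chunk`, `chunk_type` and `chunk_lookup`, and the one-device-per-chunk simplification are all assumptions made for this example; they are not the Btrfs on-disk format or its kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk types; not the real Btrfs flags. */
enum chunk_type { CHUNK_DATA, CHUNK_METADATA };

/* One chunk: a range of the logical address space backed by one device. */
struct chunk {
	uint64_t logical;     /* start in the filesystem-wide logical space */
	uint64_t length;      /* number of bytes covered */
	uint32_t devid;       /* device that contributes the storage */
	uint64_t physical;    /* start offset on that device */
	enum chunk_type type; /* what the chunk was allocated for */
};

/*
 * Translate a logical address into (devid, physical offset).
 * Returns 0 on success, -1 if no chunk covers the address.
 */
static int chunk_lookup(const struct chunk *map, size_t count,
			uint64_t logical, uint32_t *devid, uint64_t *physical)
{
	for (size_t i = 0; i < count; i++) {
		const struct chunk *c = &map[i];

		if (logical >= c->logical && logical - c->logical < c->length) {
			*devid = c->devid;
			*physical = c->physical + (logical - c->logical);
			return 0;
		}
	}
	return -1;
}
```

In this sketch, adding a device only means appending chunks that point at it; the logical addresses seen by the rest of the filesystem stay in a single space. Mirroring or striping would need more than one device per chunk, which is left out here.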

> From what I see in the Multiple Device Support page
> (http://btrfs.wiki.kernel.org/index.php/Multiple_Device_Support), it appears
> that the intent in the future is to allow a BTRFS file-system to reside on
> multiple logical devices.  This is the starting point for my questions.
>
> In an installation where a large number of physical devices are available for
> use (something like a Sun Thumper - 48 total disks, or a server connected to a
> SAN), the optimum configuration might be to dedicate certain logical devices
> (small/fast disks in RAID-1) to metadata, and other devices (large/slow disks
> in RAID-5 or RAID-6) to data.  To perform this, the metadata allocation
> percentage would need to be tunable (0% for data-only and 100% for
> metadata-only), and it would have to be able to be locked, so that the block
> group reallocation between metadata and data would be disabled (another option
> might be to allow metadata to reallocate data block groups, but not the other
> way around).

Yes, we definitely want to be able to tie metadata or data to specific
drives.  The disk format has what it needs for this, but it hasn't been
coded up yet.
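
The reply above says this feature had not been written, so the following is only a hypothetical sketch of what "tying metadata or data to specific drives" could look like as an allocation policy. The `device` struct, the `ALLOW_*` masks and `pick_device` are invented for this example and do not correspond to any Btrfs interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device policy bits. */
#define ALLOW_DATA     (1u << 0)
#define ALLOW_METADATA (1u << 1)

struct device {
	uint32_t devid;
	uint64_t free_bytes;
	unsigned allowed;   /* mask of ALLOW_* bits this device accepts */
};

/*
 * Choose the device for a new chunk of the given type: among devices
 * that permit the type and have room, take the one with the most free
 * space.  Returns NULL if no permitted device can hold the chunk.
 */
static struct device *pick_device(struct device *devs, size_t count,
				  unsigned type, uint64_t chunk_size)
{
	struct device *best = NULL;

	for (size_t i = 0; i < count; i++) {
		struct device *d = &devs[i];

		if (!(d->allowed & type) || d->free_bytes < chunk_size)
			continue;
		if (best == NULL || d->free_bytes > best->free_bytes)
			best = d;
	}
	return best;
}
```

A device whose mask has a single bit set behaves like the "locked" 0% or 100% setting from the original question: metadata chunks never spill onto a data-only device, even when the metadata devices are full. Setting both bits on a device gives back the flexible behaviour.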

> I believe that a configuration like this would be more flexible than having
> the metadata block groups interleaved with the data block groups.  I also
> believe that this should be able to provide better overall response and
> throughput on a large multi-user server.
>
> Is something like this intended to be possible?

Definitely ;)  Thanks for these comments.

-chris



ashford@whisperpc.com

2009-Jan-22 18:20 UTC

[PATCH] Add validation for sector size

In mkfs.btrfs, the sector size must be a power of two for the second half of
the leafsize and nodesize checks to work, but sectorsize is never validated.


# diff -u mkfs.c- mkfs.c
--- mkfs.c-     2009-01-20 11:37:39.000000000 -0800
+++ mkfs.c      2009-01-22 10:13:49.000000000 -0800
@@ -391,14 +391,22 @@
                }
        }
        sectorsize = max(sectorsize, (u32)getpagesize());
+       if ((sectorsize & (sectorsize - 1))) {
+               fprintf(stderr, "Sector size %u must be a power of 2\n",
+                       sectorsize);
+               exit(1);
+       }
+
        if (leafsize < sectorsize || (leafsize & (sectorsize - 1))) {
                fprintf(stderr, "Illegal leafsize %u\n", leafsize);
                exit(1);
        }
+
        if (nodesize < sectorsize || (nodesize & (sectorsize - 1))) {
                fprintf(stderr, "Illegal nodesize %u\n", nodesize);
                exit(1);
        }
+
        ac = ac - optind;
        if (ac == 0)
                print_usage();
#
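
To show why the leafsize and nodesize checks depend on a power-of-two sector size, here is a small standalone sketch in C. The helper names are made up for the example; only the expression `size & (sectorsize - 1)` is taken from the patch context above.

```c
#include <stdint.h>

/* Nonzero when x has exactly one bit set. */
static int is_power_of_two(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* The mask test used for leafsize/nodesize in the patch context. */
static int mask_says_multiple(uint32_t size, uint32_t sectorsize)
{
	return (size & (sectorsize - 1)) == 0;
}

/* The property the mask test is meant to establish. */
static int really_multiple(uint32_t size, uint32_t sectorsize)
{
	return size % sectorsize == 0;
}
```

With a sector size of 4096 the two tests agree. With 3072, which is not a power of two, the mask test accepts 4096 (not a multiple of 3072) and rejects 6144 (which is one), so the later checks are only meaningful once sectorsize itself has been validated. The patch tests `sectorsize & (sectorsize - 1)` without a zero check, which appears sufficient there because sectorsize has just been raised to at least the page size.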

