As far as I understand, btrfs stores all data in huge chunks that are striped, mirrored or "raid5/6'ed" throughout all the disks added to the filesystem/volume. How does btrfs deal with different-sized disks? Let's say, for example, that you have 10 different disks of 100GB, 200GB, 300GB ... 1000GB and you create a btrfs filesystem with all of them. How will the raid5 implementation distribute chunks in such a setup?

I assume the stripe+stripe+parity are separate chunks placed on separate disks, but how does btrfs select the best disk to store a chunk on? In short, will a slow disk slow down the entire "array", parts of it, or will btrfs attempt to use the fastest disks first?

Also, since btrfs checksums both data and metadata, I am thinking that at least the raid6 implementation could perhaps (try to) reconstruct corrupt data (and try to rewrite it) before reading an alternate copy. Can someone please fill me in on the details here?

Finally, how does btrfs deal with Advanced Format (4k sector) drives when the entire drive (and not a partition) is used to build a btrfs filesystem? Is proper alignment achieved?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hugo Mills
2012-Jul-01 12:27 UTC
Re: Btrfs RAID space utilization and bitrot reconstruction
On Sun, Jul 01, 2012 at 01:50:39PM +0200, Waxhead wrote:
> As far as I understand btrfs stores all data in huge chunks that are
> striped, mirrored or "raid5/6'ed" throughout all the disks added to
> the filesystem/volume.

Well, RAID-5/6 hasn't landed yet, but yes.

> How does btrfs deal with different sized disks? Let's say that you
> for example have 10 different disks that are
> 100GB,200GB,300GB...1000GB and you create a btrfs filesystem with
> all the disks. How will the raid5 implementation distribute chunks
> in such a setup.

We haven't seen the code for that bit yet.

> I assume the stripe+stripe+parity are separate chunks that are
> placed on separate disks but how does btrfs select the best disk to
> store a chunk on? In short will a slow disk slow down the entire
> "array", parts of it or will btrfs attempt to use the fastest disks
> first?

Chunks are allocated by ordering the devices by the amount of free (=unallocated) space left on each, and then allocating the chunk's pieces on devices in that order. For RAID-1, chunks are picked in pairs. For RAID-0, "as many as possible" are picked, down to a minimum of 2 (I think). For RAID-10, the largest even number possible is picked, down to a minimum of 4. I _believe_ that RAID-5 and -6 will pick as many as possible, down to some minimum -- but as I said, we haven't seen the code yet.

> Also since btrfs checksums both data and metadata I am thinking that
> at least the raid6 implementation perhaps can (try to) reconstruct
> corrupt data (and try to rewrite it) before reading an alternate
> copy. Can someone please fill me in on the details here?

Yes, it should be possible to do that with RAID-5 as well. (Read the data stripes and verify their checksums; if one fails, read the parity, verify that, and reconstruct the bad block from the known-good data.)

> Finally how does btrfs deal with advanced format (4k sectors) drives
> when the entire drive (and not a partition) is used to build a btrfs
> filesystem. Is proper alignment achieved?

I don't know about that. However, the native block size in btrfs is 4k, so I'd imagine that it's all good.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You stay in the theatre because you're afraid of having no ---
money? There's irony...
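Hugo's outline of checksum-guided repair (read the data stripes, verify their checksums, then use parity to rebuild a single bad block) can be sketched in Python for a RAID-5-style stripe. This is an illustration under stated assumptions, not the btrfs implementation: `zlib.crc32` stands in for the filesystem's checksum, the helpers `xor_blocks` and `read_stripe` are hypothetical, and the parity is verified only indirectly, by checking that the rebuilt block matches its stored checksum.

```python
"""Hypothetical sketch of checksum-guided RAID-5 repair.

Assumptions (not btrfs code or APIs): zlib.crc32 stands in for the
filesystem checksum, a stripe is a list of equal-length data blocks
plus one XOR parity block, and the expected checksum of each data
block is known in advance.
"""
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_stripe(data, parity, checksums):
    """Return verified data blocks, rebuilding at most one bad block."""
    bad = [i for i, blk in enumerate(data) if zlib.crc32(blk) != checksums[i]]
    if not bad:
        return list(data)  # every checksum matches; parity is never needed
    if len(bad) > 1:
        raise OSError("more than one bad block: single parity cannot repair this")
    i = bad[0]
    # XOR of the parity and all other (known-good) data blocks yields block i.
    good = [blk for j, blk in enumerate(data) if j != i]
    rebuilt = xor_blocks(good + [parity])
    # Checking the rebuilt block also catches a corrupt parity block.
    if zlib.crc32(rebuilt) != checksums[i]:
        raise OSError("rebuilt block fails its checksum (parity may be bad)")
    repaired = list(data)
    repaired[i] = rebuilt  # a real filesystem would also rewrite it on disk
    return repaired
```

With one corrupted block, `read_stripe` returns the original data; with two, it raises an error, because a single parity block can only rebuild one missing block. RAID-6's second parity block would raise that limit to two.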
Martin Steigerwald
2012-Jul-02 18:00 UTC
Re: Btrfs RAID space utilization and bitrot reconstruction
On Sunday, 1 July 2012, Waxhead wrote:
> As far as I understand btrfs stores all data in huge chunks that are
> striped, mirrored or "raid5/6'ed" throughout all the disks added to
> the filesystem/volume.

Not across all disks, at least not with the current RAID-1 implementation. It stores two copies of a chunk, no matter how many drives you use.

For the rest, see Hugo's answer.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
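The allocation rule Hugo describes, together with Martin's point that RAID-1 always stores exactly two copies, can be simulated with a small Python sketch. It is a toy model, not btrfs code: it assumes every chunk places one fixed 1 GB stripe on each chosen device, models devices as hypothetical `Device` records, and leaves out RAID-5/6 because that allocator had not been published at the time of this thread.

```python
"""Toy simulation of the chunk-allocation rule described in this thread.

Assumptions (not taken from btrfs source): each chunk puts one stripe
of fixed size STRIPE on every chosen device, and devices are plain
(name, free_bytes) records. RAID-5/6 are intentionally omitted.
"""
from dataclasses import dataclass

GB = 10**9
STRIPE = 1 * GB  # hypothetical per-device stripe size for one chunk


@dataclass
class Device:
    name: str
    free: int  # unallocated bytes left on the device


def pick_devices(devices, profile):
    """Return the devices that receive one stripe of the next chunk."""
    # Order devices by unallocated space, largest first.
    usable = sorted((d for d in devices if d.free >= STRIPE),
                    key=lambda d: d.free, reverse=True)
    n = len(usable)
    if profile == "raid1":
        count, minimum = 2, 2            # always exactly two copies
    elif profile == "raid0":
        count, minimum = n, 2            # as many devices as possible
    elif profile == "raid10":
        count, minimum = n - n % 2, 4    # largest even number possible
    else:
        raise ValueError(f"unsupported profile: {profile}")
    if n < minimum:
        return []                        # not enough devices with free space
    return usable[:count]


def allocate(devices, profile):
    """Allocate chunks until no more fit; return the number of chunks."""
    chunks = 0
    while True:
        chosen = pick_devices(devices, profile)
        if not chosen:
            return chunks
        for d in chosen:
            d.free -= STRIPE
        chunks += 1
```

With the ten disks from the question (100 GB to 1000 GB), this model allocates 2750 RAID-1 chunks and uses every disk completely, while RAID-0 stops after 900 chunks and leaves 100 GB unallocated on the 1000 GB disk, because a RAID-0 chunk needs at least two devices with free space. Note that the rule, as described, orders devices only by unallocated space; nothing in it looks at device speed.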