Fabian Zeindl
2012-Jan-05 09:21 UTC
Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
Hi,

the subject is pretty self-explanatory. I'm creating a btrfs using

sudo mkfs.btrfs -m raid1 -d raid1 <smalldisk> <largedisk>

It creates the fs, apparently with the size of the larger disk, no matter in which order I supply the disk arguments. How can this be correct?

Is there some way like "cat /proc/mdstat" to see what btrfs is doing and to assure myself my raid1 is secure? It's not terribly important data, hence I'm trying btrfs, but I don't want to lose it either.

I posted this question on stackexchange as well: http://unix.stackexchange.com/questions/28357/why-does-btrfs-allow-to-create-a-raid1-with-mismatched-drives

Please CC me in any replies.

Regards
Fabian Zeindl

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Fabian Zeindl
2012-Jan-05 09:43 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
On Jan 5, 2012, at 10:21, Fabian Zeindl wrote:
> it creates the fs, apparently with the size of the larger disk, no matter in which order i supply the disk-arguments.
> How can this be correct?

Edit: wrong observation here. The fs is created with the sum of the sizes of the two disks, though btrfs fi df shows RAID1 for metadata, system and data.

Fabian Zeindl
Hugo Mills
2012-Jan-05 09:44 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
On Thu, Jan 05, 2012 at 10:21:57AM +0100, Fabian Zeindl wrote:
> Hi,
>
> the subject is pretty selfexplanatory. I'm creating a btrfs using
>
> sudo mkfs.btrfs -m raid1 -d raid1 <smalldisk> <largedisk>
>
> it creates the fs, apparently with the size of the larger disk, no matter in which order i supply the disk-arguments.
> How can this be correct?

Because btrfs doesn't actually do "RAID-1" (in the sense that blocks with the same address on the two disks have identical contents). You should probably read the mis-named "Sysadmin's Guide" on the wiki[1], which explains what btrfs actually does with its replication.

You should also probably read the FAQ entries on free space[2], since using plain "df" for btrfs is usually misleading.

> Is there some way like "cat /proc/mdstat" to see what btrfs is doing and to assure myself my raid1 is secure? It's not
> terribly important data, hence i'm trying btrfs, but i don't want to lose it either.

You could run a scrub, which will verify all of the data mirrors on the volume, and fix anything that's not redundant.

Hugo.

[1] http://btrfs.ipv5.de/index.php?title=SysadminGuide
[2] http://btrfs.ipv5.de/index.php?title=FAQ#Why_does_df_show_incorrect_free_space_for_my_RAID_volume.3F

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- vi: The core of evil. ---
Fabian Zeindl
2012-Jan-05 09:53 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
On Thursday, January 5, 2012 at 10:44, Hugo Mills wrote:
> You should probably read the mis-named "Sysadmin's Guide"
> on the wiki[1], which explains what btrfs actually does with its
> replication.
>
> You should also probably read the FAQ entries on free space[2],
> since using plain "df" for btrfs is usually misleading.

I read both, but they don't answer my question on how btrfs behaves when it can't actually do a raid1, because there's not enough space on an "other" disk for a chunk copy.

> You could run a scrub, which will verify all of the data mirrors on
> the volume, and fix anything that's not redundant.

Will this command fail then, for example?

fabian
Martin Steigerwald
2012-Jan-05 10:39 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
On Thursday, 5 January 2012, Fabian Zeindl wrote:
> On Thursday, January 5, 2012 at 10:44, Hugo Mills wrote:
> > You should probably read the mis-named "Sysadmin's Guide"
> > on the wiki[1], which explains what btrfs actually does with its
> > replication.
> >
> > You should also probably read the FAQ entries on free space[2],
> > since using plain "df" for btrfs is usually misleading.
>
> I read both, but they don't answer my question on how btrfs behaves
> when it can't actually do a raid1, because there's not enough space on
> an "other" disk for a chunk copy.

From my reading, the Sysadmin Guide answers your question: BTRFS with RAID-1 will allocate chunks on two devices:

> Btrfs's "RAID" implementation bears only passing resemblance to
> traditional RAID implementations. Instead, btrfs replicates data on a
> per-chunk basis. If the filesystem is configured to use "RAID-1", for
> example, chunks are allocated in pairs, with each chunk of the pair
> being taken from a different block device. Data written to such a chunk
> pair will be duplicated across both chunks.
>
> Stripe-based "RAID" levels (RAID-0, RAID-10) work in a similar way,
> allocating as many chunks as can fit across the drives with free space,
> and then perform striping of data at a level smaller than a chunk. So,
> for a RAID-10 filesystem on 4 disks, data may be stored like this:

[… quoted from the Wiki page …]

"Allocating as many chunks as can fit across the drives" is also pretty clear to me. So if BTRFS can't allocate a new chunk on two devices, it's full. To me it seems obvious that BTRFS will not break the RAID-1 redundancy guarantee unless a drive fails.

Thus, when using a RAID-1 with two devices, the smaller one should define the maximum capacity of the filesystem. But when you use a RAID-1 with one 500 GB and two 250 GB drives, BTRFS can replicate each chunk on the 500 GB drive on *one* of the two 250 GB drives. Thus it makes perfect sense to support differently sized drives in a BTRFS pool.

My own observations with a RAID-10 across 4 devices support this. I ran echo "1" > /sys/block/sdX/device/delete to remove one hard disk while a dd was running to the RAID. BTRFS used the remaining disks. On the next reboot all disks were available again. While BTRFS didn't start rebalancing the RAID automatically, a btrfs filesystem balance made it fill up the previously failed device until all devices had the same usage.

This is also described in the Sysadmin Guide. So this is what you have to take care of manually: if a drive failed, you have to balance the filesystem so that it creates replicas where they are missing.

Now, could anyone deeper into BTRFS please check whether my understanding matches what BTRFS is doing…

> > You could run a scrub, which will verify all of the data mirrors on
> > the volume, and fix anything that's not redundant.
>
> Will this command fail then, for example?

No, unless more than the allowed number of disks are failing.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
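
The chunk-pair behaviour described in this message can be illustrated with a small simulation. This is only a sketch and not btrfs code: the function name raid1_usable_chunks, the 1 GB chunk size, and the policy of always taking the two devices with the most free space are assumptions made for illustration.

```python
def raid1_usable_chunks(device_sizes, chunk_size=1):
    """Simulate RAID-1 style allocation of mirrored chunk pairs.

    Returns (number of chunk pairs allocated, free space left per device).
    Assumption: each pair is placed on the two devices that currently
    have the most free space.
    """
    free = list(device_sizes)
    pairs = 0
    if len(free) < 2:
        return pairs, free  # mirroring needs at least two devices
    while True:
        # Indices of devices, ordered from most to least free space.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        a, b = order[0], order[1]
        if free[a] < chunk_size or free[b] < chunk_size:
            break  # no second device left for the mirror copy: "full"
        free[a] -= chunk_size
        free[b] -= chunk_size
        pairs += 1
    return pairs, free


print(raid1_usable_chunks([7, 3]))           # (3, [4, 0])
print(raid1_usable_chunks([500, 250, 250]))  # (500, [0, 0, 0])
```

Under these assumptions the two-device case is limited by the smaller device, while the 500 GB plus 2x250 GB pool can mirror everything, which matches the reasoning above.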
Fabian Zeindl
2012-Jan-05 12:26 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
On Jan 5, 2012, at 11:39, Martin Steigerwald wrote:
> "Allocating as many chunks as can fit across the drives" is also pretty
> clear to me. So if BTRFS can't allocate a new chunk on two devices, its
> full. To me it seems obvious that BTRFS will not break the RAID-1
> redundancy guarentee unless a drive fails.

So (assuming 1GB chunksize):

if I create a raid-1 btrfs with a 3GB and a 7GB device, it will show me ~10GB free space,
after saving a 1GB file, I will have 8GB left (-1GB on each device),
after saving another 1GB, I will have 6GB left (--- " ----),
after saving another 1GB, it's "suddenly" full?

Fabian
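
The arithmetic in this example can be traced step by step. The sketch below is an illustration only: the function name mirrored_write_trace is made up, it assumes exactly two devices, and it assumes every 1 GB file consumes 1 GB on each device, as in the message above.

```python
def mirrored_write_trace(dev_a_gb, dev_b_gb, file_gb=1):
    """Return the raw free space (both devices summed) before any write
    and after each successful mirrored write of file_gb."""
    free_a, free_b = dev_a_gb, dev_b_gb
    trace = [free_a + free_b]  # raw free space before writing anything
    # A mirrored write needs room for one copy on each device.
    while free_a >= file_gb and free_b >= file_gb:
        free_a -= file_gb
        free_b -= file_gb
        trace.append(free_a + free_b)
    return trace


trace = mirrored_write_trace(3, 7)
print(trace)           # [10, 8, 6, 4]
print(len(trace) - 1)  # 3
```

With these assumptions three files fit, and the filesystem is full while 4 GB of raw space on the larger device remain unused because there is no second device left to hold the mirror copy.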
Martin Steigerwald
2012-Jan-05 13:01 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
On Thursday, 5 January 2012, Fabian Zeindl wrote:
> On Jan 5, 2012, at 11:39, Martin Steigerwald wrote:
> > "Allocating as many chunks as can fit across the drives" is also
> > pretty clear to me. So if BTRFS can't allocate a new chunk on two
> > devices, its full. To me it seems obvious that BTRFS will not break
> > the RAID-1 redundancy guarentee unless a drive fails.
>
> So (assuming 1GB chunksize):
>
> if i create a raid-1, btrfs with a 3GB and a 7GB device, it will show
> me ~10GB free space, after saving a 1GB file, i will have 8GB left
> (-1GB on each device) after saving another 1GB, i will have 6GB left
> (--- " ----)
> after saving another 1GB, it's "suddenly" full?

I would say yes, but I suggest that you try this out or, if you can, wait for confirmation from a BTRFS developer to be sure about this. The other way of handling this would be to break the RAID-1 redundancy guarantee, and I really hope that BTRFS is not doing this. I am not completely sure though, as I never tested it.

The output of df -h with BTRFS and RAID is bogus anyway. Just consider df -h with two 10GB disks. df -h will display about 20GB free then. But when you write 100 MB it will show that 200 MB are allocated. So an application that assumes it will be able to write 12 GB easily will just fail doing that.

I don't like this either, because an application that writes something cannot even do a rough estimate. But then an application can never know whether a write will succeed, because another application could also write lots of data at the same time.

But even without RAID I cannot get exact figures from df. Just consider:

merkaba:~> btrfs filesystem show
failed to read /dev/sr0
Label: 'debian'  uuid: dd52fea8-f6c3-4a60-bd4a-7650483655e5
        Total devices 1 FS bytes used 11.35GB
        devid    1 size 18.62GB used 18.29GB path /dev/dm-0

Btrfs Btrfs v0.19

merkaba:~> btrfs filesystem df /
Data: total=14.01GB, used=10.55GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=2.12GB, used=814.32MB
Metadata: total=8.00MB, used=0.00

merkaba:~> df -hT /
Filesystem                 Type   Size  Used Avail Use% Mounted on
/dev/mapper/merkaba-debian btrfs   19G   13G  3,8G  77% /

merkaba:~> df -HT /
Filesystem                 Type   Size  Used Avail Use% Mounted on
/dev/mapper/merkaba-debian btrfs   20G   14G  4,1G  77% /

So how much space is free? df just seems to reflect the size of the data, system and metadata b-trees, not their usage. At least for me, 10.55 GB, 4 KB and 814 MB just do not add up to 13 / 14 GB. I am not sure whether the btrfs command uses 1024 or 1000 as base.

So actually, in this case I should be able to write more to the filesystem than df -hT tells me. I prefer this over the other way around with RAID-1, where I can write only about half of the size that df reports.

Either way, the current df output is bogus.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
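
One possible way to reconcile the figures in this message is to count every DUP chunk twice, since a DUP profile stores two copies on the same device. The sketch below parses the btrfs filesystem df lines quoted above and sums them on that basis. It rests on assumptions that are not confirmed in the thread: that the tool's "GB", "MB" and "KB" are powers of 1024, and that the DUP lines report logical rather than raw sizes. The names FI_DF_OUTPUT, UNITS, LINE and raw_totals are made up for this example.

```python
import re

# Output copied from the message above.
FI_DF_OUTPUT = """\
Data: total=14.01GB, used=10.55GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=2.12GB, used=814.32MB
Metadata: total=8.00MB, used=0.00
"""

# Assumption: the units are binary (powers of 1024).
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
LINE = re.compile(
    r"^(\w+)(?:, (\w+))?: total=([\d.]+)(\w*), used=([\d.]+)(\w*)$"
)


def raw_totals(text):
    """Return (allocated_bytes, used_bytes), counting DUP/RAID1 twice."""
    allocated = used = 0.0
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip anything that is not a "Type: total=..." line
        _kind, profile, total, t_unit, used_val, u_unit = m.groups()
        copies = 2 if profile in ("DUP", "RAID1") else 1
        allocated += copies * float(total) * UNITS[t_unit]
        used += copies * float(used_val) * UNITS[u_unit]
    return allocated, used


allocated, used = raw_totals(FI_DF_OUTPUT)
print(f"allocated: {allocated / 1024**3:.2f} GiB")
print(f"used:      {used / 1024**3:.2f} GiB")
```

Under these assumptions the allocated total lands close to the 18.29GB "used" that btrfs filesystem show prints for the device, and the used total comes to a little over 12 GiB, which is nearer to the 13G that df -hT reports than the plain sum of 10.55 GB and 814 MB is.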
Roman Kapusta
2012-Jan-05 13:35 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
On Thu, Jan 5, 2012 at 13:26, Fabian Zeindl <fabian.zeindl@gmail.com> wrote:
>
> On Jan 5, 2012, at 11:39, Martin Steigerwald wrote:
>> "Allocating as many chunks as can fit across the drives" is also pretty
>> clear to me. So if BTRFS can't allocate a new chunk on two devices, its
>> full. To me it seems obvious that BTRFS will not break the RAID-1
>> redundancy guarentee unless a drive fails.
>
> So (assuming 1GB chunksize):
>
> if i create a raid-1, btrfs with a 3GB and a 7GB device, it will show me ~10GB free space,
> after saving a 1GB file, i will have 8GB left (-1GB on each device)
> after saving another 1GB, i will have 6GB left (--- " ----)
> after saving another 1GB, it's "suddenly" full?

You still have 4GB of non-RAID-1 (single) space free, which is currently unavailable, but it is planned that BTRFS will support mixed storage: some files can be RAID-1, some files can be RAID-0, and the rest is basic (single) storage.

>
> Fabian
Fabian Zeindl
2012-Jan-05 13:47 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
On Jan 5, 2012, at 14:35, Roman Kapusta wrote:
> you have still 4GB free of non RAID-1 (single) space, which is
> currently unavailable, but it is planned that BTRFS will support mixed
> storage:
> some files can be RAID-1, some files can be RAID-0 and rest is basic
> (single) storage

Understood. So to clarify things, I think it would be good if btrfs could print out more detailed information:

Available raw space: 10GB
  7G on drive A
  3G on drive B
Assignable space for raid1: 3GB
  3G on drive A
  3G on drive B

Or maybe the other way round: show which different "raid configurations" there are and how they use which space.

I understand that "free space" is a difficult concept if you do per-file or per-chunk redundancy, but I think there are a lot of users out there who just want to do a "standard" replication with their whole disk. Maybe with the special ability of the 2x500G + 1TB setup, which mdadm, AFAIK, can't do.

This would be just a subset of what btrfs can do, of course, but it's a frequently used subset, so maybe there could be some kind of saved "profile" on how the user "intends" to use the filesystem. Output could then be clarified using that profile, and it could also give warnings or prevent actions that make no sense.

fabian
Martin Steigerwald
2012-Jan-05 14:40 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
On Thursday, 5 January 2012, Fabian Zeindl wrote:
> On Jan 5, 2012, at 14:35, Roman Kapusta wrote:
> > you have still 4GB free of non RAID-1 (single) space, which is
> > currently unavailable, but it is planned that BTRFS will support
> > mixed storage:
> > some files can be RAID-1, some files can be RAID-0 and rest is basic
> > (single) storage
>
> Understood. So to clarify things i think it would be good if btrfs
> could print out more detailled information.
>
> Available raw space: 10GB
> 7G on drive A
> 3G on drive B
> Assignable space for raid1: 3GB
> 3G on drive A
> 3G on drive B
>
> Or maybe the other way round: show which different "raid
> configurations" there are and how the use which space.

As far as I can see, this information can already be derived from btrfs filesystem df / show by combining values. But it involves some manual calculations.

> I understand that "free space" is a difficult concept, if you do
> per-file or per-chunk redundancy, but i think there are a lot of users
> out there who just want to do a "standard" replication with their
> whole disk. Maybe with the special ability of the 2x500G +1TB, which
> mdadm, AFAIK, can't do.

It should be able to do that if you concatenate the two 500 GB disks via device mapper or the LVM layer above it.

> This would be just a subset of what btrfs can do, of course, but it's a
> frequently used subset, so maybe there could be some kind of saved
> "profile" on how the user "intends" to use the filesystem. Output
> could then be clarified using that profile and it could also give
> warnings or prevent actions that make no sense.

That makes sense to me. As a default I would say that -d raid1 and -m raid1 just creates a RAID-1. And then BTRFS should report the usable space for that RAID-1. I.e. when I have two 500 GB disks with 100 GB allocated, it should return about 400 GB free space. And when it uses one 1 TB disk as well as one 500 GB disk and two 250 GB disks with 100 GB allocated, it should return about 900 GB free space. Only when it has one disk with 500 GB and one with 1 TB with 100 GB allocated should it return 400 GB free space.

IMHO this should also be what BTRFS reports to the regular df command.

This should only change if a mixed policy is in place. I do not know what to report to the OS then. Should it add the RAID-1 and the RAID-0 space? Should it only report the RAID-1 space? IMHO that depends on the allocation policy. If new files are forced onto RAID-1 space, it should do the latter, and if new files are created on RAID-0 space once RAID-1 space is full, it should report the former.

For more details, btrfs filesystem df / show still need to be used. Maybe with a revised and even more informative output like you suggested.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
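
The three free-space figures proposed in this message can be checked with a short calculation. The sketch below is not how btrfs or df compute anything: it assumes two-copy mirroring with both copies on different devices, ignores chunk granularity and metadata overhead, and uses made-up function names (raid1_usable_gb, expected_free_gb). Under those assumptions the mirrored capacity is limited either by half of the total raw space or by what the devices other than the largest one can hold.

```python
def raid1_usable_gb(sizes_gb):
    """Upper bound on mirrored capacity for two-copy RAID-1.

    Every byte needs two copies on different devices, so capacity is at
    most half the raw total, and the largest device can only be mirrored
    onto the space offered by all the other devices.
    """
    total = sum(sizes_gb)
    largest = max(sizes_gb)
    return min(total / 2, total - largest)


def expected_free_gb(sizes_gb, allocated_gb):
    """Free space a RAID-1-aware df could report (hypothetical)."""
    return raid1_usable_gb(sizes_gb) - allocated_gb


# The three configurations from the message above, 100 GB allocated each.
for sizes in ([500, 500], [1000, 500, 250, 250], [500, 1000]):
    print(sizes, expected_free_gb(sizes, 100))
```

With these assumptions the results are 400, 900 and 400 GB, in line with the numbers suggested above.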
Martin Steigerwald
2012-Jan-05 14:41 UTC
Re: Why does Btrfs allow raid1 with mismatched drives? Also: How to look behind the curtain
Sorry, accidentally dropped CC.

[… text identical to the previous message …]