Yet another boot loader support request.

Right now btrfs' definition of "RAID-1" with more than two devices is a
bit unorthodox: it stores on any two drives. "True RAID-1" would
instead store N copies on each of N devices, the same way an actual
RAID-1 would operate with an arbitrary number of devices.

This means that a bootloader can consider a single device in isolation:
if the firmware gives access only to a single device, it can be booted.
Since /boot is usually a very small amount of data, this is a very
reasonable tradeoff.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 20, 2012 at 12:27 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> Yet another boot loader support request.
>
> Right now btrfs' definition of "RAID-1" with more than two devices is a
> bit unorthodox: it stores on any two drives. "True RAID-1" would
> instead store N copies on each of N devices, the same way an actual
> RAID-1 would operate with an arbitrary number of devices.
>
> This means that a bootloader can consider a single device in isolation:
> if the firmware gives access only to a single device, it can be booted.
> Since /boot is usually a very small amount of data, this is a very
> reasonable tradeoff.

+1

In fact, the current RAID-1 should not have been called RAID-1 at all,
it is confusing.
On Wed, Jun 20, 2012 at 06:35:30PM -0600, Marios Titas wrote:
> On Wed, Jun 20, 2012 at 12:27 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > Yet another boot loader support request.
> >
> > Right now btrfs' definition of "RAID-1" with more than two devices is a
> > bit unorthodox: it stores on any two drives. "True RAID-1" would
> > instead store N copies on each of N devices, the same way an actual
> > RAID-1 would operate with an arbitrary number of devices.
> >
> > This means that a bootloader can consider a single device in isolation:
> > if the firmware gives access only to a single device, it can be booted.
> > Since /boot is usually a very small amount of data, this is a very
> > reasonable tradeoff.
>
> +1
>
> In fact, the current RAID-1 should not have been called RAID-1 at all,
> it is confusing.

With the raid5/6 code, I'm changing raid1 (and raid10) to have a
configurable number of copies. So, you'll be able to have N copies on
M drives, where N <= M.

-chris
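The difference between "two copies on any two of M drives" and "one copy on every drive" can be illustrated with a toy model. This is only a sketch: the round-robin placement and the names `place_chunks` and `bootable_alone` are assumptions made for illustration, not the btrfs allocator, which chooses devices differently.

```python
def place_chunks(num_chunks, num_devices, copies):
    """Place each chunk on `copies` distinct devices, rotating round-robin.

    Hypothetical placement policy, for illustration only.
    """
    if not 1 <= copies <= num_devices:
        raise ValueError("need 1 <= copies <= num_devices")
    placement = []
    for chunk in range(num_chunks):
        start = (chunk * copies) % num_devices
        # Consecutive device indices (mod M) are distinct because copies <= M.
        placement.append({(start + i) % num_devices for i in range(copies)})
    return placement


def bootable_alone(placement, device):
    """True if `device` holds a copy of every chunk, so it suffices by itself."""
    return all(device in devices for devices in placement)


if __name__ == "__main__":
    for copies in (2, 3):
        layout = place_chunks(num_chunks=3, num_devices=3, copies=copies)
        alone = [d for d in range(3) if bootable_alone(layout, d)]
        print(f"N={copies}, M=3: devices usable in isolation: {alone}")
```

With N = 2 and M = 3, no single device in this model holds every chunk; with N = M = 3, every device does, which is the property the boot loader request depends on.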
Could you have a mode, though, where M = N at all times, so a user doesn't end up adding a new drive and get a nasty surprise?

Chris Mason <chris.mason@fusionio.com> wrote:
> On Wed, Jun 20, 2012 at 06:35:30PM -0600, Marios Titas wrote:
> > On Wed, Jun 20, 2012 at 12:27 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > > Yet another boot loader support request.
> > >
> > > Right now btrfs' definition of "RAID-1" with more than two devices is a
> > > bit unorthodox: it stores on any two drives. "True RAID-1" would
> > > instead store N copies on each of N devices, the same way an actual
> > > RAID-1 would operate with an arbitrary number of devices.
> > >
> > > This means that a bootloader can consider a single device in isolation:
> > > if the firmware gives access only to a single device, it can be booted.
> > > Since /boot is usually a very small amount of data, this is a very
> > > reasonable tradeoff.
> >
> > +1
> >
> > In fact, the current RAID-1 should not have been called RAID-1 at all,
> > it is confusing.
>
> With the raid5/6 code, I'm changing raid1 (and raid10) to have a
> configurable number of copies. So, you'll be able to have N copies on
> M drives, where N <= M.
>
> -chris

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
Yes and no. If you have 2 drives and you add one more, we can make it
do all new chunks over 3 drives. But, turning the existing double
mirror chunks into a triple mirror requires a balance.

-chris

On Wed, Jun 20, 2012 at 07:34:27PM -0600, H. Peter Anvin wrote:
> Could you have a mode, though, where M = N at all times, so a user doesn't end up adding a new drive and get a nasty surprise?
>
> Chris Mason <chris.mason@fusionio.com> wrote:
>
> > On Wed, Jun 20, 2012 at 06:35:30PM -0600, Marios Titas wrote:
> > > On Wed, Jun 20, 2012 at 12:27 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > > > Yet another boot loader support request.
> > > >
> > > > Right now btrfs' definition of "RAID-1" with more than two devices is a
> > > > bit unorthodox: it stores on any two drives. "True RAID-1" would
> > > > instead store N copies on each of N devices, the same way an actual
> > > > RAID-1 would operate with an arbitrary number of devices.
> > > >
> > > > This means that a bootloader can consider a single device in isolation:
> > > > if the firmware gives access only to a single device, it can be booted.
> > > > Since /boot is usually a very small amount of data, this is a very
> > > > reasonable tradeoff.
> > >
> > > +1
> > >
> > > In fact, the current RAID-1 should not have been called RAID-1 at all,
> > > it is confusing.
> >
> > With the raid5/6 code, I'm changing raid1 (and raid10) to have a
> > configurable number of copies. So, you'll be able to have N copies on
> > M drives, where N <= M.
> >
> > -chris
>
> --
> Sent from my mobile phone. Please excuse brevity and lack of formatting.
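The "new chunks use all drives, old chunks need a balance" behaviour can be modelled in a few lines. This is a toy sketch of a hypothetical M = N mode; `allocate`, `under_mirrored` and `balance` are made-up names for illustration and do not correspond to btrfs functions or commands.

```python
def allocate(chunks, devices, count):
    """Append `count` new chunks, each mirrored on every current device."""
    for _ in range(count):
        chunks.append(frozenset(devices))


def under_mirrored(chunks, devices):
    """Indices of chunks that are not yet present on every current device."""
    target = frozenset(devices)
    return [i for i, held_on in enumerate(chunks) if held_on != target]


def balance(chunks, devices):
    """Rewrite every chunk so that it is mirrored on all current devices."""
    target = frozenset(devices)
    for i in range(len(chunks)):
        chunks[i] = target


if __name__ == "__main__":
    devices = ["disk1", "disk2"]
    chunks = []
    allocate(chunks, devices, 2)      # two chunks, double-mirrored
    devices.append("disk3")           # user adds a third drive
    allocate(chunks, devices, 1)      # new chunk is triple-mirrored
    print("before balance:", under_mirrored(chunks, devices))
    balance(chunks, devices)
    print("after balance:", under_mirrored(chunks, devices))
```

In this model the chunks written before the third drive was added stay on two drives until `balance` runs, which is the window in which the new drive could not boot the system by itself.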
On 06/25/2012 08:21 AM, Chris Mason wrote:
> Yes and no. If you have 2 drives and you add one more, we can make it
> do all new chunks over 3 drives. But, turning the existing double
> mirror chunks into a triple mirror requires a balance.
>
> -chris

So trigger one. This is the exact analogue to the resync pass that is
required in classic RAID after adding new media.

-hpa
On Tue, Jun 26, 2012 at 3:46 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 06/25/2012 08:21 AM, Chris Mason wrote:
> > Yes and no. If you have 2 drives and you add one more, we can make it
> > do all new chunks over 3 drives. But, turning the existing double
> > mirror chunks into a triple mirror requires a balance.
> >
> > -chris
>
> So trigger one. This is the exact analogue to the resync pass that is
> required in classic RAID after adding new media.
>
> -hpa

To me one doesn't have to be triggered, a user expects to have to tell
the disks to rebuild/resync/balance after adding a disk, they may want
to wait till they've added all 4 disks and run a few extra commands
before they run the rebalance. What is important is having a mode that
doesn't require the user to remember that what they had used as the
closest analogue to RAID1 that BTRFS supports requires them to run
another command to change the 'RAID level' to be the RAID1 analogue for
the new number of disks.

Users will forget that and they will lose data because of it. At least
with a M=N mode BTRFS can say they tried to make it easy to avoid that
pitfall.

(resend in plain text for mailing list, CC list received the HTML version)

-- 
Gareth Pye
Level 2 Judge, Melbourne, Australia
Australian MTG Forum: mtgau.com
gareth@cerberos.id.au - www.rockpaperdynamite.wordpress.com
"Dear God, I would like to file a bug report"
On 06/25/2012 03:28 PM, Gareth Pye wrote:
> To me one doesn't have to be triggered, a user expects to have to tell
> the disks to rebuild/resync/balance after adding a disk, they may want
> to wait till they've added all 4 disks and run a few extra commands
> before they run the rebalance.

They do? E.g. mdadm doesn't make them...

> What is important is having a mode that
> doesn't require the user to remember that what they had used as the
> closest analogue to RAID1 that BTRFS supports requires them to run
> another command to change the 'RAID level' to be the RAID1 analogue for
> the new number of disks.
>
> Users will forget that and they will lose data because of it. At least
> with a M=N mode BTRFS can say they tried to make it easy to avoid that
> pitfall.

Doesn't that contradict your previous statement? In either case, I
agree with the latter...

-hpa
On Tue, Jun 26, 2012 at 8:37 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> They do? E.g. mdadm doesn't make them...

Hrm, you are right. It is something I always confirm is happening though.

Without an M=N mode there would need to be two balances, as the first
balance would be doing it wrong :(

-- 
Gareth Pye
On Mon, Jun 25, 2012 at 10:46:01AM -0700, H. Peter Anvin wrote:
> On 06/25/2012 08:21 AM, Chris Mason wrote:
> > Yes and no. If you have 2 drives and you add one more, we can make it
> > do all new chunks over 3 drives. But, turning the existing double
> > mirror chunks into a triple mirror requires a balance.
> >
> > -chris
>
> So trigger one. This is the exact analogue to the resync pass that is
> required in classic RAID after adding new media.

You'd have to cancel and restart if a second new disk was added while
the first balance was ongoing. Fortunately, this isn't a problem these
days.

Also, it occurs to me that I should just check -- are you aware that
the btrfs implementation of RAID-1 makes no guarantees about the
location of any given piece of data? i.e. if I have a piece of data
stored at block X on disk 1, it's not guaranteed to be stored at block
X on disks 2, 3, 4, ... I'm not sure if this is important to you, but
it's a significant difference between the btrfs implementation of
RAID-1 and the MD implementation.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Never underestimate the bandwidth of a Volvo filled ---
    with backup tapes.
On 06/25/2012 03:54 PM, Hugo Mills wrote:
> On Mon, Jun 25, 2012 at 10:46:01AM -0700, H. Peter Anvin wrote:
>> On 06/25/2012 08:21 AM, Chris Mason wrote:
>>> Yes and no. If you have 2 drives and you add one more, we can
>>> make it do all new chunks over 3 drives. But, turning the
>>> existing double mirror chunks into a triple mirror requires a
>>> balance.
>>>
>>> -chris
>>
>> So trigger one. This is the exact analogue to the resync pass
>> that is required in classic RAID after adding new media.
>
> You'd have to cancel and restart if a second new disk was added
> while the first balance was ongoing. Fortunately, this isn't a
> problem these days.
>
> Also, it occurs to me that I should just check -- are you aware
> that the btrfs implementation of RAID-1 makes no guarantees about
> the location of any given piece of data? i.e. if I have a piece of
> data stored at block X on disk 1, it's not guaranteed to be stored
> at block X on disks 2, 3, 4, ... I'm not sure if this is important
> to you, but it's a significant difference between the btrfs
> implementation of RAID-1 and the MD implementation.
>

I am aware of that, and it is not a problem... the one-device
bootloader can find out *which* disk it is talking to by comparing
uuids, and the btrfs data structures will tell it how to find the data
on that specific disk. It does of course mean the bootloader needs to
be aware of the multidisk nature of btrfs, but that isn't a problem in
itself.

-hpa
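The lookup described above (identify the disk by UUID, then ask the chunk mapping where the data lives on that disk) might look roughly like the following. This is a simplified sketch: the `Stripe` and `Chunk` classes and the `resolve` function are hypothetical stand-ins, not the on-disk btrfs structures, and the addresses are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stripe:
    dev_uuid: str   # UUID of the device holding this mirror copy
    physical: int   # byte offset of the chunk's start on that device


@dataclass(frozen=True)
class Chunk:
    logical: int    # start of the chunk in the logical address space
    length: int     # size of the chunk in bytes
    stripes: tuple  # one Stripe per mirror copy


def resolve(chunks, logical_addr, my_uuid):
    """Map a logical address to a physical offset on the device `my_uuid`.

    Returns None if the chunk exists but has no copy on this device,
    which is the case a one-device bootloader cannot handle.
    """
    for chunk in chunks:
        offset = logical_addr - chunk.logical
        if 0 <= offset < chunk.length:
            for stripe in chunk.stripes:
                if stripe.dev_uuid == my_uuid:
                    return stripe.physical + offset
            return None
    raise LookupError(f"no chunk covers logical address {logical_addr}")


if __name__ == "__main__":
    chunk_map = [
        # The same logical range sits at different offsets on each disk.
        Chunk(0, 1024, (Stripe("uuid-a", 4096), Stripe("uuid-b", 8192))),
        Chunk(1024, 1024, (Stripe("uuid-b", 0), Stripe("uuid-c", 0))),
    ]
    print(resolve(chunk_map, 100, "uuid-a"))
    print(resolve(chunk_map, 100, "uuid-b"))
    print(resolve(chunk_map, 1100, "uuid-a"))
```

The first chunk shows why differing physical locations are harmless: each mirror carries its own offset. The second chunk shows the failure the thread is trying to rule out, since the disk with `uuid-a` holds no copy of it.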
On 06/25/2012 04:00 PM, H. Peter Anvin wrote:
>
> I am aware of that, and it is not a problem... the one-device
> bootloader can find out *which* disk it is talking to by comparing
> uuids, and the btrfs data structures will tell it how to find the data
> on that specific disk. It does of course mean the bootloader needs to
> be aware of the multidisk nature of btrfs, but that isn't a problem in
> itself.
>

So, also, let me address the question of why we should care about a
one-device bootloader.

It is quite common, especially in fileservers, for a subset of the
boot devices to be inaccessible by the firmware, due to bugs, boot time
concerns (spinning up all the media in the firmware is SLOW) or just
plain lack of support for plug-in cards. As such, the reliable thing to
do is to make sure that any disk being seen is enough to bring up the
system; since this is such a small amount of data by modern standards,
there is just no reason to do anything less robust.

Once the kernel comes up it has all the device drivers, of course.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.