There is various information about:
- enterprise-class drives (either SAS or just enterprise SATA)
- the SCSI/SAS protocols themselves vs SATA

having more advanced features (e.g. for dealing with error conditions) than
the average block device.

For example, Adaptec recommends that such drives will work better with their
hardware RAID cards:

http://ask.adaptec.com/cgi-bin/adaptec_tic.cfg/php/enduser/std_adp.php?p_faqid=14596
"Desktop class disk drives have an error recovery feature that will result in
a continuous retry of the drive (read or write) when an error is encountered,
such as a bad sector. In a RAID array this can cause the RAID controller to
time-out while waiting for the drive to respond."

and this blog:
http://www.adaptec.com/blog/?p=901
"major advantages to enterprise drives (TLER for one) ... opt for the
enterprise drives in a RAID environment no matter what the cost of the drive
over the desktop drive"

My questions:

- does btrfs RAID1 actively use the more advanced features of these drives,
  e.g. to work around errors without getting stuck on a bad block?
- if a non-RAID SAS card is used, does it matter which card is chosen? Does
  btrfs work equally well with all of them?
- ignoring the better MTBF and seek times of these drives, do any of the
  other features passively contribute to a better RAID experience when using
  btrfs?
- for someone using SAS or enterprise SATA drives with Linux, I understand
  btrfs gives the extra benefit of checksums; are there any other specific
  benefits over using mdadm or dmraid?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
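The time-out scenario the Adaptec FAQ describes can be sketched as a toy model. Everything here is illustrative: the timing constants are assumptions (real controller and drive timeouts vary by vendor), and the function is not anyone's actual firmware logic. The point is only the interaction: a desktop drive that retries a bad sector for minutes exceeds the controller's patience and gets dropped from the array, while a TLER/ERC drive gives up quickly and reports an error so the controller can read the mirror.

```python
# Toy model of the TLER interaction described in the FAQ quote above.
# All timing values are assumptions for illustration, not vendor specs.

CONTROLLER_TIMEOUT_S = 8   # assumed controller patience before dropping a drive
DESKTOP_RETRY_S = 120      # desktop drives may retry a bad sector for minutes
TLER_LIMIT_S = 7           # TLER/ERC caps error recovery (commonly ~7 s)

def read_bad_sector(error_recovery_limit_s):
    """Return (seconds_spent, outcome) for a read that hits a bad sector."""
    time_spent = min(error_recovery_limit_s, DESKTOP_RETRY_S)
    if time_spent > CONTROLLER_TIMEOUT_S:
        # Drive is still retrying internally when the controller gives up:
        # a healthy-but-slow drive gets kicked from the array.
        return CONTROLLER_TIMEOUT_S, "controller timeout: drive kicked from array"
    # Drive reports the error promptly; controller can use the mirror copy.
    return time_spent, "drive reports I/O error: controller reads mirror instead"

print(read_bad_sector(DESKTOP_RETRY_S))  # desktop drive
print(read_bad_sector(TLER_LIMIT_S))     # enterprise/TLER drive
```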
On Wednesday 09 of May 2012 22:01:49 Daniel Pocock wrote:
> There is various information about
> - enterprise-class drives (either SAS or just enterprise SATA)
> - the SCSI/SAS protocols themselves vs SATA
> having more advanced features (e.g. for dealing with error conditions)
> than the average block device
>
> For example, Adaptec recommends that such drives will work better with
> their hardware RAID cards:
>
> http://ask.adaptec.com/cgi-bin/adaptec_tic.cfg/php/enduser/std_adp.php?p_faqid=14596
> "Desktop class disk drives have an error recovery feature that will
> result in a continuous retry of the drive (read or write) when an
> error is encountered, such as a bad sector. In a RAID array this can
> cause the RAID controller to time-out while waiting for the drive to
> respond."
>
> and this blog:
> http://www.adaptec.com/blog/?p=901
> "major advantages to enterprise drives (TLER for one) ... opt for the
> enterprise drives in a RAID environment no matter what the cost of the
> drive over the desktop drive"
>
> My question..
>
> - does btrfs RAID1 actively use the more advanced features of these
> drives, e.g. to work around errors without getting stuck on a bad block?

There are no (short) timeouts that I know of.

> - if a non-RAID SAS card is used, does it matter which card is chosen?
> Does btrfs work equally well with all of them?

If you're using btrfs RAID, you need an HBA, not a RAID card. If the RAID
card can work as an HBA (usually labelled as JBOD mode) then you're good to
go. For example, HP CCISS controllers can't work in JBOD mode.

If you're using the RAID feature of the card, then you need to look at
general Linux support; btrfs doesn't do anything other filesystems don't do
with the block devices.

> - ignoring the better MTBF and seek times of these drives, do any of the
> other features passively contribute to a better RAID experience when
> using btrfs?

Whether they really have high MTBF values is debatable...
Seek times do matter very much to btrfs. A fast CPU is also a good thing to
have with btrfs, especially if you want to use data compression or high node
or leaf sizes.

> - for someone using SAS or enterprise SATA drives with Linux, I
> understand btrfs gives the extra benefit of checksums, are there any
> other specific benefits over using mdadm or dmraid?

Because btrfs knows when the drive is misbehaving (because of checksums) and
is returning bad data, it can detect problems much faster than RAID (which
doesn't use the redundancy for checking whether the data it's returning is
actually correct). Both hardware and software RAID implementations depend on
the drives to return IO errors. In effect, the data is safer on btrfs than
on regular RAID.

Besides that, there is online resize (both shrinking and extending) and the
(currently not implemented) ability to set the redundancy level on a
per-file basis. In other words, with btrfs you could have a file with RAID6
redundancy and a second one with RAID10 redundancy in a single directory.

Regards,
--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
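The difference described above can be sketched in a few lines of illustrative Python. This is not btrfs code: the function names are invented, and plain CRC32 stands in for btrfs's actual crc32c checksum. It only models the logic: a plain RAID1 read trusts whichever drive it happens to ask (so silent corruption passes through), while a checksum-verifying read detects the mismatch, falls back to the mirror, and can rewrite the bad copy.

```python
# Illustrative model (not btrfs code) of checksum-verified RAID1 reads
# versus plain RAID1 reads. CRC32 stands in for btrfs's crc32c.
import zlib

def store(data: bytes):
    """Write two mirror copies plus a checksum, as a checksumming FS would."""
    return {"copies": [bytearray(data), bytearray(data)],
            "csum": zlib.crc32(data)}

def raid1_read(blk, preferred=0):
    """Plain RAID1: trusts whatever the preferred drive returns, as long
    as the drive itself reports no I/O error."""
    return bytes(blk["copies"][preferred])

def checksummed_read(blk, preferred=0):
    """Checksum-verified read: on mismatch, fall back to the mirror and
    rewrite the bad copy with the good data."""
    for i in (preferred, 1 - preferred):
        data = bytes(blk["copies"][i])
        if zlib.crc32(data) == blk["csum"]:
            blk["copies"][preferred] = bytearray(data)  # repair bad copy
            return data
    raise IOError("both copies corrupt")

blk = store(b"important data")
blk["copies"][0][0] ^= 0xFF        # silent corruption on drive 0
print(raid1_read(blk))             # returns the corrupt copy, no error raised
print(checksummed_read(blk))       # returns the good copy from the mirror
```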
Daniel Pocock posted on Wed, 09 May 2012 22:01:49 +0000 as excerpted:

> There is various information about
> - enterprise-class drives (either SAS or just enterprise SATA)
> - the SCSI/SAS protocols themselves vs SATA having more advanced
> features (e.g. for dealing with error conditions)
> than the average block device

This isn't a direct answer to that, but expressing a bit of concern over the
implications of your question: that you're planning on using btrfs in an
enterprise-class installation.

While various Enterprise Linux distributions do now officially "support"
btrfs, it's worth checking out exactly what that means in practice.

Meanwhile, in mainline Linux kernel terms, btrfs remains very much an
experimental filesystem, as expressed by the kernel config option that turns
btrfs on. It's still under very intensive development, with an error-fixing
btrfsck only recently available and still coming with its own "may make the
problems worse instead of fixing them" warning. Testers willing to risk the
chance of data loss implied by that "experimental filesystem" label should
be running the latest stable kernel at the oldest, and preferably the rcs by
rc5 or so, as new kernels continue to fix problems in older btrfs code as
well as introduce new features; if you're running an older kernel, that
means you're running a kernel with known problems that are fixed in the
latest kernel.

Experimental also has implications in terms of backups. A good sysadmin
always has backups, but normally the working copy can be considered the
primary copy, and there's backups of that. On an experimental filesystem
under as intense continued development as btrfs, by contrast, it's best to
consider your btrfs copy an extra "throwaway" copy only intended for
testing. You still have your primary copy, along with all the usual
backups, on something less experimental, since you never know
when/where/how your btrfs testing will screw up its copy.
That's not normally the kind of filesystem "enterprise class" users are
looking for, unless of course they're doing longer-term testing, with an
intent to actually deploy perhaps a year out, if the testing proves it
robust enough by then.

And while it's still experimental ATM, btrfs /is/ fast improving. It /does/
now have a working fsck, even if it still comes with warnings, and
reasonable feature-set build-out should be within a few more kernels
(raid5/6 mode is roadmapped for 3.5, and n-way-mirroring raid1/10 is
roadmapped after that; the current "raid1" mode is only 2-way mirroring,
regardless of the number of drives). After that, the focus should turn
toward full stabilization.

So while btrfs is currently intended for testers only, by around the end of
the year or early next, it will likely be reasonably stable and ready for at
least the more adventurous conventional users. Still, enterprise-class users
tend to be a conservative bunch, and I'd be surprised if they really
consider btrfs ready before mid-year next year, at the earliest.

So if you're looking to test btrfs on enterprise-class hardware, great! But
do be aware of what you're getting into. If you have an enterprise distro
which supports it too, even greater, but know what that actually means. Does
it mean they support the same level of 9s uptime on it as they normally do,
or just that they're ready to accept payment to try and recover things if
something goes wrong?

If that hasn't scared you off, and you've not read the wiki yet, that's
probably the next thing you should look at, as it answers a lot of questions
you may have, as well as some you wouldn't think to ask. Being a wiki, of
course, your own contributions are welcome. In particular, you may well be
able to cover some of the enterprise-class viewpoint questions you're asking
based on your own testing, once you get to that point.

https://btrfs.wiki.kernel.org/

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Martin Steigerwald
2012-May-11 16:58 UTC
Re: btrfs RAID with enterprise SATA or SAS drives
On Friday, 11 May 2012, Duncan wrote:
> Daniel Pocock posted on Wed, 09 May 2012 22:01:49 +0000 as excerpted:
> > There is various information about
> > - enterprise-class drives (either SAS or just enterprise SATA)
> > - the SCSI/SAS protocols themselves vs SATA having more advanced
> > features (e.g. for dealing with error conditions)
> > than the average block device
>
> This isn't a direct answer to that, but expressing a bit of concern
> over the implications of your question, that you're planning on using
> btrfs in an enterprise class installation.
>
> While various Enterprise Linux distributions do now officially
> "support" btrfs, it's worth checking out exactly what that means in
> practice.
>
> Meanwhile, in mainline Linux kernel terms, btrfs remains very much an
> experimental filesystem, as expressed by the kernel config option that
> turns btrfs on. It's still under very intensive development, with an
> error-fixing btrfsck only recently available and still coming with its
> own "may make the problems worse instead of fixing them" warning.
> Testers willing to risk the chance of data loss implied by that
> "experimental filesystem" label should be running the latest stable
> kernel at the oldest, and preferably the rcs by rc5 or so, as new
> kernels continue to fix problems in older btrfs code as well as
> introduce new features and if you're running an older kernel, that
> means you're running a kernel with known problems that are fixed in
> the latest kernel.
>
> Experimental also has implications in terms of backups. A good
> sysadmin always has backups, but normally, the working copy can be
> considered the primary copy, and there's backups of that. On an
> experimental filesystem under as intense continued development as
> btrfs, by contrast, it's best to consider your btrfs copy an extra
> "throwaway" copy only intended for testing.
> You still have your
> primary copy, along with all the usual backups, on something less
> experimental, since you never know when/where/how your btrfs testing
> will screw up its copy.

Duncan, did you actually test BTRFS? Theory can't replace real-life
experience.

Of all my personal BTRFS installations not one has gone corrupt - and I have
at least four, while more of them are in use at my employer. Except maybe a
scratch-data BTRFS RAID 0 over lots of SATA disks; but maybe it would have
been fixable by btrfs-zero-log, which I didn't know of back then. Another
one needed a btrfs-zero-log, but that was quite some time ago.

Some of the installations have been in use for more than a year, AFAIR.

While I would still be reluctant to deploy BTRFS for a customer for critical
data, and I think Oracle's and SUSE's move to support it officially is a bit
daring, I don't think BTRFS is in a "throwaway copy" state anymore. As
usual, regular backups are important…

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Martin Steigerwald posted on Fri, 11 May 2012 18:58:05 +0200 as excerpted:

> Am Freitag, 11. Mai 2012 schrieb Duncan:
>> Daniel Pocock posted on Wed, 09 May 2012 22:01:49 +0000 as excerpted:
>>> There is various information about - enterprise-class drives
>>
>> This isn't a direct answer to that, but expressing a bit of concern
>> over the implications of your question, that you're planning on using
>> btrfs in an enterprise class installation.
>>
>> [In] mainline Linux kernel terms, btrfs remains very much an
>> experimental filesystem
>>
>> On an experimental filesystem under as intense continued development as
>> btrfs, by contrast, it's best to consider your btrfs copy an extra
>> "throwaway" copy only intended for testing. You still have your
>> primary copy, along with all the usual backups, on something less
>> experimental, since you never know when/where/how your btrfs testing
>> will screw up its copy.
>
> Duncan, did you actually test BTRFS? Theory can't replace real-life
> experience.

I /had/ been waiting until the n-way-mirrored raid1 roadmapped for after
raid5/6 mode (which should hit 3.5, I believe), but hardware issues
intervened and I'm no longer using those older 4-way md/raid drives as
primary. And now that I have it, present personal experience does not
contradict what I posted. btrfs does indeed work reasonably well under
reasonably good, non-stressful conditions. But my experience so far aligns
quite well with the "consider the btrfs copy a throw-away copy, just in
case" recommendation.
Just because it's a throw-away copy doesn't mean you'll have to resort to
the "good" copy elsewhere, but it DOES hopefully mean that you'll have both
a "good" copy elsewhere, and a backup for that supposedly good copy, just in
case btrfs does go bad and that supposedly good primary copy ends up not
being good after all.

> From all of my personal BTRFS installations not one has gone corrupt -
> and I have at least four, while more of them are in use at my employer.
> Except maybe a scratch-data BTRFS RAID 0 over lots of SATA disks. But
> maybe it would have been fixable by btrfs-zero-log which I didn't know
> of back then. Another one needed a btrfs-zero-log, but that was quite
> some time ago.
>
> Some of the installations are in use for more than a year AFAIR.
>
> While I would still be reluctant with deploying BTRFS for a customer for
> critical data

This was actually my point in this thread. If someone's asking questions
about enterprise-quality hardware, they're not likely to run into some of
the bugs I've been having recently that have been exposed by hardware
issues. However, they're also far more likely to be considering btrfs for a
row-of-nines uptime application, which is, after all, where some of btrfs'
features are normally found.

Regardless of whether btrfs is past the "throw-away-data experimental
class" stage or not, I think we both agree it isn't ready for
row-of-nines-uptime applications just yet. If he's just testing btrfs on
such equipment for possible future row-of-nines-uptime deployment a year or
possibly two out, great. If he's looking at such a deployment two months
out, no way, and it looks like you agree.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
Richard Stallman
>> - if a non-RAID SAS card is used, does it matter which card is chosen?
>> Does btrfs work equally well with all of them?
>
> If you're using btrfs RAID, you need an HBA, not a RAID card. If the RAID
> card can work as an HBA (usually labelled as JBOD mode) then you're good
> to go.
>
> For example, HP CCISS controllers can't work in JBOD mode.

Would you know if they implement their own checksumming, similar to what
btrfs does? Or if someone uses SmartArray (CCISS) RAID1, do they simply not
get the full benefit of checksumming under any possible configuration?

I've had a quick look at what is on the market; here are some observations:

- in many cases, IOPS (critical for SSDs) vary wildly, e.g.:
  - SATA-3 SSDs advertise up to 85k IOPS, so RAID1 needs 170k IOPS
  - HP's standard HBAs don't support high IOPS
  - HP Gen8 SmartArray (e.g. P420) claims up to 200k IOPS
  - previous HP arrays (e.g. P212) support only 60k IOPS
- many vendors don't advertise the IOPS prominently - I had to Google the HP
  site to find those figures quoted in some PDFs; they don't quote them in
  the quickspecs or product summary tables
- Adaptec now offers an SSD caching function in hardware - supposedly you
  drop it in the machine and all disks respond faster. How would this
  interact with btrfs checksumming? E.g. I'm guessing it would be necessary
  to ensure that data from both spindles is not cached on the same SSD?
- I started thinking about the possibility that data is degraded on the
  mechanical disk but btrfs gets a good checksum read from the SSD and
  remains blissfully unaware that the real disk is failing; then the other
  disk goes completely offline one day, and for whatever reason the data is
  not in the SSD cache and the sector can't be read reliably from the
  remaining physical disk
- should such caching just be avoided, or can it be managed from btrfs
  itself in a manner that is foolproof?

How about the combination of btrfs root/boot filesystems and grub? Can they
all play nicely together?
This seems to be one compelling factor with hardware RAID: the cards have a
BIOS that can boot from any drive even if the other is offline.
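The cache-masking hazard Daniel worries about above can be sketched as a toy model. This is purely hypothetical, not any real caching product or btrfs behaviour: a read-through SSD cache keeps serving a good copy, so checksum verification keeps passing even after the backing spindle has silently degraded; only a read that bypasses the cache (as a scrub of the real media would need to) notices the rot.

```python
# Hypothetical model of SSD read-caching masking a degrading disk.
# Not real caching-product or btrfs code; names are invented.
import zlib

disk = {"sector0": bytearray(b"payload")}   # the mechanical spindle
cache = {}                                   # the SSD read cache

def cached_read(key):
    """Read-through cache: populate on first read, then serve from SSD."""
    if key not in cache:
        cache[key] = bytes(disk[key])
    return cache[key]

def direct_read(key):
    """Cache-bypassing read, as a scrub of the real media would need."""
    return bytes(disk[key])

csum = zlib.crc32(cached_read("sector0"))   # first read warms the cache

disk["sector0"][0] ^= 0xFF                  # spindle silently degrades

# Checksums still verify via the cache, masking the failing disk...
assert zlib.crc32(cached_read("sector0")) == csum
# ...but a direct read of the media reveals the corruption.
assert zlib.crc32(direct_read("sector0")) != csum
```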