What's your disk controller?
Supermicro AOC-SAT2-MV8, based on the Marvell chipset. I figured it was the best available at the time, since it's using the same chipset as the x4500 Thumper servers.

Our next machine will be using LSI controllers, but I'm still not entirely happy with the way ZFS handles timeout-type errors. It seems to handle drive-reported read or write errors fine, and it also handles checksum errors, but it completely misses drive timeout errors as used by hardware RAID controllers.

Personally, I feel that when a pool usually responds to requests in the order of milliseconds, a timeout of even a tenth of a second is too long. Several minutes before a pool responds is just a joke.

I'm still a big fan of ZFS, and modern hardware may have better error handling, but I can't help but feel this is a little short-sighted.
Ross wrote:
> Supermicro AOC-SAT2-MV8, based on the Marvell chipset. I figured it was the best available at the time, since it's using the same chipset as the x4500 Thumper servers.
>
> Our next machine will be using LSI controllers, but I'm still not entirely happy with the way ZFS handles timeout-type errors. It seems to handle drive-reported read or write errors fine, and it also handles checksum errors, but it completely misses drive timeout errors as used by hardware RAID controllers.
>
> Personally, I feel that when a pool usually responds to requests in the order of milliseconds, a timeout of even a tenth of a second is too long. Several minutes before a pool responds is just a joke.
>
> I'm still a big fan of ZFS, and modern hardware may have better error handling, but I can't help but feel this is a little short-sighted.

Patches welcomed.

./C
On Jul 30, 2009, at 2:04 PM, Ross wrote:
> Supermicro AOC-SAT2-MV8, based on the Marvell chipset. I figured it was the best available at the time, since it's using the same chipset as the x4500 Thumper servers.
>
> Our next machine will be using LSI controllers, but I'm still not entirely happy with the way ZFS handles timeout-type errors. It seems to handle drive-reported read or write errors fine, and it also handles checksum errors, but it completely misses drive timeout errors as used by hardware RAID controllers.
>
> Personally, I feel that when a pool usually responds to requests in the order of milliseconds, a timeout of even a tenth of a second is too long. Several minutes before a pool responds is just a joke.

ZFS doesn't have timeouts, at least not in the context you're referring to. And a tenth of a second is way too short, by at least two orders of magnitude.

> I'm still a big fan of ZFS, and modern hardware may have better error handling, but I can't help but feel this is a little short-sighted.

Did you miss the memo? Likely, because it was buried in the b97 heads-up list.

PSARC 2008/465 Improved [s]sd-config-list support
http://arc.opensolaris.org/caselog/PSARC/2008/465/mail

Note that iSCSI tuning is different, and "fixed" in b121 via PSARC 2009/369 iSCSI initiator tunables.
http://arc.opensolaris.org/caselog/PSARC/2009/369/mail

-- richard
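For anyone who hasn't seen that case: from b97 onward, per-device retry behaviour can be set declaratively in /kernel/drv/sd.conf instead of patching the driver. A rough sketch of what such an entry looks like follows; the vendor/product string is made up, and the exact property names should be double-checked against the PSARC 2008/465 materials before use:

    # /kernel/drv/sd.conf -- illustrative only.  The first field must match
    # the drive's INQUIRY vendor (8 chars) and product (16 chars) strings.
    sd-config-list = "ATA     EXAMPLEDISK 500", "retries-timeout:3";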
I'm sure this has been discussed in the past. But it's very hard to understand, or even patch, incredibly advanced software such as ZFS without a deep understanding of the internals.

It will take quite a while before anyone can start to understand a file system which was developed behind closed doors for nearly a decade, and then released into open-source land via tarballs "thrown over the wall". Only recently has the source become more available to normal humans via projects such as Indiana.

Saying "if you don't like it, patch it" is an ignorant cop-out, and a troll response to people's problems with software.

On 7/30/09, "C. Bergström" <codestr0m at osunix.org> wrote:
> Ross wrote:
>> Supermicro AOC-SAT2-MV8, based on the Marvell chipset. I figured it was the best available at the time, since it's using the same chipset as the x4500 Thumper servers.
>>
>> Our next machine will be using LSI controllers, but I'm still not entirely happy with the way ZFS handles timeout-type errors. It seems to handle drive-reported read or write errors fine, and it also handles checksum errors, but it completely misses drive timeout errors as used by hardware RAID controllers.
>>
>> Personally, I feel that when a pool usually responds to requests in the order of milliseconds, a timeout of even a tenth of a second is too long. Several minutes before a pool responds is just a joke.
>>
>> I'm still a big fan of ZFS, and modern hardware may have better error handling, but I can't help but feel this is a little short-sighted.
>>
> Patches welcomed.
>
> ./C

--
Sent from my mobile device
* Rob Terhaar (robbyt at robbyt.net) wrote:
> I'm sure this has been discussed in the past. But it's very hard to understand, or even patch, incredibly advanced software such as ZFS without a deep understanding of the internals.

It's also very hard for the primary ZFS developers to satisfy everyone's itch :-)

> It will take quite a while before anyone can start to understand a file system which was developed behind closed doors for nearly a decade, and then released into open-source land via tarballs "thrown over the wall". Only recently has the source become more available to normal humans via projects such as Indiana.

I don't think you've got your facts straight. OpenSolaris was launched in June 2005. ZFS was integrated October 31st, 2005, after being in development (of a sort) since October 31st, 2001 [1]. It hasn't been developed behind closed doors for nearly a decade: four years at most, and it was available for all to see (in much better form than "tarballs thrown over the wall") LONG before Indiana was even a gleam in Ian's eye.

> Saying "if you don't like it, patch it" is an ignorant cop-out, and a troll response to people's problems with software.

And people seemingly expecting that the ZFS team (or any technology team working on OpenSolaris) has infinite cycles to solve everyone's itches is equally ignorant, IMO.

OpenSolaris (the project) is meant to be a community project, as in allowing contributions from entities outside of sun.com. So saying "patches welcomed" is mostly an appropriate response (depending on how it's presented), because they are in fact welcome. That's sort of how open source works (at least in my experience): if the primary developers aren't scratching your itch, then you (or someone you can get to do the work for you) can fix your own problems and contribute the fixes back to the community as a whole, where everyone wins.

Cheers,
-- Glenn

1 - http://blogs.sun.com/bonwick/entry/zfs_the_last_word_in
"C. Bergström"
2009-Jul-30 22:37 UTC
[zfs-discuss] Best ways to contribute WAS: Fed up with ZFS causing data loss
Rob Terhaar wrote:
> I'm sure this has been discussed in the past. But it's very hard to understand, or even patch, incredibly advanced software such as ZFS without a deep understanding of the internals.
>
> It will take quite a while before anyone can start to understand a file system which was developed behind closed doors for nearly a decade, and then released into open-source land via tarballs "thrown over the wall". Only recently has the source become more available to normal humans via projects such as Indiana.
>
> Saying "if you don't like it, patch it" is an ignorant cop-out, and a troll response to people's problems with software.

bs. I'm entirely *outside* of Sun and just tired of hearing whining and complaints about features not implemented. So that the facts are a bit more clear, in case you think I'm ignorant:

#1 The source has been available, and modified by people outside Sun, for I think 3 years now?

#2 I fully agree the threshold to contribute is *significantly* high. (I'm working on a project to reduce this.)

#3 zfs, unlike other things such as the build system, is extremely well documented. There are books on it, code to read, and even instructors (Max Bruning) who can teach you about the internals. My project even organized a free online training for this.

This isn't zfs-haters or zfs-.....

Use it, love it or help out... documentation, patches to help lower the barrier of entry, irc support, donations, detailed and accurate feedback on needed features and lots of other things are welcomed. Maybe there's a more productive way to get what you need implemented?

I think what I'm really getting at is that instead of dumping all the problems that need to be fixed, and the long drawn-out stories, on this list: file a bug report, and put the time in to explore the issue on your own. I'd bet that if even 5% of the developers using zfs sent a patch of some nature, we would avoid this whole thread.

Call me a troll if you like. I'm still going to lose my tact every once in a while when all I see is whiny/noisy threads for days. I actually don't mean to single you out; there just seems to be a lot of negativity lately.

./C
Hi Richard,

Yes, I did miss that one, but could you remind me what exactly the sd and ssd drivers are? I can find lots of details about configuring them, but no basic documentation telling me what they are.

I'm also a little confused as to whether it would have helped our case. The logs above seemed to indicate that Solaris ignored huge numbers of timeouts before faulting the device. I'm guessing that's down to FMA, and as far as I know, that's not tunable, is it?

And yes, I spotted the iSCSI timeouts, thank you. A couple of people have pointed that out to me now and I'm looking forward to testing it out.

But coming back to the timeouts (and I know I'm going over old ground - feel free to ignore me *grin*): you're saying that a tenth of a second is way too short, and that it needs to be at least 10s. Why is that? An Intel SSD can return results in around 0.2ms. Waiting 10s is enough to delay 50,000 transactions, and around 2.5GB of data. If I've got a mirrored pair of those SSDs, I really don't want ZFS to wait even a tenth of a second before trying the second one.
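For reference, the arithmetic behind those two figures (the ~250 MB/s streaming rate isn't stated above; it's simply what the 2.5GB number implies):

    10 s / 0.0002 s per I/O   = 50,000 requests held up
    10 s * ~250 MB/s          = ~2.5 GB of data held up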
Ross
2009-Jul-31 07:33 UTC
[zfs-discuss] Best ways to contribute WAS: Fed up with ZFS causing data loss
I'm going to reply because I think you're being a little short-sighted here, in response to your "patches welcome" comment. I'd love to submit a patch, but since I've had no programming training and my only real experience is with Visual Basic, I doubt I'm going to be much use. I'm more likely to fundamentally break something than help.

>> Saying "if you don't like it, patch it" is an ignorant cop-out, and a troll response to people's problems with software.

Yup, I'm with Robby here, and I'll explain why below.

> bs. I'm entirely *outside* of Sun and just tired of hearing whining and complaints about features not implemented. So that the facts are a bit more clear, in case you think I'm ignorant:

Your attitude is showing through. Chill, dude.

> #1 The source has been available, and modified by people outside Sun, for I think 3 years now?

Great. I'm not a programmer; explain again how this helps me?

> #2 I fully agree the threshold to contribute is *significantly* high. (I'm working on a project to reduce this.)

Ok, that does sound helpful, thanks.

> #3 zfs, unlike other things such as the build system, is extremely well documented. There are books on it, code to read, and even instructors (Max Bruning) who can teach you about the internals. My project even organized a free online training for this.

Again, brilliant if you're a programmer.

> This isn't zfs-haters or zfs-.....
>
> Use it, love it or help out...

I'll take all three thanks, and let me explain how I do that:

I use this forum as a way to share my experiences with other ZFS users, and with the ZFS developers. In case you hadn't noticed, this is a community, and there are more people here than just Sun developers. If I find something I don't like, yes, I'm vocal. Sometimes I'm wrong and people educate me; other times people agree with me, and if the consensus is that it's serious enough, I file a bug.

What you need to understand is that software isn't created by developers alone. A full team is made up of interface architects, designers, programmers, testers and support staff. No one person has all the skills needed. I don't program, and any attempt I made there would likely be more of a hindrance than a help, so instead I test. I've put in huge amounts of time testing Solaris, far more than I would need to if we were just implementing it internally, and if I find a problem I first discuss it in these forums, and then write a bug report if necessary. I've reported 10 bugs to Sun so far, 6 of which have now been fixed. Hell, there's an old post from me that on its own resulted in lively discussion from a dozen or more people, culminating in 3-4 significant bug reports and a 15-page PDF write-up I created, summarizing several weeks of testing and discussion for the developers.

I've also done my bit by coming on these forums, sharing my experience with others, and helping other people when they come across a problem that I already know how to solve.

You coming here and accusing me of not helping is incredibly narrow-minded. I may not be a programmer, but I've put in hundreds of hours of work on ZFS, helping to improve the quality of the product and doing my bit to get more people using it.

> documentation, patches to help lower the barrier of entry, irc support, donations, detailed and accurate feedback on needed features and lots of other things are welcomed. Maybe there's a more productive way to get what you need implemented?
> I think what I'm really getting at is that instead of dumping all the problems that need to be fixed, and the long drawn-out stories, on this list: file a bug report, and put the time in to explore the issue on your own. I'd bet that if even 5% of the developers using zfs sent a patch of some nature, we would avoid this whole thread.

"Dumping on the list". You mean sharing problems with the community so other people can be aware of the issues? As I explained above, this is a community, and in case you haven't realised yet, it's not made up entirely of programmers. A lot of us on here are network admins, and to us these posts are valuable. I come here regularly to read posts exactly like this from other people, because that way I get to benefit from the experiences of other admins.

> Call me a troll if you like. I'm still going to lose my tact every once in a while when all I see is whiny/noisy threads for days. I actually don't mean to single you out; there just seems to be a lot of negativity lately.

These "whiny" threads are very helpful to those of us actually using the software, but don't worry, I also have a tendency to lose my tact when faced with "whiny" programmers. :-p
Hi,

Most of the time spent waiting on a disk to fail is spent in the disk drivers, not in ZFS itself. If you want to lower the timeouts, you can do so by configuring different timeouts for sd, ssd or whichever other driver you are using. See http://wikis.sun.com/display/StorageDev/Retry-Reset+Parameters

So if you are using the sd driver, it will keep trying for sd_io_time * sd_retry_count, where by default sd_io_time is 60s and sd_retry_count is 5 or 3, depending on device type (FC or not), IIRC. Try lowering the timeout, the number of retries, or both.

--
Robert Milkowski
http://milek.blogspot.com
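A minimal sketch of what the global version of that tuning looks like in /etc/system (the values are only an illustration, not a recommendation, and a reboot is needed for them to take effect):

    * /etc/system -- shorten the sd command timeout and retry count
    set sd:sd_io_time = 20
    set sd:sd_retry_count = 3

The per-device sd-config-list approach from PSARC 2008/465 mentioned earlier in the thread is generally preferable to changing these globals, since it only affects the drives you name.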
Interesting, thanks Miles. Up to this week I'd never heard that any of this was tunable, but I'm more than happy to go in that direction if that's the way to do it. :-)

Can anybody point me in the direction of documentation on the tunables for the Marvell SATA driver, the LSI SAS driver (for the 1064E), the Adaptec RAID driver, and the iSER driver?
On Thu, 30 Jul 2009 23:55:19 -0700 (PDT) Ross <no-reply at opensolaris.org> wrote:
> Yes, I did miss that one, but could you remind me what exactly the sd and ssd drivers are? I can find lots of details about configuring them, but no basic documentation telling me what they are.

You could have tried "man sd" and "man ssd":

    sd(7D)                                                      Devices
    NAME
        sd - SCSI disk and ATAPI/SCSI CD-ROM device driver
    SYNOPSIS
        sd@target,lun:partition

    ssd(7D)                                                     Devices
    NAME
        ssd - Fibre Channel Arbitrated Loop disk device driver
    SYNOPSIS
        ssd@port,target:partition

You won't see an ssd instance on x86, only on SPARC.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
Kernel Conference Australia - http://au.sun.com/sunnews/events/2009/kernel
Where to find information? Do some searching...

First off, run through the drivers loaded via modinfo - look to see if anything there is specific to your card.

prtconf -v | pg - again, look for your controller card. Once you find it, look at the driver listed or tied to it, or failing that, the vendor-id string if it's available.

If that fails, look at the device in /dev/dsk - follow the symlink to its /devices entry, see what the major number is, refer to /etc/name_to_major to find that major device number, and see the driver (or alias) associated with it.

You can also refer to /etc/path_to_inst to glean some info on the disks (look for partial /devices paths as you find in the step above, then see which driver is used for the disk device). Regardless of the driver used for the controller, the disks *should* normally use either the sd or ssd driver.

Once you figure out which driver is used for your card, look in /kernel/drv for the driver name, and see if there's a .conf file with the same name prefix. Look for tunables for the card driver (using the driver.conf name), as well as for the sd or ssd driver (via sd.conf, ssd.conf) - and spend a few minutes searching via your favorite web search engine.

A little time spent digging can resolve a lot of problems...
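Strung together, the hunt looks roughly like this. The device name c1t0d0s0 and the marvell88sx driver are only examples of what you might see; substitute whatever your own box reports:

    modinfo | grep -i marvell            # any card-specific driver loaded?
    prtconf -D | pg                      # device tree with "driver name:" per node
    ls -l /dev/dsk/c1t0d0s0              # symlink points at the /devices entry
    grep marvell88sx /etc/path_to_inst   # physical paths claimed by that driver
    grep '"sd"' /etc/path_to_inst        # the disks themselves normally bind to sd (or ssd)
    ls /kernel/drv/sd.conf /kernel/drv/marvell88sx.conf 2>/dev/null   # any .conf with tunables?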
> You could have tried "man sd" and "man ssd"

D'oh. I'm far too used to downloading documentation online... when you come from a Windows background, having driver manuals on your system is rather unexpected :)

Thanks James.
Heh, that's one thing I love about Linux & Solaris - the amount of info you can find if you know what you're doing is scary.

However, while that will work for the Marvell SATA card I do have fitted in a server, it's not going to help for the others - they're all items I'm researching for our next system, and I don't have them to hand right now.

But from what you're saying, is the sd driver used for more than just SCSI hard disk and CD-ROM devices? The man page for sd says nothing about it being used for other devices, although googling the SATA driver does reveal a link there. Is the sd driver used as the framework for all of these (SATA, SAS, iSER, Adaptec)? If so, these tunables really could be a godsend!

Ross

PS. I'm all for digging, but when you're in over your head it's time to shout for help ;-)
You might check the hardware compatibility list at Sun's site. It might list the driver that will be used for the card you're looking at... I'm not sure, it's been a while since I've looked at it.
sd is the older SCSI disk driver; ssd is the newer SCSI disk driver (part of the Leadville driver package) that allowed for more than 256 LUNs per target. We've had systems that used the sd driver until we upgraded to newer, Sun-provided drivers for QLogic / Emulex cards, which then used the ssd driver. Both handle SCSI devices (or SCSI as emulated by the various device driver layers).
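A quick way to see which of the two a particular system is actually using (the grep pattern is approximate; adjust it to the prtconf output on your build):

    prtconf -D | egrep 'driver name: s?sd'   # nodes bound to sd vs. ssd
    grep '"ssd"' /etc/path_to_inst           # disk instances recorded for ssd, if any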
max at bruningsystems.com
2009-Jul-31 14:24 UTC
[zfs-discuss] Best ways to contribute WAS: Fed up with ZFS causing data loss
Hi Ross,

Ross wrote:
>> #3 zfs, unlike other things such as the build system, is extremely well documented. There are books on it, code to read, and even instructors (Max Bruning) who can teach you about the internals. My project even organized a free online training for this.
>
> Again, brilliant if you're a programmer.

I think it is a misconception that a course about internals is meant only for programmers. An internals course should teach how the system works. If you are a programmer, this should help you to do programming on the system. If you are an admin, it should help you in your admin work by giving you a better understanding of what the system is doing. If you are a user, it should help you to make better use of the system. In short, I think anyone who is working with Solaris/OpenSolaris can benefit.

max
On Thu, 30 Jul 2009, Ross wrote:
> Yes, I did miss that one, but could you remind me what exactly the sd and ssd drivers are? I can find lots of details about configuring them, but no basic documentation telling me what they are.

Is your system lacking manual pages? I find excruciating details on my Solaris 10 system.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Richard Elling
2009-Jul-31 16:31 UTC
[zfs-discuss] Best ways to contribute WAS: Fed up with ZFS causing data loss
On Jul 31, 2009, at 7:24 AM, max at bruningsystems.com wrote:
> Hi Ross,
>
> Ross wrote:
>>> #3 zfs, unlike other things such as the build system, is extremely well documented. There are books on it, code to read, and even instructors (Max Bruning) who can teach you about the internals. My project even organized a free online training for this.
>>
>> Again, brilliant if you're a programmer.
>
> I think it is a misconception that a course about internals is meant only for programmers. An internals course should teach how the system works. If you are a programmer, this should help you to do programming on the system. If you are an admin, it should help you in your admin work by giving you a better understanding of what the system is doing. If you are a user, it should help you to make better use of the system. In short, I think anyone who is working with Solaris/OpenSolaris can benefit.

I agree with Max, 110%.

As an example, for the USENIX Technical Conference I put together a full-day tutorial on ZFS. It was really 2.5 days of tutorial crammed into one day, but hey, you get more than you pay for sometimes :-). I kept the level above the source code, but touched on the structure of the system, the on-disk format, why nvlists are used, and a few of the acronyms seen in various messages or references. To get into the source-code level, you are looking at a week or more of lecture (and growing).

I am planning a sysadmin-oriented version of this tutorial for the USENIX LISA conference in November. I intend to move away from the technical (how this is done) and more towards the operational (practical, sane implementations). If anyone has suggestions for topics to be covered, please drop me a line. Also, if anyone wants to schedule some time at their site for training, I'm more than happy to travel :-)

-- richard
This may have been mentioned elsewhere and, if so, I apologize for repeating. Is it possible your difficulty here is with the Marvell driver and not, strictly speaking, with ZFS? The Solaris Marvell driver has had many, MANY bug fixes and continues to this day to be supported by IDR patches and other quick-fix workarounds. It is the source of many problems. Granted, ZFS handles these poorly at times (it got a lot better with ZFS v10), but it is difficult to expect the file system to deal well with underlying instability in the hardware driver, I think.

I'd be interested to hear whether your experiences are the same using the LSI controllers, which have a much better driver in Solaris.

Ross wrote:
> Supermicro AOC-SAT2-MV8, based on the Marvell chipset. I figured it was the best available at the time, since it's using the same chipset as the x4500 Thumper servers.
>
> Our next machine will be using LSI controllers, but I'm still not entirely happy with the way ZFS handles timeout-type errors. It seems to handle drive-reported read or write errors fine, and it also handles checksum errors, but it completely misses drive timeout errors as used by hardware RAID controllers.
>
> Personally, I feel that when a pool usually responds to requests in the order of milliseconds, a timeout of even a tenth of a second is too long. Several minutes before a pool responds is just a joke.
>
> I'm still a big fan of ZFS, and modern hardware may have better error handling, but I can't help but feel this is a little short-sighted.