I've got a 5 x 500 GB SATA RAID-Z stack running under build 64a. I have two problems that may or may not be interrelated.

1) zpool scrub stops. If I do a "zpool status" it merrily continues for a while. I can't see any pattern in this behaviour with repeated scrubs.

2) Bad blocks on one disk. This is repeatable, so I'm sending the disk back for replacement. (1) doesn't seem to correlate with the time I hit the bad blocks, so I don't think the two are related. However... when it does hit those blocks, I not only get media sense read errors, but the SATA port is dropped and reconnected. I think the driver probably does a port reset, but I figured I'd note it for discussion. Is there a way to remap the bad blocks for ZFS? There were only a small number (19) that the scrub hit.

I'd like to hear some general comments about these issues I'm having with ZFS.

Thanks,
Gary
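P.S. For concreteness, this is roughly the sequence I'm running when the scrub "stops" (the pool name "tank" below is just a stand-in for my real pool):

    # start a scrub, then poll its progress
    zpool scrub tank
    zpool status tank     # the scrub: line shows either "in progress" with a
                          # percentage done, or "completed" with an error count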
Gary Gendel wrote:
> I've got a 5 x 500 GB SATA RAID-Z stack running under build 64a. I have two problems that may or may not be interrelated.
>
> 1) zpool scrub stops. If I do a "zpool status" it merrily continues for a while. I can't see any pattern in this behaviour with repeated scrubs.
>
> 2) Bad blocks on one disk. This is repeatable, so I'm sending the disk back for replacement. (1) doesn't seem to correlate with the time I hit the bad blocks, so I don't think the two are related. However... when it does hit those blocks, I not only get media sense read errors, but the SATA port is dropped and reconnected. I think the driver probably does a port reset, but I figured I'd note it for discussion. Is there a way to remap the bad blocks for ZFS? There were only a small number (19) that the scrub hit.
>
> I'd like to hear some general comments about these issues I'm having with ZFS.

Are you using the marvell88sx driver to attach your SATA disks?

If you are, then perhaps this putback from yesterday is what you need:

6564677 oracle datafiles corrupted on thumper
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6564677

James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems
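If you're not sure which driver is attached, a quick check is something like this (just a sketch; instance numbers and exact output will vary from box to box):

    # is the marvell88sx module loaded at all?
    modinfo | grep -i marvell
    # which driver is bound to each SATA controller node
    prtconf -D | grep -i sata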
James C. McPherson wrote:
> Gary Gendel wrote:
>> I've got a 5 x 500 GB SATA RAID-Z stack running under build 64a. I have two problems that may or may not be interrelated.
>>
>> 1) zpool scrub stops. If I do a "zpool status" it merrily continues for a while. I can't see any pattern in this behaviour with repeated scrubs.
>>
>> 2) Bad blocks on one disk. This is repeatable, so I'm sending the disk back for replacement. (1) doesn't seem to correlate with the time I hit the bad blocks, so I don't think the two are related. However... when it does hit those blocks, I not only get media sense read errors, but the SATA port is dropped and reconnected. I think the driver probably does a port reset, but I figured I'd note it for discussion. Is there a way to remap the bad blocks for ZFS? There were only a small number (19) that the scrub hit.
>>
>> I'd like to hear some general comments about these issues I'm having with ZFS.
>
> Are you using the marvell88sx driver to attach your SATA disks?
>
> If you are, then perhaps this putback from yesterday is what you need:
>
> 6564677 oracle datafiles corrupted on thumper
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6564677

This bug and its fix have nothing to do with the ZFS scrub issue. Regardless, I would like to know whether this is happening with the marvell88sx driver (and if so, on what hardware) or with some other driver and hardware.

Regards,
Lida Horn
Thanks for the information. I am using the marvell88sx driver on a vanilla Sun Fire V20z server. This project has gone through many frustrating phases...

Originally I tried a Si3124 board with the box running a 5-to-1 SiI SATA multiplexer. The controller didn't understand the multiplexer, so I put in a second board and drove the drives directly.

However, this didn't work well either and would lock up periodically. I added some published driver patches, which made things better, but I would still get periodic kernel panics because of a recursive mutex call.

So I bought the Supermicro 8-channel SATA Marvell card. I tried the multiplexer again, but no luck, so I'm driving each drive separately again. Occasionally I would have a system lockup and could only force the system to power down. I believe that this may have been due to a flaky SATA connection internal to the box. Now I'm left with the situation that I described.

Gary
> 6564677 oracle datafiles corrupted on thumper

Wow, must be a huuuuuuge database server! :D
Gary Gendel wrote:
> Thanks for the information. I am using the marvell88sx driver on a vanilla Sun Fire V20z server. This project has gone through many frustrating phases...
>
> Originally I tried a Si3124 board with the box running a 5-to-1 SiI SATA multiplexer. The controller didn't understand the multiplexer, so I put in a second board and drove the drives directly.
>
> However, this didn't work well either and would lock up periodically. I added some published driver patches, which made things better, but I would still get periodic kernel panics because of a recursive mutex call.
>
> So I bought the Supermicro 8-channel SATA Marvell card. I tried the multiplexer again, but no luck, so I'm driving each drive separately again. Occasionally I would have a system lockup and could only force the system to power down. I believe that this may have been due to a flaky SATA connection internal to the box. Now I'm left with the situation that I described.

I've had the same issue on my box. It often (anywhere from every 30 minutes to every four hours) resets the SATA ports. It continues fine afterwards, but it does halt the machine for some time. Recently, after I gave up on debugging and took the machine into "production", it has started to freeze randomly about once per day.
We see similar problems on a SuperMicro with five 500 GB Seagate SATA drives. This is using the AHCI driver. We do not, however, see problems with the same hardware/drivers if we use 250 GB drives.

We sometimes see bad blocks reported (are these automatically remapped somehow so they are not used again?) and sometimes SATA port resets.

Here is a sample of the log output. Any help understanding and/or resolving this issue would be greatly appreciated. I very much don't want to have freezes in production.

Aug 14 11:20:28 chazz1 port 2: device reset
Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,8080@1f,2/disk@2,0 (sd3):
Aug 14 11:20:28 chazz1    Error for Command: write    Error Level: Retryable
Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  Requested Block: 530    Error Block: 530
Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  Vendor: ATA    Serial Number:
Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  Sense Key: No_Additional_Sense
Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
Rick Wager wrote:
> We see similar problems on a SuperMicro with five 500 GB Seagate SATA drives. This is using the AHCI driver. We do not, however, see problems with the same hardware/drivers if we use 250 GB drives.

Duh. The error is from the disk :-)

> We sometimes see bad blocks reported (are these automatically remapped somehow so they are not used again?) and sometimes SATA port resets.

Depending on how the errors are reported, the driver may attempt a reset to clear them. The drive may also automatically spare bad blocks.

> Here is a sample of the log output. Any help understanding and/or resolving this issue would be greatly appreciated. I very much don't want to have freezes in production.
>
> Aug 14 11:20:28 chazz1 port 2: device reset
> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,8080@1f,2/disk@2,0 (sd3):
> Aug 14 11:20:28 chazz1    Error for Command: write    Error Level: Retryable
> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  Requested Block: 530    Error Block: 530
> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  Vendor: ATA    Serial Number:
> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  Sense Key: No_Additional_Sense
> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

This error was transient and retried. If it had been a fatal error (still failing after retries), you'd have another, different message describing the failed condition.
-- richard
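A quick way to see whether a drive is accumulating errors beyond what makes it into syslog is something like the following (a sketch; the FMA commands only show anything if telemetry was actually recorded):

    # per-device soft/hard/transport error counters and last sense data
    iostat -En
    # recent FMA error telemetry, plus any faults the diagnosis engine declared
    fmdump -e | tail -20
    fmadm faulty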
Rick Wager wrote:
> Thanks Richard!
>
> That's the way I read the errors, also; they seem to indicate bad blocks on the drives. The bad news is that when they occur, access to the ZFS file system "stops" for quite a long time - seemingly from 30 seconds to a minute or longer.

They might be bad blocks, though usually we get more info than "no additional sense info." 30 seconds is a typical default retry timeout. The file system will seem to stop because the disk is ATA and can't handle multiple I/O operations concurrently.

> Do you have a recommendation for how to identify and map the bad blocks so they are not used again? Should I fill my disk with data in order to identify the bad blocks?

The format command has a number of media scan and repair options.

> Also, for what it's worth, I've been running a simple test on my system to copy a large number of files around a zpool in order to fill it up and verify the zpool is working reliably. Just using a simple shell script, with very bad results:
> - In one window the shell script has been frozen for about an hour now; the cp command is just hung.
> - ls of the directories in the ZFS file system just hangs.
> - In another window zpool status is also hung and never returns:
>
> chazz1<##> zpool status                                    [11:54:49]
>   pool: fmpool
>  state: ONLINE
>  scrub: scrub completed with 0 errors on Tue Aug 14 12:01:40 2007
> ^C^C^C^C^C^C^C^Z

Typical reaction to a malfunctioning disk.

> - And this is the output from zpool iostat:
>
> chazz1<*> zpool iostat 10 10                               [12:53:01]
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> fmpool       186G  2.08T    131    121  15.4M  13.3M
> fmpool       186G  2.08T      0      0      0      0
> fmpool       186G  2.08T      0      0      0      0
> fmpool       186G  2.08T      0      0      0      0
> fmpool       186G  2.08T      0      0      0      0
> fmpool       186G  2.08T      0      0      0      0
>
> I think we'll have to reboot to clear this frozen condition.
>
> Any thoughts?

According to the ahci man page, the driver does not yet support NCQ, which would also be consistent with the observed behaviour. Do the disks work OK in other machines?
-- richard
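To put a little more detail behind "media scan and repair options": the non-destructive path through format looks roughly like this (a sketch from memory; menu names can differ slightly by release, and only the read test is non-destructive):

    # format is menu-driven; pick the suspect disk when prompted
    format
      format> analyze
      analyze> read       # non-destructive surface scan; reports bad blocks
      analyze> quit
      format> repair 530  # remap one specific sector (530 is just the block
                          # number from the earlier log, used as an example)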
On Tue, 14 Aug 2007, Richard Elling wrote:
> Rick Wager wrote:
>> We see similar problems on a SuperMicro with five 500 GB Seagate SATA drives. This is using the AHCI driver. We do not, however, see problems with the same hardware/drivers if we use 250 GB drives.
>
> Duh. The error is from the disk :-)

A likely possibility is that the disk drives are simply not getting enough (cool) airflow and are overheating during periods of high system activity that generate a lot of disk head movement - for example, during a zpool scrub. And the extra platters in the larger disk drives would require even more cooling capacity, which would be consistent with your observations.

Best to actually *measure* the effectiveness of the disk cooling design/installation. Recommendation: investigate the Fluke mini infrared thermometers - for example, the Fluke 62 at:
http://www.testequipmentdepot.com/fluke/thermometers/62.htm

In some disk drive installations, it's possible for the infrared probe to "see" the disk HDA (Head Disk Assembly) without disturbing the drive.

PS: I use a much older Fluke 80T-IR in combination with a digital multimeter with millivolt resolution (a Fluke meter, of course!).

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
Voice: 972.379.2133  Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Al,

That makes so much sense that I can't believe I missed it. One bay was the one giving me the problems. Switching drives didn't affect that. Switching cabling didn't affect that. Changing SATA controllers didn't affect that. However, reorienting the case on its side did! I'll be putting a larger fan into the disk-stack case.

Gary
I can confirm that the marvell88sx driver (or kernel 64a) regularly hangs the SATA card (SuperMicro 8-port) with the message about a port being reset. The hang is temporary but troublesome.

It can be relieved by turning off NCQ in /etc/system with:

set sata:sata_func_enable = 0x5
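For completeness, applying that workaround persistently looks something like this (a sketch; /etc/system changes only take effect after a reboot, and the mdb line is just one way to double-check the live value afterwards):

    # append the tunable and reboot
    echo 'set sata:sata_func_enable = 0x5' >> /etc/system
    init 6
    # after the reboot, confirm what the sata module actually picked up
    echo 'sata`sata_func_enable/X' | mdb -k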
> I can confirm that the marvell88sx driver (or kernel 64a) regularly hangs the SATA card (SuperMicro 8-port) with the message about a port being reset. The hang is temporary but troublesome.
> It can be relieved by turning off NCQ in /etc/system with "set sata:sata_func_enable = 0x5"

Thanks for the info. I'll have to give this a try. BTW, I've verified that this happens on build 70 as well.

Gary
> I can confirm that the marvell88sx driver (or kernel 64a) regularly hangs the SATA card (SuperMicro 8-port) with the message about a port being reset. The hang is temporary but troublesome.

This could be bug 6553517, which was fixed in build 66.

Casper
I now have improved sata and marvell88sx driver modules that deal with various error conditions in a much more solid way. Changes include reducing the number of required device resets, properly reporting media errors (rather than "no additional sense"), and clearing aborted packets more rapidly so that progress resumes much more quickly after a hardware error. Furthermore, the driver is much quieter (far fewer messages in /var/adm/messages).

If there is still interest, I can make those binaries available for testing prior to their availability in Solaris Nevada (OpenSolaris). These changes will be checked in soon, but the process always introduces a significant delay, so if anyone would like them, please e-mail me and I will make the binaries available via e-mail.

Regards,
Lida
The latest changes to the sata and marvell88sx modules have been put back to Solaris Nevada and should be available in the next build (build 84). Hopefully, those of you who use them will find the changes helpful.
On Feb 12, 2008 4:45 AM, Lida Horn <Lida.Horn at sun.com> wrote:
> The latest changes to the sata and marvell88sx modules have been put back to Solaris Nevada and should be available in the next build (build 84). Hopefully, those of you who use them will find the changes helpful.

I have indeed found it beneficial. I installed the new drivers on two machines, both of which were intermittently giving errors about device resets. One card did this so often that I believed the card was faulty and that I would have to replace either the card or the motherboard.

Since installing the new drivers I've had no issues whatsoever with drives on either box. I ran zpool scrubs continuously on the flaky box, replaced a disk with another one, and copied data about in an attempt to replicate the bus errors I had previously seen, to no avail. The other box has been similarly stable, as far as I can tell; I see no messages in the logs and the users haven't complained when I asked them.

Thank you for the work you've put into improving the state of these drivers; I meant to email you earlier this week and mention the great strides they have made, but other things took precedence. That, to my mind, is the primary evolution these drivers have made: I don't have to worry about my HBAs any more.

Thanks!
Will
Will Murnane wrote:
> On Feb 12, 2008 4:45 AM, Lida Horn <Lida.Horn at sun.com> wrote:
>> The latest changes to the sata and marvell88sx modules have been put back to Solaris Nevada and should be available in the next build (build 84). Hopefully, those of you who use them will find the changes helpful.
>
> I have indeed found it beneficial. I installed the new drivers on two machines, both of which were intermittently giving errors about device resets. One card did this so often that I believed the card was faulty and that I would have to replace either the card or the motherboard.

I'm glad you find the new modules useful and am pleased with your results. One thing I would like you to be aware of is that some of what was done was to suppress the messages. In other words, some of what was happening before is still happening, just silently.

> Since installing the new drivers I've had no issues whatsoever with drives on either box. I ran zpool scrubs continuously on the flaky box, replaced a disk with another one, and copied data about in an attempt to replicate the bus errors I had previously seen, to no avail. The other box has been similarly stable, as far as I can tell; I see no messages in the logs and the users haven't complained when I asked them.

"No issues whatsoever" - wonderful words to hear!

> Thank you for the work you've put into improving the state of these drivers; I meant to email you earlier this week and mention the great strides they have made, but other things took precedence. That, to my mind, is the primary evolution these drivers have made: I don't have to worry about my HBAs any more.

I appreciate your taking the time to post, and I hope you have no further issues with the driver.

Thank you,
Lida