Hi everyone,

I've been running a ZFS fileserver for about a month now (on snv_91) and it's all working really well. I'm scrubbing once a week and nothing has come up as a problem yet.

I'm a little worried as I've just noticed these messages in /var/adm/messages and I don't know if they're bad or just informational:

Aug 2 14:46:06 exodus Error for Command: read_defect_data Error Level: Informational
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Sense Key: Illegal_Request
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] ASC: 0x20 (invalid command operation code), ASCQ: 0x0, FRU: 0x0
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1043,8239@5/disk@0,0 (sd0):
Aug 2 14:46:06 exodus Error for Command: log_sense Error Level: Informational
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Sense Key: Illegal_Request
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1043,8239@5/disk@0,0 (sd0):
Aug 2 14:46:06 exodus Error for Command: mode_sense Error Level: Informational
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Sense Key: Illegal_Request
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1043,8239@5/disk@0,0 (sd0):
Aug 2 14:46:06 exodus Error for Command: mode_sense Error Level: Informational
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Sense Key: Illegal_Request
Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0

Any insights would be greatly appreciated.

Thanks

Matt

No virus found in this outgoing message.
Checked by AVG - http://www.avg.com
Version: 8.0.138 / Virus Database: 270.5.10/1586 - Release Date: 01/08/2008 18:59
What does zpool status say?

This message posted from opensolaris.org
Ross wrote:
> What does zpool status say?

zpool status says everything's fine. I've run another scrub and it hasn't found any errors, so can I just consider this harmless? It's filling up my log quickly, though.

Thanks

Matt
Matt Harrison wrote:
> Ross wrote:
>> What does zpool status say?
>
> zpool status says everything's fine, I've run another scrub and it hasn't
> found any errors, so can I just consider this harmless? It's filling up
> my log quickly though

I've just checked past logs and I'm getting up to about 250 MB of these messages each week. If this is not a harmful error, is there any way to mute this particular message? I'd rather not be accumulating such large logs without good reason.

Thanks

Matt
Hi,

First of all, I really should warn you that I'm very new to Solaris. I'll happily share my thoughts, but be aware that there's not a lot of experience backing them up.

From what you've said and the logs you've posted, I suspect you're hitting recoverable read errors. ZFS wouldn't flag these as no corrupt data has been encountered, but I suspect the device driver is logging them anyway.

The log you posted all appears to refer to one disk (sd0). My guess would be that you have some hardware faults on that device, and if it were me I'd probably be replacing it before it actually fails.

I'd check your logs before replacing that disk though: you need to see if it's just that one disk, or if others are affected. Provided you have a redundant ZFS pool, it may be worth offlining that disk, unconfiguring it with cfgadm, and then pulling the drive to see if that does cure the warnings you're getting in the logs.

Whatever you do, please keep me posted. Your post has already made me realise it would be a good idea to have a script watching log file sizes to catch problems like this early.

Ross
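The idea of a script that watches log file sizes could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the function name `oversized_logs` and the threshold are assumptions, and how the result gets reported (cron mail, syslog, and so on) is left open.

```python
import os


def oversized_logs(paths, limit_bytes):
    """Return (path, size) pairs for files whose size exceeds limit_bytes.

    Files that are missing or unreadable are skipped rather than treated
    as errors, so the check can run unattended.
    """
    hits = []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # missing or unreadable: nothing to report
        if size > limit_bytes:
            hits.append((path, size))
    return hits
```

Run periodically with something like `oversized_logs(["/var/adm/messages"], 50 * 1024 * 1024)`, this would flag a log growing at the rate described above well before a week had passed. The 50 MiB figure is an arbitrary example.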
Ross wrote:
> The log you posted all appears to refer to one disk (sd0), my guess would be that you have some hardware faults on that device and if it were me I'd probably be replacing it before it actually fails.
>
> I'd check your logs before replacing that disk though, you need to see if it's just that one disk, or if others are affected. Provided you have a redundant ZFS pool, it may be worth offlining that disk, unconfiguring it with cfgadm, and then pulling the drive to see if that does cure the warnings you're getting in the logs.

Thanks for your insights. I'm also relatively new to Solaris, but I've been on Linux for years. I've just read more of the logs and it's giving these errors for all 3 of my disks (sd0, sd1, sd2). I'm running a raidz1, unfortunately without any spares, and I'm not too keen on removing the parity from my pool as I've got a lot of important files stored there.

I would agree that this seems to be a recoverable error and nothing is getting corrupted, thanks to ZFS. What I'm worried about is whether the entire batch is failing slowly and will all die at the same time.

Hopefully some ZFS/hardware guru can comment on this before the world ends for me :P

Thanks

Matt
>>>>> "mh" == Matt Harrison <iwasinnamuknow at genestate.com> writes:

    mh> I'm worried about is if the entire batch is failing slowly
    mh> and will all die at the same time.

If you can download smartctl, you can use the approach described here:

http://web.Ivy.NET/~carton/rant/ml/raid-findingBadDisks-0.html
http://web.Ivy.NET/~carton/rant/ml/raid-findingBadDisks-1.html
Hi Matt,

If it's all 3 disks, I wouldn't have thought it likely to be disk errors, and I don't think it's a ZFS fault as such. You might be better posting the question in the storage or help forums to see if anybody there can shed more light on this.

Ross
Miles Nordin wrote:
> If you can download smartctl, you can use the approach described here:
>
> http://web.Ivy.NET/~carton/rant/ml/raid-findingBadDisks-0.html
> http://web.Ivy.NET/~carton/rant/ml/raid-findingBadDisks-1.html

I already had smartmontools for temperature monitoring. Using smartctl -a I get:

Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: 33 C
Error Counter logging not supported <---- unhelpful
No self-tests have been logged

So it looks like I can't use the error count on these (SATA) drives. Otherwise everything looks OK for all 3.

And regarding Ross's reply, I will try posting something to storage-discuss and see if anyone has more ideas.

Thanks

Matt
On Sun, Aug 3, 2008 at 8:48 PM, Matt Harrison <iwasinnamuknow at genestate.com> wrote:
> Miles Nordin wrote:
> >>>>>> "mh" == Matt Harrison <iwasinnamuknow at genestate.com> writes:
> >
> >     mh> I'm worried about is if the entire batch is failing slowly
> >     mh> and will all die at the same time.

Matt, can you please post the output from this command:

iostat -E

This will show counts of the types of errors for all disks since the last reboot.

I am guessing sd0 is your CD / DVD drive.

Thank you,
  _Johan

--
Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke

Afrikaanse Stap Website: http://www.bloukous.co.za
My blog: http://initialprogramload.blogspot.com
Matt Harrison wrote:
> I'm a little worried as I've just noticed these messages in
> /var/adm/message and I don't know if they're bad or just informational:
>
> Aug 2 14:46:06 exodus Error for Command: read_defect_data Error Level: Informational

Key here: "Informational"

> Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
> Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
> Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Sense Key: Illegal_Request
> Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] ASC: 0x20 (invalid command operation code), ASCQ: 0x0, FRU: 0x0

Key here: "ASC: 0x20 (invalid command operation code)"

> Aug 2 14:46:06 exodus scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1043,8239@5/disk@0,0 (sd0):
> Aug 2 14:46:06 exodus Error for Command: log_sense Error Level: Informational
> Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
> Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
> Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] Sense Key: Illegal_Request
> Aug 2 14:46:06 exodus scsi: [ID 107833 kern.notice] ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0

Key here: "invalid field in cdb", where CDB is the command descriptor block:
http://en.wikipedia.org/wiki/SCSI_CDB

Obviously a command is being sent to the device that it doesn't understand. This could be a host-side driver or disk firmware problem. I'd classify this as annoying, but it doesn't appear dangerous on the face of it. With some digging you could determine which command is failing, but that won't fix anything. You might check with the disk vendor for firmware upgrades, and you might look at a later version of the OS drivers.

This isn't a ZFS issue, so you might have better luck on the storage-discuss forum.

 -- richard
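For anyone wanting to do that digging, the log lines in this thread already pair each "Error for Command:" entry with an ASC code a few lines later, so a rough tally is possible. The sketch below is an assumption-laden illustration rather than a tool mentioned in the thread: the regular expressions are derived only from the lines quoted here, and the function name `tally_rejected_commands` is made up.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpts quoted in this thread only.
CMD_RE = re.compile(r"Error for Command:\s*(\S+)")
ASC_RE = re.compile(r"ASC:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")


def tally_rejected_commands(lines):
    """Count (command, asc_code, asc_text) triples.

    Each 'Error for Command' line is paired with the next ASC line
    that follows it; unpaired lines are ignored.
    """
    counts = Counter()
    current = None
    for line in lines:
        m = CMD_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = ASC_RE.search(line)
        if m and current is not None:
            counts[(current, m.group(1), m.group(2))] += 1
            current = None
    return counts
```

Feeding it the lines of the messages file (for example `tally_rejected_commands(open("/var/adm/messages"))`) would show how the volume splits between commands such as read_defect_data, log_sense and mode_sense.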
Johan Hartzenberg wrote:
> Matt, can you please post the output from this command:
>
> iostat -E

root@exodus:~ # iostat -E
cmdk0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: WDC WD2000JB-00 Revision: Serial No: WD-WCAL81632817 Size: 200.05GB <200047067136 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
sd0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 478675 Predictive Failure Analysis: 0
sd1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 478626 Predictive Failure Analysis: 0
sd2 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 478604 Predictive Failure Analysis: 0
sd3 Soft Errors: 0 Hard Errors: 16 Transport Errors: 0
Vendor: HL-DT-ST Product: DVDRAM_GSA-H10N Revision: JX06 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 16 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Lots of illegal requests, and a few hard errors. Doesn't look good.

> This will show counts of the types of errors for all disks since the last
> reboot. I am guessing sd0 is your CD / DVD drive.

I don't think so. My DVD drive is on IDE along with the boot drive, while my pool is on 3 SATA disks.

Thanks

Matt
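Reading the counters per device makes the picture a little clearer: in the output above, the 16 hard errors belong to sd3 (the DVD drive, alongside its "Device Not Ready: 16"), while sd0 to sd2 show only Illegal Request counts. A minimal sketch for pulling those counters apart is below. The field names are copied from the output in this thread; the parser itself and the name `parse_iostat_e` are illustrative assumptions, and other Solaris releases may lay the output out differently.

```python
import re

# Header line: "<device> Soft Errors: N Hard Errors: N Transport Errors: N"
HEADER_RE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)
# Per-device counters that appear on the following lines.
FIELD_RE = re.compile(
    r"(Media Error|Device Not Ready|No Device|Recoverable|"
    r"Illegal Request|Predictive Failure Analysis):\s*(\d+)"
)


def parse_iostat_e(text):
    """Parse `iostat -E` text into {device: {counter_name: int}}."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = HEADER_RE.match(line)
        if m:
            current = {
                "Soft Errors": int(m.group(2)),
                "Hard Errors": int(m.group(3)),
                "Transport Errors": int(m.group(4)),
            }
            devices[m.group(1)] = current
            continue
        if current is not None:
            for name, value in FIELD_RE.findall(line):
                current[name] = int(value)
    return devices
```

With the parsed result, a one-line filter such as `[d for d, c in parse_iostat_e(text).items() if c["Hard Errors"]]` lists only the devices that have logged hard errors.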
Richard Elling wrote:
> Obviously a command is being sent to the device that it doesn't
> understand. This could be a host side driver or disk firmware problem.
> I'd classify this as annoying, but doesn't appear dangerous on the face.
> With some digging you could determine which command is failing,
> but that won't fix anything. You might check with the disk vendor
> for firmware upgrades and you might look at a later version of the
> OS drivers.

Well, I'm pleased it doesn't scream DANGER to people. I can live with clearing out the logs now and then. I will check with WD whether there are firmware updates for these disks, and I will update my snv at some point.

> This isn't a ZFS issue, so you might have better luck on the
> storage-discuss

I posted to storage-discuss a little while ago. I'm not even sure why I posted here in the first place; storage-discuss would have been a much better idea.

Thanks

Matt
I have seen this too.

I'm guessing you have SATA disks which are on an iSCSI target. I'm also guessing you have used something like

iscsitadm create target --type raw -b /dev/dsk/c4t0d00 c4t0d0

i.e. you are not using the shareiscsi property on a ZFS volume, but creating the target from the device cNtNdN (dsk or rdsk, it doesn't seem to matter).

You see these errors (always block 0) when the iSCSI initiator accesses the disks. Annoying... but the iSCSI transactions seem to be OK.