thr3ads.net - zfs discuss - [zfs-discuss] are these errors dangerous [Aug 2008]

Matt Harrison

2008-Aug-02 13:46 UTC

[zfs-discuss] are these errors dangerous

Hi everyone,

I've been running a ZFS fileserver for about a month now (on snv_91) and
it's all working really well. I'm scrubbing once a week and nothing has
come up as a problem yet.

I'm a little worried as I've just noticed these messages in
/var/adm/messages and I don't know if they're bad or just
informational:

Aug  2 14:46:06 exodus  Error for Command: read_defect_data        Error Level: Informational
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Requested Block: 0                         Error Block: 0
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Vendor: ATA                             Serial Number:
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Sense Key: Illegal_Request
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    ASC: 0x20 (invalid command operation code), ASCQ: 0x0, FRU: 0x0
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1043,8239@5/disk@0,0 (sd0):
Aug  2 14:46:06 exodus  Error for Command: log_sense               Error Level: Informational
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Requested Block: 0                         Error Block: 0
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Vendor: ATA                             Serial Number:
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Sense Key: Illegal_Request
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1043,8239@5/disk@0,0 (sd0):
Aug  2 14:46:06 exodus  Error for Command: mode_sense              Error Level: Informational
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Requested Block: 0                         Error Block: 0
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Vendor: ATA                             Serial Number:
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Sense Key: Illegal_Request
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1043,8239@5/disk@0,0 (sd0):
Aug  2 14:46:06 exodus  Error for Command: mode_sense              Error Level: Informational
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Requested Block: 0                         Error Block: 0
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Vendor: ATA                             Serial Number:
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Sense Key: Illegal_Request
Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0

Any insights would be greatly appreciated.

Thanks

Matt

No virus found in this outgoing message.
Checked by AVG - http://www.avg.com 
Version: 8.0.138 / Virus Database: 270.5.10/1586 - Release Date: 01/08/2008
18:59

Ross

2008-Aug-02 18:45 UTC

What does zpool status say?
 
 
This message posted from opensolaris.org

Matt Harrison

2008-Aug-02 19:49 UTC

Ross wrote:
> What does zpool status say?

zpool status says everything's fine. I've run another scrub and it hasn't
found any errors, so can I just consider this harmless? It's filling up
my log quickly, though.

thanks

Matt


Matt Harrison

2008-Aug-02 19:54 UTC

Matt Harrison wrote:
> Ross wrote:
>> What does zpool status say?
>
> zpool status says everythings fine, i've run another scrub and it hasn't
> found any errors, so can i just consider this harmless? its filling up
> my log quickly though
>
I've just checked past logs, and I'm getting up to about 250 MB of these
messages each week. If this is not a harmful error, is there any way to
mute this particular message? I'd rather not be accumulating such large
logs without good reason.

thanks

Matt



Ross

2008-Aug-03 10:36 UTC

Hi,

First of all, I really should warn you that I'm very new to Solaris. I'll
happily share my thoughts, but be aware that there's not a lot of experience
backing them up.

From what you've said, and the logs you've posted, I suspect you're hitting
recoverable read errors.  ZFS wouldn't flag these, as no corrupt data has
been encountered, but I suspect the device driver is logging them anyway.
The log you posted all appears to refer to one disk (sd0). My guess would be
that you have some hardware faults on that device, and if it were me I'd
probably be replacing it before it actually fails.

I'd check your logs before replacing that disk, though: you need to see
if it's just that one disk, or if others are affected.  Provided you
have a redundant ZFS pool, it may be worth offlining that disk, unconfiguring it
with cfgadm, and then pulling the drive to see if that cures the warnings
you're getting in the logs.

Whatever you do, please keep me posted.  Your post has already made me realise
it would be a good idea to have a script watching log file sizes to catch
problems like this early.

Ross
 
 
This message posted from opensolaris.org
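Ross's idea of a script watching log file sizes can be sketched in a few lines of Python. This is a minimal, hypothetical sketch and was not posted in the thread: the 50 MB threshold is an arbitrary assumption, the function names are made up for illustration, and /var/adm/messages is simply the file Matt's notices landed in.

```python
import os

# Hypothetical threshold (an assumption, not from the thread): 50 MB.
DEFAULT_LIMIT_BYTES = 50 * 1024 * 1024


def check_log_size(path, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (size_in_bytes, over_limit) for the file at `path`.

    A missing file is reported as size 0 and never counts as over the limit.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return 0, False
    return size, size > limit_bytes


def main(paths, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Print a warning for each log over the limit; return 1 if any were."""
    status = 0
    for path in paths:
        size, over = check_log_size(path, limit_bytes)
        if over:
            print(f"WARNING: {path} is {size / (1024 * 1024):.1f} MB")
            status = 1
    return status
```

A cron job could call `main(["/var/adm/messages"])` and mail any output. An absolute size limit is the simplest choice; comparing against the size recorded on the previous run would catch sudden growth (like the roughly 250 MB a week Matt reports) sooner, at the cost of keeping state between runs.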

Matt Harrison

2008-Aug-03 15:48 UTC

Ross wrote:
> Hi,
>
> First of all, I really should warn you that I'm very new to Solaris,
> I'll happily share my thoughts but be aware that there's not a lot of
> experience backing them up.
>
> From what you've said, and the logs you've posted I suspect you're
> hitting recoverable read errors.  ZFS wouldn't flag these as no corrupt
> data has been encountered, but I suspect the device driver is logging
> them anyway.
>
> The log you posted all appears to refer to one disk (sd0), my guess
> would be that you have some hardware faults on that device and if it
> were me I'd probably be replacing it before it actually fails.
>
> I'd check your logs before replacing that disk though, you need to see
> if it's just that one disk, or if others are affected.  Provided you
> have a redundant ZFS pool, it may be worth offlining that disk,
> unconfiguring it with cfgadm, and then pulling the drive to see if that
> does cure the warnings you're getting in the logs.
>
> Whatever you do, please keep me posted.  Your post has already made me
> realise it would be a good idea to have a script watching log file sizes
> to catch problems like this early.
>
> Ross
Thanks for your insights. I'm also relatively new to Solaris, but I've
been on Linux for years. I've just read more into the logs and it's
giving these errors for all 3 of my disks (sd0, sd1, sd2). I'm running a
raidz1, unfortunately without any spares, and I'm not too keen on
removing the parity from my pool as I've got a lot of important files
stored there.

I would agree that this seems to be a recoverable error and nothing is
getting corrupted, thanks to ZFS. What I'm worried about is whether the
entire batch is failing slowly and will all die at the same time.

Hopefully some ZFS/hardware guru can comment on this before the world
ends for me :P

Thanks

Matt


Miles Nordin

2008-Aug-03 16:40 UTC

>>>>> "mh" == Matt Harrison <iwasinnamuknow at genestate.com> writes:
    mh>  I'm worried about is if the entire batch is failing slowly
    mh> and will all die at the same time.

If you can download smartctl, you can use the approach described here:

 http://web.Ivy.NET/~carton/rant/ml/raid-findingBadDisks-0.html
 http://web.Ivy.NET/~carton/rant/ml/raid-findingBadDisks-1.html


Ross Smith

2008-Aug-03 17:24 UTC

Hi Matt,
 
If it's all 3 disks, I wouldn't have thought it likely to be disk errors,
and I don't think it's a ZFS fault as such.  You might be better off posting
the question in the storage or help forums to see if anybody there can
shed more light on this.
 
Ross

Matt Harrison

2008-Aug-03 18:48 UTC

Miles Nordin wrote:
>>>>>> "mh" == Matt Harrison <iwasinnamuknow at genestate.com> writes:
>
>     mh>  I'm worried about is if the entire batch is failing slowly
>     mh> and will all die at the same time.
>
> If you can download smartctl, you can use the approach described here:
>
>  http://web.Ivy.NET/~carton/rant/ml/raid-findingBadDisks-0.html
>  http://web.Ivy.NET/~carton/rant/ml/raid-findingBadDisks-1.html
I already had smartmontools installed for temperature monitoring. Using smartctl -a, I get:

Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     33 C

Error Counter logging not supported  <---- unhelpful
No self-tests have been logged

So it looks like I can't use the error counters on these (SATA) drives.
Otherwise, everything looks OK for all 3.

Regarding Ross's reply, I will try posting something to storage-discuss
and see if anyone has more ideas.

thanks

Matt


Johan Hartzenberg

2008-Aug-03 19:07 UTC

On Sun, Aug 3, 2008 at 8:48 PM, Matt Harrison <iwasinnamuknow at genestate.com> wrote:
> Miles Nordin wrote:
> >>>>>> "mh" == Matt Harrison <iwasinnamuknow at genestate.com> writes:
> >
> >     mh>  I'm worried about is if the entire batch is failing slowly
> >     mh> and will all die at the same time.
> >
>

Matt, can you please post the output from this command:

iostat -E

This will show counts of the types of errors for all disks since the last
reboot.  I am guessing sd0 is your CD / DVD drive.

Thank you,
  _Johan


-- 
Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke

Afrikaanse Stap Website: http://www.bloukous.co.za

My blog: http://initialprogramload.blogspot.com

Richard Elling

2008-Aug-03 19:52 UTC

Matt Harrison wrote:
> Hi everyone,
>
> I've been running a zfs fileserver for about a month now (on snv_91) and
> it's all working really well. I'm scrubbing once a week and nothing has
> come up as a problem yet.
>
> I'm a little worried as I've just noticed these messages in
> /var/adm/message and I don't know if they're bad or just informational:
>
> Aug  2 14:46:06 exodus  Error for Command: read_defect_data        Error Level: Informational
>
key here: "Informational"
> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Requested Block: 0                         Error Block: 0
> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Vendor: ATA                             Serial Number:
> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Sense Key: Illegal_Request
> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    ASC: 0x20 (invalid command operation code), ASCQ: 0x0, FRU: 0x0
>
Key here: "ASC 0x20 (invalid command operation code)"
> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1043,8239@5/disk@0,0 (sd0):
> Aug  2 14:46:06 exodus  Error for Command: log_sense               Error Level: Informational
> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Requested Block: 0                         Error Block: 0
> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Vendor: ATA                             Serial Number:
> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Sense Key: Illegal_Request
> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
>
Key here: "invalid field in cdb", where CDB is the Command Descriptor Block:
http://en.wikipedia.org/wiki/SCSI_CDB

Obviously a command is being sent to the device that it doesn't
understand.  This could be a host-side driver or disk firmware problem.
I'd classify this as annoying, but it doesn't appear dangerous on the
face of it.
With some digging you could determine which command is failing,
but that won't fix anything.  You might check with the disk vendor
for firmware upgrades, and you might look at a later version of the
OS drivers.

This isn't a ZFS issue, so you might have better luck on the
storage-discuss forum.
 -- richard
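Richard notes that with some digging you could determine which command is failing. Each log block already names the command, so a tally shows which commands fail on which disk, and how often. The following is a minimal Python sketch, assuming the line layout seen in the excerpts in this thread (a WARNING line naming the device, then an "Error for Command" line and an "ASC:" line); that layout is inferred from these posts, not from any documented log format. Blocks that appear before any WARNING line are attributed to the placeholder device "?", and the function names are hypothetical.

```python
import re
from collections import Counter

# Patterns for the three kinds of lines seen in the log excerpts above.
WARNING_RE = re.compile(r"WARNING: \S+ \((\w+)\):")
COMMAND_RE = re.compile(r"Error for Command: (\w+)")
ASC_RE = re.compile(r"ASC: (0x[0-9a-fA-F]+) \(([^)]*)\), ASCQ: (0x[0-9a-fA-F]+)")


def tally_scsi_errors(lines):
    """Count (device, command, asc, ascq, description) tuples in syslog lines.

    The device comes from the most recent WARNING line; blocks seen before
    any WARNING line are attributed to "?".
    """
    counts = Counter()
    device, command = "?", None
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            device = m.group(1)
            continue
        m = COMMAND_RE.search(line)
        if m:
            command = m.group(1)
            continue
        m = ASC_RE.search(line)
        if m and command is not None:
            asc, desc, ascq = m.groups()
            counts[(device, command, asc, ascq, desc)] += 1
            command = None  # one ASC line closes one error block
    return counts


def report(path="/var/adm/messages"):
    """Print the tally for a log file, most frequent first."""
    with open(path, errors="replace") as log:
        for (device, command, asc, ascq, desc), n in tally_scsi_errors(log).most_common():
            print(f"{n:8d}  {device:6s} {command:18s} ASC {asc}/{ascq} ({desc})")
```

As Richard says, knowing the exact command won't fix anything by itself, but a tally like this can be handy when asking the disk vendor or the driver maintainers about firmware or driver updates.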

Matt Harrison

2008-Aug-03 19:59 UTC

Johan Hartzenberg wrote:
> On Sun, Aug 3, 2008 at 8:48 PM, Matt Harrison
> <iwasinnamuknow at genestate.com> wrote:
>
>> Miles Nordin wrote:
>>>>>>>> "mh" == Matt Harrison <iwasinnamuknow at genestate.com> writes:
>>>     mh>  I'm worried about is if the entire batch is failing slowly
>>>     mh> and will all die at the same time.
>>>
>
>
> Matt, can you please post the output from this command:
>
> iostat -E
root@exodus:~ # iostat -E
cmdk0     Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: WDC WD2000JB-00 Revision:  Serial No: WD-WCAL81632817 Size: 200.05GB <200047067136 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
sd0       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 478675 Predictive Failure Analysis: 0
sd1       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 478626 Predictive Failure Analysis: 0
sd2       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 478604 Predictive Failure Analysis: 0
sd3       Soft Errors: 0 Hard Errors: 16 Transport Errors: 0
Vendor: HL-DT-ST Product: DVDRAM_GSA-H10N  Revision: JX06 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 16 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Lots of illegal requests, and a few hard errors. Doesn't look good.
> This will show counts of the types of errors for all disks since the last
> reboot.  I am guessing sd0 is your CD / DVD drive.
I don't think so. My DVD drive is on IDE along with the boot drive,
while my pool is on 3 SATA disks.

Thanks

Matt



Matt Harrison

2008-Aug-03 20:03 UTC

Richard Elling wrote:
> Matt Harrison wrote:
>> Aug  2 14:46:06 exodus  Error for Command: read_defect_data        Error Level: Informational
>>
>
> key here: "Informational"
>
>> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Requested Block: 0                         Error Block: 0
>> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Vendor: ATA                             Serial Number:
>> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Sense Key: Illegal_Request
>> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    ASC: 0x20 (invalid command operation code), ASCQ: 0x0, FRU: 0x0
>>
>
> Key here: "ASC 0x20 (invalid command operation code)"
>
>> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1043,8239@5/disk@0,0 (sd0):
>> Aug  2 14:46:06 exodus  Error for Command: log_sense               Error Level: Informational
>> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Requested Block: 0                         Error Block: 0
>> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Vendor: ATA                             Serial Number:
>> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    Sense Key: Illegal_Request
>> Aug  2 14:46:06 exodus scsi: [ID 107833 kern.notice]    ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
>>
>
> Key here: "invalid field in cbd" where CDB is command data block
> http://en.wikipedia.org/wiki/SCSI_CDB
>
> Obviously a command is being sent to the device that it doesn't
> understand.  This could be a host side driver or disk firmware problem.
> I'd classify this as annoying, but doesn't appear dangerous on the face.
> With some digging you could determine which command is failing,
> but that won't fix anything.  You might check with the disk vendor
> for firmware upgrades and you might look at a later version of the
> OS drivers.
Well, I'm pleased it doesn't scream DANGER to people. I can live with
clearing out the logs now and then. I will check with WD whether there are
firmware updates for these disks, and I will update my snv build at some point.
> This isn't a ZFS issue, so you might have better luck on the
> storage-discuss
I posted to storage-discuss a little while ago. I'm not even sure
why I posted here in the first place; storage-discuss would have been a much
better idea.

Thanks

Matt


Gary Mitchell

2010-Jun-08 21:05 UTC

I have seen this too.

I'm guessing you have SATA disks which are on an iSCSI target.
I'm also guessing you have used something like

iscsitadm create target --type raw -b /dev/dsk/c4t0d00 c4t0d0

i.e. you are not using the ZFS shareiscsi property on a ZFS volume, but are
creating the target from the device cNtNdN (dsk or rdsk, it doesn't seem to
matter).

You see these errors (always block 0) when the iSCSI initiator accesses the
disks.

Annoying, but the iSCSI transactions seem to be OK.
-- 
This message posted from opensolaris.org
