Alan Romeril
2006-May-09 19:55 UTC
[zfs-discuss] Possible corruption after disk hiccups...
I'm not sure exactly what happened with my box here, but something caused a hiccup on multiple sata disks...

May 9 16:40:33 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0 (ata6):
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: abort request, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:40:33 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0 (ata6):
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: abort request, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: abort device, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: reset target, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: reset bus, target=0 lun=0
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@0/cmdk@0,0 (Disk1):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@0/cmdk@0,0 (Disk1):
May 9 16:47:43 sol 	Error for command 'read sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0/cmdk@1,0 (Disk6):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0/cmdk@0,0 (Disk5):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@8/ide@1/cmdk@0,0 (Disk4):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@8/ide@1/cmdk@0,0 (Disk4):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1/cmdk@0,0 (Disk2):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol unix: [ID 836849 kern.notice]
May 9 16:47:43 sol ^Mpanic[cpu0]/thread=fffffe8000581c80:
May 9 16:47:43 sol genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio fffffe81a5972340 [L0 ZIL intent log] 2000L/2000P DVA[0]=<0:25786c7000:2800> zilog uncompressed LE contiguous birth=1468445 fill=0 cksum=4392a2279563047e:1b7716cbbf370c72:ac:6b): error 5
May 9 16:47:43 sol unix: [ID 100000 kern.notice]
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581a00 zfs:zio_done+2fc ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581a30 zfs:zio_next_stage+11e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581a80 zfs:zio_wait_for_children+5e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581aa0 zfs:zio_wait_children_done+22 ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581ad0 zfs:zio_next_stage+11e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581b20 zfs:zio_vdev_io_assess+15b ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581b50 zfs:zio_next_stage+11e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581ba0 zfs:vdev_mirror_io_done+38c ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581bc0 zfs:zio_vdev_io_done+2d ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581c60 genunix:taskq_thread+200 ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581c70 unix:thread_start+8 ()
May 9 16:47:43 sol unix: [ID 100000 kern.notice]
May 9 16:47:43 sol genunix: [ID 672855 kern.notice] syncing file systems...
May 9 16:47:43 sol genunix: [ID 904073 kern.notice] done
May 9 16:47:44 sol genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0d1s1, offset 1718419456, content: kernel
May 9 16:49:08 sol genunix: [ID 409368 kern.notice] ^M100% done: 840237 pages dumped, compression ratio 2.34,
May 9 16:49:08 sol genunix: [ID 851671 kern.notice] dump succeeded
May 9 19:33:54 sol genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version 20060424 64-bit

bash-3.00# zpool status -v
  pool: raidpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 56.98% done, 0h5m to go
config:

        NAME        STATE     READ WRITE CKSUM
        raidpool    ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c6d1    ONLINE       0     0     0
            c7d0    ONLINE       0     0     0
            c7d1    ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          ac       0       lvl=0 blkid=18584
          ac       0       lvl=0 blkid=18585
          ac       0       lvl=0 blkid=18586
          ac       0       lvl=0 blkid=18587
          ac       0       lvl=0 blkid=18588
          ac       913a9   lvl=0 blkid=0

I've set off a scrub to check things, there was no resilver of any data on boot, but there's mention of corruption... Is there any way of translating this output to filenames? As this is a zfs root, I'd like to be absolutely sure before doing too much with this machine.

Cheers,
Alan

This message posted from opensolaris.org
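For reference, starting and watching a scrub like the one described above amounts to roughly the following (a minimal sketch: the pool name raidpool is taken from the zpool status output, and the 60-second polling loop is just an illustrative choice, not something from the original post):

bash-3.00# zpool scrub raidpool                    # kick off a scrub of the whole pool
bash-3.00# zpool status -v raidpool                # re-check state, progress, and the persistent error list
bash-3.00# while true; do zpool status raidpool | grep scrub; sleep 60; done   # crude progress watch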
Alan Romeril
2006-May-09 20:14 UTC
[zfs-discuss] Re: Possible corruption after disk hiccups...
Eh maybe it's not a problem after all, the scrub has completed well...

--a

bash-3.00# zpool status -v
  pool: raidpool
 state: ONLINE
 scrub: scrub completed with 0 errors on Tue May 9 21:10:55 2006
config:

        NAME        STATE     READ WRITE CKSUM
        raidpool    ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c6d1    ONLINE       0     0     0
            c7d0    ONLINE       0     0     0
            c7d1    ONLINE       0     0     0

errors: No known data errors

This message posted from opensolaris.org
Eric Schrock
2006-May-09 21:26 UTC
[zfs-discuss] Re: Possible corruption after disk hiccups...
Yes. What happened is that you had a transient error which resulted in EIO being returned to the application. We dutifully recorded this fact in the persistent error log. When you ran a scrub, it verified that the blocks were in fact still readable, and hence removed them from the error log. Methinks the recommended action should request a scrub first.

However, it's bizarre that your drives all showed zero errors. Are you running build 36 or later? Can you send me the contents of /var/fm/fmd/{err,flt}log and /var/adm/messages?

Thanks,

- Eric

On Tue, May 09, 2006 at 01:14:31PM -0700, Alan Romeril wrote:
> Eh maybe it's not a problem after all, the scrub has completed well...
>
> --a
>
> bash-3.00# zpool status -v
>   pool: raidpool
>  state: ONLINE
>  scrub: scrub completed with 0 errors on Tue May 9 21:10:55 2006
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raidpool    ONLINE       0     0     0
>           raidz     ONLINE       0     0     0
>             c2d0    ONLINE       0     0     0
>             c3d0    ONLINE       0     0     0
>             c4d0    ONLINE       0     0     0
>             c5d0    ONLINE       0     0     0
>             c6d0    ONLINE       0     0     0
>             c6d1    ONLINE       0     0     0
>             c7d0    ONLINE       0     0     0
>             c7d1    ONLINE       0     0     0
>
> errors: No known data errors
>
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
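For anyone gathering the same data, the FMA logs Eric asks for can be decoded and bundled roughly like this (a sketch assuming the stock fmdump(1M) utility; the /tmp/fmlogs.tar filename is just an example, not anything specified in the thread):

bash-3.00# fmdump -eV                   # verbose decode of the error log (/var/fm/fmd/errlog)
bash-3.00# fmdump -V                    # verbose decode of the fault log (/var/fm/fmd/fltlog)
bash-3.00# tar cf /tmp/fmlogs.tar /var/fm/fmd/errlog /var/fm/fmd/fltlog /var/adm/messages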
Eric Schrock
2006-May-09 21:27 UTC
[zfs-discuss] Possible corruption after disk hiccups...
On Tue, May 09, 2006 at 12:55:34PM -0700, Alan Romeril wrote:
>
> I've set off a scrub to check things, there was no resilver of any
> data on boot, but there's mention of corruption... Is there any way
> of translating this output to filenames? As this is a zfs root, I'd
> like to be absolutely sure before doing too much with this machine.

There's an open RFE to display these as filenames:

6410433 'zpool status -v' would be more useful with filenames

But it's non-trivial.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
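Until that RFE lands, the closest manual workaround is probably zdb: take the dataset and object numbers from the persistent error table and dump the object, since the dnode dump for a plain file object can include its path. A rough sketch only, with several assumptions: the DATASET/OBJECT columns are taken to be hex (0xac is 172 decimal, 0x913a9 is 594857 decimal), the dataset name is a placeholder because only its id is shown above, and there is no guarantee a 2006-era zdb printed the path line:

bash-3.00# zdb -d raidpool                        # list datasets with their ids; find the one whose ID matches 0xac (172)
bash-3.00# zdb -dddd raidpool/<dataset> 594857    # dump object 0x913a9; for a file object the output may include its path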