Alan Romeril
2006-May-09 19:55 UTC
[zfs-discuss] Possible corruption after disk hiccups...
I'm not sure exactly what happened with my box here, but something
caused a hiccup on multiple SATA disks...
May 9 16:40:33 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0 (ata6):
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol timeout: abort request, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol timeout: abort device, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol timeout: reset target, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol timeout: reset bus, target=0 lun=0
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@0/cmdk@0,0 (Disk1):
May 9 16:47:43 sol Error for command 'write sector' Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@0/cmdk@0,0 (Disk1):
May 9 16:47:43 sol Error for command 'read sector' Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0/cmdk@1,0 (Disk6):
May 9 16:47:43 sol Error for command 'write sector' Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0/cmdk@0,0 (Disk5):
May 9 16:47:43 sol Error for command 'write sector' Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@8/ide@1/cmdk@0,0 (Disk4):
May 9 16:47:43 sol Error for command 'write sector' Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@8/ide@1/cmdk@0,0 (Disk4):
May 9 16:47:43 sol Error for command 'write sector' Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1/cmdk@0,0 (Disk2):
May 9 16:47:43 sol Error for command 'write sector' Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol unix: [ID 836849 kern.notice]
May 9 16:47:43 sol ^Mpanic[cpu0]/thread=fffffe8000581c80:
May 9 16:47:43 sol genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio fffffe81a5972340 [L0 ZIL intent log] 2000L/2000P DVA[0]=<0:25786c7000:2800> zilog uncompressed LE contiguous birth=1468445 fill=0 cksum=4392a2279563047e:1b7716cbbf370c72:ac:6b): error 5
May 9 16:47:43 sol unix: [ID 100000 kern.notice]
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581a00 zfs:zio_done+2fc ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581a30 zfs:zio_next_stage+11e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581a80 zfs:zio_wait_for_children+5e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581aa0 zfs:zio_wait_children_done+22 ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581ad0 zfs:zio_next_stage+11e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581b20 zfs:zio_vdev_io_assess+15b ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581b50 zfs:zio_next_stage+11e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581ba0 zfs:vdev_mirror_io_done+38c ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581bc0 zfs:zio_vdev_io_done+2d ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581c60 genunix:taskq_thread+200 ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581c70 unix:thread_start+8 ()
May 9 16:47:43 sol unix: [ID 100000 kern.notice]
May 9 16:47:43 sol genunix: [ID 672855 kern.notice] syncing file systems...
May 9 16:47:43 sol genunix: [ID 904073 kern.notice] done
May 9 16:47:44 sol genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0d1s1, offset 1718419456, content: kernel
May 9 16:49:08 sol genunix: [ID 409368 kern.notice] ^M100% done: 840237 pages dumped, compression ratio 2.34,
May 9 16:49:08 sol genunix: [ID 851671 kern.notice] dump succeeded
May 9 19:33:54 sol genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version 20060424 64-bit
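
(An aside, not from the original post: if you want to poke at the panic itself, and assuming savecore saved the dump as instance 0 under /var/crash/sol -- the path and instance number are guesses, not something the thread confirms -- something like the following would pull the panic status, message buffer, and the panicking thread's stack back out of the dump.)

# Hypothetical follow-up: inspect the saved crash dump with mdb.
# Adjust the crash directory and instance number to match your system.
cd /var/crash/sol
echo '::status' | mdb -k unix.0 vmcore.0    # panic string and OS release
echo '::msgbuf' | mdb -k unix.0 vmcore.0    # console messages leading up to the panic
echo '$c'       | mdb -k unix.0 vmcore.0    # stack of the panicking thread
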
bash-3.00# zpool status -v
pool: raidpool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: scrub in progress, 56.98% done, 0h5m to go
config:
        NAME        STATE     READ WRITE CKSUM
        raidpool    ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c6d1    ONLINE       0     0     0
            c7d0    ONLINE       0     0     0
            c7d1    ONLINE       0     0     0
errors: The following persistent errors have been detected:
DATASET OBJECT RANGE
ac 0 lvl=0 blkid=18584
ac 0 lvl=0 blkid=18585
ac 0 lvl=0 blkid=18586
ac 0 lvl=0 blkid=18587
ac 0 lvl=0 blkid=18588
ac 913a9 lvl=0 blkid=0
I've set off a scrub to check things, there was no resilver of any data
on boot, but there's mention of corruption... Is there any way of
translating this output to filenames? As this is a zfs root, I'd like
to be absolutely sure before doing too much with this machine.
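
(An aside, not from the original post: the scrub-and-recheck Alan describes would look roughly like this; the pool name is taken from the zpool status output above, the rest is just the stock zpool commands.)

# Kick off a scrub of the pool, then watch its progress and the error list.
zpool scrub raidpool
zpool status -v raidpool
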
Cheers,
Alan
Alan Romeril
2006-May-09 20:14 UTC
[zfs-discuss] Re: Possible corruption after disk hiccups...
Eh maybe it's not a problem after all, the scrub has completed well...
--a
bash-3.00# zpool status -v
pool: raidpool
state: ONLINE
scrub: scrub completed with 0 errors on Tue May 9 21:10:55 2006
config:
        NAME        STATE     READ WRITE CKSUM
        raidpool    ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c6d1    ONLINE       0     0     0
            c7d0    ONLINE       0     0     0
            c7d1    ONLINE       0     0     0
errors: No known data errors
Eric Schrock
2006-May-09 21:26 UTC
[zfs-discuss] Re: Possible corruption after disk hiccups...
Yes. What happened is that you had a transient error which resulted in
EIO being returned to the application. We dutifully recorded this fact
in the persistent error log. When you ran a scrub, it verified that the
blocks were in fact still readable, and hence removed them from the
error log. Methinks the recommended action should request a scrub
first. However, it's bizarre that your drives all showed zero errors.
Are you running build 36 or later? Can you send me the contents of
/var/fm/fmd/{err,flt}log and /var/adm/messages?
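
(An aside, not from the original post: those two FMA logs are binary, so they are usually pulled into readable form with fmdump before being mailed along. A sketch; the /tmp output filenames are arbitrary.)

# Dump the FMA error log (/var/fm/fmd/errlog) and fault log
# (/var/fm/fmd/fltlog) as verbose text, and grab the syslog file.
fmdump -eV > /tmp/fmd-errlog.txt
fmdump -V  > /tmp/fmd-fltlog.txt
cp /var/adm/messages /tmp/messages.txt
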
Thanks,
- Eric
On Tue, May 09, 2006 at 01:14:31PM -0700, Alan Romeril wrote:
> Eh maybe it's not a problem after all, the scrub has completed
> well...
>
> --a
>
> bash-3.00# zpool status -v
> pool: raidpool
> state: ONLINE
> scrub: scrub completed with 0 errors on Tue May 9 21:10:55 2006
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raidpool    ONLINE       0     0     0
>           raidz     ONLINE       0     0     0
>             c2d0    ONLINE       0     0     0
>             c3d0    ONLINE       0     0     0
>             c4d0    ONLINE       0     0     0
>             c5d0    ONLINE       0     0     0
>             c6d0    ONLINE       0     0     0
>             c6d1    ONLINE       0     0     0
>             c7d0    ONLINE       0     0     0
>             c7d1    ONLINE       0     0     0
>
> errors: No known data errors
>
--
Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock
Eric Schrock
2006-May-09 21:27 UTC
[zfs-discuss] Possible corruption after disk hiccups...
On Tue, May 09, 2006 at 12:55:34PM -0700, Alan Romeril wrote:
>
> I've set off a scrub to check things, there was no resilver of any
> data on boot, but there's mention of corruption... Is there any way
> of translating this output to filenames? As this is a zfs root, I'd
> like to be absolutely sure before doing too much with this machine.

There's an open RFE to display these as filenames:

6410433 'zpool status -v' would be more useful with filenames

But it's non-trivial.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
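
(An aside, not from the original post: until that RFE exists, the usual manual workaround is zdb. In the persistent error list above, the DATASET column is a hex objset ID and OBJECT is a hex object number, so 0xac is dataset ID 172 and 0x913a9 is object 594857; the object 0 entries are dataset metadata and have no filename. The exact zdb output format varies by build, and "raidpool/somefs" below is only a placeholder dataset name.)

# Sketch only: find which dataset in the pool has objset ID 172 (0xac)...
zdb -d raidpool | grep 'ID 172'

# ...then dump object 594857 (0x913a9) from that dataset verbosely; for a
# plain file object zdb prints its path.
zdb -ddddd raidpool/somefs 594857
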