Alan Romeril
2006-May-09 19:55 UTC
[zfs-discuss] Possible corruption after disk hiccups...
I'm not sure exactly what happened with my box here, but something caused a hiccup on multiple sata disks...

May 9 16:40:33 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0 (ata6):
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: abort request, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:40:33 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0 (ata6):
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: abort request, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: abort device, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: reset target, target=0 lun=0
May 9 16:47:43 sol scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1 (ata3):
May 9 16:47:43 sol 	timeout: reset bus, target=0 lun=0
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@0/cmdk@0,0 (Disk1):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@0/cmdk@0,0 (Disk1):
May 9 16:47:43 sol 	Error for command 'read sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0/cmdk@1,0 (Disk6):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,5c@9/pci-ide@a/ide@0/cmdk@0,0 (Disk5):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@8/ide@1/cmdk@0,0 (Disk4):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@8/ide@1/cmdk@0,0 (Disk4):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@7/ide@1/cmdk@0,0 (Disk2):
May 9 16:47:43 sol 	Error for command 'write sector'	Error Level: Informational
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Sense Key: aborted command
May 9 16:47:43 sol gda: [ID 107833 kern.notice] 	Vendor 'Gen-ATA ' error code: 0x3
May 9 16:47:43 sol unix: [ID 836849 kern.notice]
May 9 16:47:43 sol ^Mpanic[cpu0]/thread=fffffe8000581c80:
May 9 16:47:43 sol genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio fffffe81a5972340 [L0 ZIL intent log] 2000L/2000P DVA[0]=<0:25786c7000:2800> zilog uncompressed LE contiguous birth=1468445 fill=0 cksum=4392a2279563047e:1b7716cbbf370c72:ac:6b): error 5
May 9 16:47:43 sol unix: [ID 100000 kern.notice]
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581a00 zfs:zio_done+2fc ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581a30 zfs:zio_next_stage+11e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581a80 zfs:zio_wait_for_children+5e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581aa0 zfs:zio_wait_children_done+22 ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581ad0 zfs:zio_next_stage+11e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581b20 zfs:zio_vdev_io_assess+15b ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581b50 zfs:zio_next_stage+11e ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581ba0 zfs:vdev_mirror_io_done+38c ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581bc0 zfs:zio_vdev_io_done+2d ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581c60 genunix:taskq_thread+200 ()
May 9 16:47:43 sol genunix: [ID 655072 kern.notice] fffffe8000581c70 unix:thread_start+8 ()
May 9 16:47:43 sol unix: [ID 100000 kern.notice]
May 9 16:47:43 sol genunix: [ID 672855 kern.notice] syncing file systems...
May 9 16:47:43 sol genunix: [ID 904073 kern.notice] done
May 9 16:47:44 sol genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0d1s1, offset 1718419456, content: kernel
May 9 16:49:08 sol genunix: [ID 409368 kern.notice] ^M100% done: 840237 pages dumped, compression ratio 2.34,
May 9 16:49:08 sol genunix: [ID 851671 kern.notice] dump succeeded
May 9 19:33:54 sol genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version 20060424 64-bit

bash-3.00# zpool status -v
  pool: raidpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 56.98% done, 0h5m to go
config:

        NAME        STATE     READ WRITE CKSUM
        raidpool    ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c6d1    ONLINE       0     0     0
            c7d0    ONLINE       0     0     0
            c7d1    ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          ac       0       lvl=0 blkid=18584
          ac       0       lvl=0 blkid=18585
          ac       0       lvl=0 blkid=18586
          ac       0       lvl=0 blkid=18587
          ac       0       lvl=0 blkid=18588
          ac       913a9   lvl=0 blkid=0

I've set off a scrub to check things, there was no resilver of any data on boot, but there's mention of corruption... Is there any way of translating this output to filenames? As this is a zfs root, I'd like to be absolutely sure before doing too much with this machine.

Cheers,
Alan

This message posted from opensolaris.org
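For reference, starting and watching a scrub like the one described above amounts to roughly the following (a minimal sketch: the pool name raidpool is taken from the zpool status output, and the 60-second polling loop is just an illustrative choice, not something from the original post):

bash-3.00# zpool scrub raidpool                    # kick off a scrub of the whole pool
bash-3.00# zpool status -v raidpool                # re-check state, progress, and the persistent error list
bash-3.00# while true; do zpool status raidpool | grep scrub; sleep 60; done   # crude progress watch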
Alan Romeril
2006-May-09 20:14 UTC
[zfs-discuss] Re: Possible corruption after disk hiccups...
Eh maybe it's not a problem after all, the scrub has completed well...

--a

bash-3.00# zpool status -v
  pool: raidpool
 state: ONLINE
 scrub: scrub completed with 0 errors on Tue May 9 21:10:55 2006
config:

        NAME        STATE     READ WRITE CKSUM
        raidpool    ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c6d1    ONLINE       0     0     0
            c7d0    ONLINE       0     0     0
            c7d1    ONLINE       0     0     0

errors: No known data errors

This message posted from opensolaris.org
Eric Schrock
2006-May-09 21:26 UTC
[zfs-discuss] Re: Possible corruption after disk hiccups...
Yes. What happened is that you had a transient error which resulted in EIO being returned to the application. We dutifully recorded this fact in the persistent error log. When you ran a scrub, it verified that the blocks were in fact still readable, and hence removed them from the error log. Methinks the recommended action should request a scrub first.

However, it's bizarre that your drives all showed zero errors. Are you running build 36 or later? Can you send me the contents of /var/fm/fmd/{err,flt}log and /var/adm/messages?

Thanks,

- Eric

On Tue, May 09, 2006 at 01:14:31PM -0700, Alan Romeril wrote:
> Eh maybe it's not a problem after all, the scrub has completed well...
>
> --a
>
> bash-3.00# zpool status -v
>   pool: raidpool
>  state: ONLINE
>  scrub: scrub completed with 0 errors on Tue May 9 21:10:55 2006
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raidpool    ONLINE       0     0     0
>           raidz     ONLINE       0     0     0
>             c2d0    ONLINE       0     0     0
>             c3d0    ONLINE       0     0     0
>             c4d0    ONLINE       0     0     0
>             c5d0    ONLINE       0     0     0
>             c6d0    ONLINE       0     0     0
>             c6d1    ONLINE       0     0     0
>             c7d0    ONLINE       0     0     0
>             c7d1    ONLINE       0     0     0
>
> errors: No known data errors
>
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
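For anyone gathering the same data, the FMA logs Eric asks for can be decoded and bundled roughly like this (a sketch assuming the stock fmdump(1M) utility; the /tmp/fmlogs.tar filename is just an example, not anything specified in the thread):

bash-3.00# fmdump -eV                   # verbose decode of the error log (/var/fm/fmd/errlog)
bash-3.00# fmdump -V                    # verbose decode of the fault log (/var/fm/fmd/fltlog)
bash-3.00# tar cf /tmp/fmlogs.tar /var/fm/fmd/errlog /var/fm/fmd/fltlog /var/adm/messages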
Eric Schrock
2006-May-09 21:27 UTC
[zfs-discuss] Possible corruption after disk hiccups...
On Tue, May 09, 2006 at 12:55:34PM -0700, Alan Romeril wrote:
>
> I've set off a scrub to check things, there was no resilver of any
> data on boot, but there's mention of corruption... Is there any way
> of translating this output to filenames? As this is a zfs root, I'd
> like to be absolutely sure before doing too much with this machine.

There's an open RFE to display these as filenames:

6410433 'zpool status -v' would be more useful with filenames

But it's non-trivial.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
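Until that RFE lands, the closest manual workaround is probably zdb: take the dataset and object numbers from the persistent error table and dump the object, since the dnode dump for a plain file object can include its path. A rough sketch only, with several assumptions: the DATASET/OBJECT columns are taken to be hex (0xac is 172 decimal, 0x913a9 is 594857 decimal), the dataset name is a placeholder because only its id is shown above, and there is no guarantee a 2006-era zdb printed the path line:

bash-3.00# zdb -d raidpool                        # list datasets with their ids; find the one whose ID matches 0xac (172)
bash-3.00# zdb -dddd raidpool/<dataset> 594857    # dump object 0x913a9; for a file object the output may include its path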