Siegfried Nikolaivich
2007-Jan-10 16:26 UTC
[zfs-discuss] Saving scrub results before scrub completes
On 27-Dec-06, at 9:45 PM, George Wilson wrote:

> Siegfried,
>
> Can you provide the panic string that you are seeing? We should be
> able to pull out the persistent error log information from the
> corefile. You can take a look at the spa_get_errlog() function as a
> starting point.

This is the panic string that I am seeing:

Dec 26 18:55:51 FServe unix: [ID 836849 kern.notice]
Dec 26 18:55:51 FServe panic[cpu1]/thread=fffffe8000929c80:
Dec 26 18:55:51 FServe genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe8000929980 addr=ffffff00b3e621f0
Dec 26 18:55:51 FServe unix: [ID 100000 kern.notice]
Dec 26 18:55:51 FServe unix: [ID 839527 kern.notice] sched:
Dec 26 18:55:51 FServe unix: [ID 753105 kern.notice] #pf Page fault
Dec 26 18:55:51 FServe unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xffffff00b3e621f0
Dec 26 18:55:51 FServe unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffff3eaa2b0, sp=0xfffffe8000929a78, eflags=0x10282
Dec 26 18:55:51 FServe unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
Dec 26 18:55:51 FServe unix: [ID 354241 kern.notice] cr2: ffffff00b3e621f0 cr3: a3ec000 cr8: c
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] rdi: fffffe80dd69ad40 rsi: ffffff00b3e62040 rdx: 0
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] rcx: ffffffff9c6bd6ce r8: 1 r9: ffffffff
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] rax: ffffff00b3e62208 rbx: ffffff00b3e62040 rbp: fffffe8000929ab0
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] r10: ffffffff982421c8 r11: 1 r12: ffffff00b3e62208
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] r13: ffffffff81204468 r14: 1c8 r15: fffffe80dd69ad40
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] fsb: ffffffff80000000 gsb: ffffffff80f1d000 ds: 43
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] es: 43 fs: 0 gs: 1c3
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffff3eaa2b0
Dec 26 18:55:51 FServe unix: [ID 592667 kern.notice] cs: 28 rfl: 10282 rsp: fffffe8000929a78
Dec 26 18:55:51 FServe unix: [ID 266532 kern.notice] ss: 30
Dec 26 18:55:51 FServe unix: [ID 100000 kern.notice]
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929890 unix:real_mode_end+6ad1 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929970 unix:trap+d77 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929980 unix:cmntrap+13f ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929ab0 zfs:vdev_queue_offset_compare+0 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929ae0 genunix:avl_add+1f ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929b60 zfs:vdev_queue_io_to_issue+1ec ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929ba0 zfs:zfsctl_ops_root+33bc48b1 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929bc0 zfs:vdev_disk_io_done+11 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929bd0 zfs:vdev_io_done+12 ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929be0 zfs:zio_vdev_io_done+1b ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929c60 genunix:taskq_thread+bc ()
Dec 26 18:55:51 FServe genunix: [ID 655072 kern.notice] fffffe8000929c70 unix:thread_start+8 ()
Dec 26 18:55:51 FServe unix: [ID 100000 kern.notice]
Dec 26 18:55:51 FServe genunix: [ID 672855 kern.notice] syncing file systems...
Dec 26 18:55:51 FServe genunix: [ID 733762 kern.notice] 3
Dec 26 18:55:52 FServe genunix: [ID 904073 kern.notice] done
Dec 26 18:55:53 FServe genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1d0s1, offset 1719074816, content: kernel

Additionally, though perhaps not related, I came across this while looking at the logs:

Dec 26 17:53:00 FServe marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx0: error on port 1:
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] EDMA self disabled
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] command request queue parity error
Dec 26 17:53:00 FServe marvell88sx: [ID 131198 kern.info] SErrors:
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] Recovered communication error
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] PHY ready change
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] 10-bit to 8-bit decode error
Dec 26 17:53:00 FServe marvell88sx: [ID 517869 kern.info] Disparity error

This happened right before a system hang. I have another strange problem where, if I send certain files over the network (CIFS or NFS), the machine slows to a crawl until it is "hung". This is reproducible every time with the same "special" files, but it does not happen locally, only over the network. I already posted about this in network-discuss and am currently investigating the issue.

> Additionally, you can look at the corefile using mdb and take a
> look at the vdev error stats. Here's an example (hopefully the
> formatting doesn't get messed up):

Excellent information, thanks! It looks like there are no read/write/chksum errors. I now at least have a way of checking the scrub results until the panic is fixed (hopefully someday).

Siegfried

> > ::spa -v
> ADDR                 STATE     NAME
> 0000060004473680     ACTIVE    test
>
>     ADDR             STATE     AUX          DESCRIPTION
>     0000060004bcb500 HEALTHY   -            root
>     0000060004bcafc0 HEALTHY   -            /dev/dsk/c0t2d0s0
>
> > 0000060004bcb500::vdev -re
> ADDR             STATE     AUX          DESCRIPTION
> 0000060004bcb500 HEALTHY   -            root
>
>              READ     WRITE      FREE     CLAIM     IOCTL
> OPS             0         0         0         0         0
> BYTES           0         0         0         0         0
> EREAD           0
> EWRITE          0
> ECKSUM          0
>
> 0000060004bcafc0 HEALTHY   -            /dev/dsk/c0t2d0s0
>
>              READ     WRITE      FREE     CLAIM     IOCTL
> OPS          0x17     0x1d2         0         0         0
> BYTES    0x19c000  0x11da00         0         0         0
> EREAD           0
> EWRITE          0
> ECKSUM          0
>
> This will show you any read/write/cksum errors.
>
> Thanks,
> George
>
> Siegfried Nikolaivich wrote:
>> Hello All,
>>
>> I am wondering if there is a way to save the scrub results right
>> before the scrub is complete.
>>
>> After upgrading to Solaris 10U3 I still have ZFS panicking right as
>> the scrub completes. The scrub results seem to be "cleared" when the
>> system boots back up, so I never get a chance to see them.
>>
>> Does anyone know of a simple way?
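(A minimal sketch of how the corefile checks above can be run, not from the original mail: it assumes the crash dump was saved by savecore(1M) under the default /var/crash/<hostname> directory, and that unix.0/vmcore.0 is the dump pair you want. ::status prints the panic summary for the dump; ::spa -v and ::vdev -re are the dcmds from George's example, with the vdev address taken from the ::spa output.)

    # cd /var/crash/FServe
    # mdb unix.0 vmcore.0
    > ::status
    > ::spa -v
    > 0000060004bcb500::vdev -re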
George Wilson
2007-Jan-10 16:26 UTC
[zfs-discuss] Saving scrub results before scrub completes
Siegfried,

Can you provide the panic string that you are seeing? We should be able to pull out the persistent error log information from the corefile. You can take a look at the spa_get_errlog() function as a starting point.

Additionally, you can look at the corefile using mdb and take a look at the vdev error stats. Here's an example (hopefully the formatting doesn't get messed up):

> ::spa -v
ADDR                 STATE     NAME
0000060004473680     ACTIVE    test

    ADDR             STATE     AUX          DESCRIPTION
    0000060004bcb500 HEALTHY   -            root
    0000060004bcafc0 HEALTHY   -            /dev/dsk/c0t2d0s0

> 0000060004bcb500::vdev -re
ADDR             STATE     AUX          DESCRIPTION
0000060004bcb500 HEALTHY   -            root

             READ     WRITE      FREE     CLAIM     IOCTL
OPS             0         0         0         0         0
BYTES           0         0         0         0         0
EREAD           0
EWRITE          0
ECKSUM          0

0000060004bcafc0 HEALTHY   -            /dev/dsk/c0t2d0s0

             READ     WRITE      FREE     CLAIM     IOCTL
OPS          0x17     0x1d2         0         0         0
BYTES    0x19c000  0x11da00         0         0         0
EREAD           0
EWRITE          0
ECKSUM          0

This will show you any read/write/cksum errors.

Thanks,
George

Siegfried Nikolaivich wrote:
> Hello All,
>
> I am wondering if there is a way to save the scrub results right before the scrub is complete.
>
> After upgrading to Solaris 10U3 I still have ZFS panicking right as the scrub completes. The scrub results seem to be "cleared" when the system boots back up, so I never get a chance to see them.
>
> Does anyone know of a simple way?
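(A rough illustration of the spa_get_errlog() pointer above, not from the original mail: in the OpenSolaris source of that era, the spa_t tracks the object numbers of the two persistent error-log objects, so they can be printed from the corefile with mdb's ::print. The field names spa_errlog_last and spa_errlog_scrub are taken from that source and are an assumption here; they may differ in other builds. The spa_t address comes from the ::spa output above.)

    > 0000060004473680::print spa_t spa_errlog_last spa_errlog_scrub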
Siegfried Nikolaivich
2007-Jan-10 16:26 UTC
[zfs-discuss] Saving scrub results before scrub completes
Hello All,

I am wondering if there is a way to save the scrub results right before the scrub is complete.

After upgrading to Solaris 10U3 I still have ZFS panicking right as the scrub completes. The scrub results seem to be "cleared" when the system boots back up, so I never get a chance to see them.

Does anyone know of a simple way?
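(One simple workaround sketch, not from the original thread: poll the pool status while the scrub runs and keep the last snapshot on disk, so whatever was captured just before the panic survives the reboot. "tank" is a placeholder pool name; write the output somewhere outside the pool being scrubbed, and sync so the file makes it to disk before the machine goes down.)

    # while true; do
    >     zpool status -v tank > /var/tmp/scrub-status.last
    >     sync
    >     sleep 60
    > done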