b28 on a Dell PowerEdge 2600.

Loading modules: [ unix krtld genunix specfs dtrace pcplusmp ufs ip sctp usba uhci fcp fctl lofs md fcip cpc random crypto zfs nca logindmux ptm sppp nfs ipc ]
> ::status
debugging crash dump vmcore.0 (32-bit) from zfstest
operating system: 5.11 snv_28 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=c9d03aa4 addr=ea78885c
dump content: kernel pages only

> *panic_thread::findstack -v
stack pointer for thread c9d03de0: c9d038e4
  c9d039bc 0xcf48e5c8()
  c9d03aa4 0xea78885c()
  c9d03b14 buf_hash_insert+0x7b(e2568740, c9d03b44)
  c9d03b48 arc_write_done+0x7f(eb85c600)
  c9d03c7c zio_done+0x1ce(eb85c600)
  c9d03c9c zio_next_stage+0x73(eb85c600)
  c9d03cbc zio_wait_for_children+0x58()
  c9d03cdc zio_wait_children_done+0x18(eb85c600)
  c9d03cf8 zio_next_stage+0x73(eb85c600)
  c9d03d2c zio_vdev_io_assess+0xc6(eb85c600)
  c9d03d40 zio_next_stage+0x73(eb85c600)
  c9d03d54 vdev_disk_io_done+0x2b()
  c9d03d64 vdev_io_done+0x18()
  c9d03d78 zio_vdev_io_done+0xe()
  c9d03dc8 taskq_thread+0x16c(c7f83a18, 0)
  c9d03dd8 thread_start+8()

Loading modules: [ unix krtld genunix specfs dtrace pcplusmp ufs ip sctp usba uhci fcp fctl nca lofs zfs random nfs md fcip cpc crypto logindmux ptm sppp ]
> ::status
debugging crash dump vmcore.1 (32-bit) from zfstest
operating system: 5.11 snv_28 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=c88eaaa4 addr=6bd occurred in module "zfs"
due to a NULL pointer dereference
dump content: kernel pages only

> *panic_thread::findstack -v
stack pointer for thread c88eade0: c88ea980
  c88ea99c panic+0x12(fe879608, e, fe8797cc, fe8796b8, c88eaaa4, 6bd)
  c88ea9fc die+0x98(e, c88eaaa4, 6bd, 1)
  c88eaa90 trap+0x1169(c88eaaa4, 6bd, 1)
  c88eaaa4 _cmntrap+0x9b()
  c88eab14 buf_hash_insert+0x7b(d89570b0, c88eab44)
  c88eab48 arc_write_done+0x7f(d97b0a00)
  c88eac7c zio_done+0x1ce(d97b0a00)
  c88eac9c zio_next_stage+0x73(d97b0a00)
  c88eacbc zio_wait_for_children+0x58()
  c88eacdc zio_wait_children_done+0x18(d97b0a00)
  c88eacf8 zio_next_stage+0x73(d97b0a00)
  c88ead2c zio_vdev_io_assess+0xc6(d97b0a00)
  c88ead40 zio_next_stage+0x73(d97b0a00)
  c88ead54 vdev_disk_io_done+0x2b()
  c88ead64 vdev_io_done+0x18()
  c88ead78 zio_vdev_io_done+0xe()
  c88eadc8 taskq_thread+0x16c(c9322660, 0)
  c88eadd8 thread_start+8()

both times were during heavy ZFS activity, the second time I was also
modifying ZFS pools at the same time.

are these crash dumps useful? if so I will upload them. (I was going to
update this system to build 31 today, but liveupgrade isn't playing nice,
so I'll just reinstall).

This message posted from opensolaris.org
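
For anyone wanting to poke at dumps like these themselves, the mdb session
above was presumably produced along these lines. This is only a sketch: the
crash directory path is an assumption based on the default savecore location
for a host named "zfstest", not something stated in the post.

    # cd /var/crash/zfstest            # default savecore directory (assumed)
    # mdb unix.0 vmcore.0              # load kernel symbols plus the first dump
    > ::status                         # one-line summary of the panic
    > *panic_thread::findstack -v      # stack of the thread that took the trap
    > $q                               # quit mdb
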
Hmm, the stack trace looks a bit like 6341326 to me, but I'm not an
expert here... Any developer types know for sure?

(btw. now that ZFS is in the open, we've really got to stop putting "See
comments" in the bug reports[1] - a point EricS made a while back)

cheers,
tim

[1] because all people without sunsolve access can see is
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6341326 which
doesn't tell them anything useful. No need for smoke & mirrors anymore :-)

On Wed, 2006-02-01 at 02:14 -0800, grant beattie wrote:
> c9d03b14 buf_hash_insert+0x7b(e2568740, c9d03b44)
> c9d03b48 arc_write_done+0x7f(eb85c600)
[...]
> are these crash dumps useful? if so I will upload them.

--
Tim Foster, Sun Microsystems Inc, Operating Platforms Group
Engineering Operations                   http://blogs.sun.com/timf
On Wed, Feb 01, 2006 at 10:56:26AM +0000, Tim Foster wrote:
> Hmm, the stack trace looks a bit like 6341326 to me, but I'm not an
> expert here... Any developer types know for sure?
>
> (btw. now that ZFS is in the open, we've really got to stop putting "See
> comments" in the bug reports[1] - a point EricS made a while back)
>
> cheers,
> tim
>
> [1] because all people without sunsolve access can see is
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6341326 which
> doesn't tell them anything useful. No need for smoke & mirrors anymore :-)

yeah, I searched the opensolaris bugs database, and I do have sunsolve
access but I couldn't find anything related in there, either -- that bug
id doesn't turn up any results for me.

grant.
This looks to me like the ARC caching on I/O failure problem, but maybe
Mark can comment?  We have this fixed in a project gate due to go back
around build 35.  Can you run "::spa -ve" on the dumps and send the
output?

- Eric

On Wed, Feb 01, 2006 at 02:14:24AM -0800, grant beattie wrote:
> both times were during heavy ZFS activity, the second time I was also
> modifying ZFS pools at the same time.
>
> are these crash dumps useful? if so I will upload them.

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
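
A non-interactive way to capture exactly the output Eric is asking for would
be to pipe the dcmd into mdb and redirect it to a file; this is a sketch that
assumes the dumps are still in the default /var/crash/zfstest directory, and
the output filenames are invented for illustration.

    # cd /var/crash/zfstest
    # echo "::spa -ve" | mdb unix.0 vmcore.0 > spa-vmcore0.txt   # first panic
    # echo "::spa -ve" | mdb unix.1 vmcore.1 > spa-vmcore1.txt   # second panic
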
On Wed, Feb 01, 2006 at 10:03:01AM -0800, Eric Schrock wrote:
> This looks to me like the ARC caching on I/O failure problem, but maybe
> Mark can comment?  We have this fixed in a project gate due to go back
> around build 35.  Can you run "::spa -ve" on the dumps and send the
> output?

sure, here they are..

vmcore.0:

> ::spa -ve
ADDR     STATE  NAME
cc273c40 ACTIVE export

    ADDR     STATE    AUX  DESCRIPTION
    caaa8940 HEALTHY  -    root
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0           0            0     0      0
        BYTES   0           0            0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6e33dc0 HEALTHY  -    /dev/dsk/c1t0d0s7
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0x2d38      0x6d0e       0     0      0
        BYTES   0x2de48200  0x23adf600   0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

ebaa9340 ACTIVE mailtank

    c6d7eb80 HEALTHY  -    root
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0           0            0     0      0
        BYTES   0           0            0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6b9cb80 HEALTHY  -    /dev/dsk/c0t0d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0x101       0x11b8c      0     0      0
        BYTES   0xf89000    0x1f50b7e00  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    e9971200 HEALTHY  -    /dev/dsk/c0t1d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0xc6        0x11781      0     0      0
        BYTES   0xc48000    0x1f4b89400  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6d7fdc0 HEALTHY  -    /dev/dsk/c0t2d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0xaf        0x10ead      0     0      0
        BYTES   0xa96000    0x1f4586c00  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    eca44000 HEALTHY  -    /dev/dsk/c0t3d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0xe5        0x1163a      0     0      0
        BYTES   0xde4000    0x1f4b2d800  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c9bdae00 HEALTHY  -    /dev/dsk/c0t4d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0x15d       0x1260f      0     0      0
        BYTES   0x150d000   0x1f572fa00  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    d30e7200 HEALTHY  -    /dev/dsk/c0t5d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0x205       0x12d74      0     0      0
        BYTES   0x1f54000   0x1f5b39400  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6c77dc0 HEALTHY  -    /dev/dsk/c1t1d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0xb6        0x1202d      0     0      0
        BYTES   0xb24000    0x212b41600  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

vmcore.1:

> ::spa -ve
ADDR     STATE  NAME
ccd10940 ACTIVE export

    ADDR     STATE    AUX  DESCRIPTION
    c6b9f480 HEALTHY  -    root
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0            0            0     0      0
        BYTES   0            0            0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6ba0680 HEALTHY  -    /dev/dsk/c1t0d0s7
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x2016       0xa54        0     0      0
        BYTES   0x19f56c00   0x1530200    0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

c85163c0 ACTIVE mailtank

    c9af9b00 HEALTHY  -    root
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0            0            0     0      0
        BYTES   0            0            0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6da66c0 HEALTHY  -    /dev/dsk/c0t4d0
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x35bb7      0x1aa3e      0     0      0
        BYTES   0x6acd3b000  0x3243ee800  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6da1480 HEALTHY  -    /dev/dsk/c0t3d0
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x11a82      0x1842b      0     0      0
        BYTES   0x22fcd2e00  0x2bf8d5a00  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0x3a

    c6d9bd80 HEALTHY  -    /dev/dsk/c0t0d0
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x391d       0x2d3d       0     0      0
        BYTES   0x72138000   0x551ac200   0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6b9d940 HEALTHY  -    /dev/dsk/c0t1d0
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x390d       0x2ce2       0     0      0
        BYTES   0x71f28800   0x55195400   0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0
> Hmm, the stack trace looks a bit like 6341326 to me, but I'm not an
> expert here... Any developer types know for sure?

Yep, that's the one.  Fix coming in build 35.

Sorry about that,

Jeff
The interesting thing here is that you've had dozens of checksum errors
on one disk, /dev/dsk/c0t3d0.  I'd suggest replacing it as soon as you can.
FMI, were there any errors reported in /var/adm/messages?

Jeff
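
Roughly, the checks and the swap Jeff is suggesting might look like the
following sketch. The replacement device c0t6d0 is invented purely for
illustration; whether a spare slot exists on this box isn't known from the
thread.

    # grep -i c0t3d0 /var/adm/messages      # any driver errors logged for the suspect disk?
    # iostat -En c0t3d0                     # cumulative soft/hard/transport error counters
    # zpool status -v mailtank              # ZFS-level read/write/checksum error counts
    # zpool replace mailtank c0t3d0 c0t6d0  # resilver onto a replacement disk (c0t6d0 is hypothetical)
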
On Wed, Feb 01, 2006 at 03:36:37PM -0800, Jeff Bonwick wrote:
> The interesting thing here is that you've had dozens of checksum errors
> on one disk, /dev/dsk/c0t3d0.  I'd suggest replacing it as soon as you can.
> FMI, were there any errors reported in /var/adm/messages?

there were indeed a whole bunch of read errors logged for c0t2d0, but I
don't see any for c0t3d0, so it seems there is at least one disk having
problems. I did also hit the bad checksum panic a few times, which seems
to have been reported by others, too.

glad to know a fix for the previous panics will make its way in :)

thanks Jeff,
grant.
On Wed, Feb 01, 2006 at 03:32:18PM -0800, Jeff Bonwick wrote:
> > Hmm, the stack trace looks a bit like 6341326 to me, but I'm not an
> > expert here... Any developer types know for sure?
>
> Yep, that's the one.  Fix coming in build 35.

hey Jeff,

I didn't see this bug in the 20060228 changelog.. is this because
20060228 isn't quite build 35, or did it get delayed for some reason?

I'm eagerly awaiting this fix so my box will stop panic'ing all the time :)

grant.