b28 on a Dell PowerEdge 2600.

Loading modules: [ unix krtld genunix specfs dtrace pcplusmp ufs ip sctp usba uhci fcp fctl lofs md fcip cpc random crypto zfs nca logindmux ptm sppp nfs ipc ]
> ::status
debugging crash dump vmcore.0 (32-bit) from zfstest
operating system: 5.11 snv_28 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=c9d03aa4 addr=ea78885c
dump content: kernel pages only

> *panic_thread::findstack -v
stack pointer for thread c9d03de0: c9d038e4
  c9d039bc 0xcf48e5c8()
  c9d03aa4 0xea78885c()
  c9d03b14 buf_hash_insert+0x7b(e2568740, c9d03b44)
  c9d03b48 arc_write_done+0x7f(eb85c600)
  c9d03c7c zio_done+0x1ce(eb85c600)
  c9d03c9c zio_next_stage+0x73(eb85c600)
  c9d03cbc zio_wait_for_children+0x58()
  c9d03cdc zio_wait_children_done+0x18(eb85c600)
  c9d03cf8 zio_next_stage+0x73(eb85c600)
  c9d03d2c zio_vdev_io_assess+0xc6(eb85c600)
  c9d03d40 zio_next_stage+0x73(eb85c600)
  c9d03d54 vdev_disk_io_done+0x2b()
  c9d03d64 vdev_io_done+0x18()
  c9d03d78 zio_vdev_io_done+0xe()
  c9d03dc8 taskq_thread+0x16c(c7f83a18, 0)
  c9d03dd8 thread_start+8()

Loading modules: [ unix krtld genunix specfs dtrace pcplusmp ufs ip sctp usba uhci fcp fctl nca lofs zfs random nfs md fcip cpc crypto logindmux ptm sppp ]
> ::status
debugging crash dump vmcore.1 (32-bit) from zfstest
operating system: 5.11 snv_28 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=c88eaaa4 addr=6bd occurred in module "zfs"
due to a NULL pointer dereference
dump content: kernel pages only

> *panic_thread::findstack -v
stack pointer for thread c88eade0: c88ea980
  c88ea99c panic+0x12(fe879608, e, fe8797cc, fe8796b8, c88eaaa4, 6bd)
  c88ea9fc die+0x98(e, c88eaaa4, 6bd, 1)
  c88eaa90 trap+0x1169(c88eaaa4, 6bd, 1)
  c88eaaa4 _cmntrap+0x9b()
  c88eab14 buf_hash_insert+0x7b(d89570b0, c88eab44)
  c88eab48 arc_write_done+0x7f(d97b0a00)
  c88eac7c zio_done+0x1ce(d97b0a00)
  c88eac9c zio_next_stage+0x73(d97b0a00)
  c88eacbc zio_wait_for_children+0x58()
  c88eacdc zio_wait_children_done+0x18(d97b0a00)
  c88eacf8 zio_next_stage+0x73(d97b0a00)
  c88ead2c zio_vdev_io_assess+0xc6(d97b0a00)
  c88ead40 zio_next_stage+0x73(d97b0a00)
  c88ead54 vdev_disk_io_done+0x2b()
  c88ead64 vdev_io_done+0x18()
  c88ead78 zio_vdev_io_done+0xe()
  c88eadc8 taskq_thread+0x16c(c9322660, 0)
  c88eadd8 thread_start+8()

both times were during heavy ZFS activity, the second time I was also
modifying ZFS pools at the same time.

are these crash dumps useful? if so I will upload them. (I was going to
update this system to build 31 today, but liveupgrade isn't playing nice,
so I'll just reinstall).

This message posted from opensolaris.org
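
For anyone wanting to poke at dumps like these themselves, the mdb session
above was presumably produced along these lines. This is only a sketch: the
crash directory path is an assumption based on the default savecore location
for a host named "zfstest", not something stated in the post.

    # cd /var/crash/zfstest            # default savecore directory (assumed)
    # mdb unix.0 vmcore.0              # load kernel symbols plus the first dump
    > ::status                         # one-line summary of the panic
    > *panic_thread::findstack -v      # stack of the thread that took the trap
    > $q                               # quit mdb
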
Hmm, the stack trace looks a bit like 6341326 to me, but I'm not an
expert here... Any developer types know for sure?

(btw. now that ZFS is in the open, we've really got to stop putting "See
comments" in the bug reports[1] - a point EricS made a while back)

cheers,
tim

[1] because all people without sunsolve access can see is
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6341326 which
doesn't tell them anything useful. No need for smoke & mirrors anymore :-)

On Wed, 2006-02-01 at 02:14 -0800, grant beattie wrote:
> c9d03b14 buf_hash_insert+0x7b(e2568740, c9d03b44)
> c9d03b48 arc_write_done+0x7f(eb85c600)
[...]
> are these crash dumps useful? if so I will upload them.

--
Tim Foster, Sun Microsystems Inc, Operating Platforms Group
Engineering Operations                   http://blogs.sun.com/timf
On Wed, Feb 01, 2006 at 10:56:26AM +0000, Tim Foster wrote:
> Hmm, the stack trace looks a bit like 6341326 to me, but I'm not an
> expert here... Any developer types know for sure?
>
> (btw. now that ZFS is in the open, we've really got to stop putting "See
> comments" in the bug reports[1] - a point EricS made a while back)
>
> cheers,
> tim
>
> [1] because all people without sunsolve access can see is
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6341326 which
> doesn't tell them anything useful. No need for smoke & mirrors anymore :-)

yeah, I searched the opensolaris bugs database, and I do have sunsolve
access but I couldn't find anything related in there, either -- that bug
id doesn't turn up any results for me.

grant.
This looks to me like the ARC caching on I/O failure problem, but maybe
Mark can comment?  We have this fixed in a project gate due to go back
around build 35.  Can you run "::spa -ve" on the dumps and send the
output?

- Eric

On Wed, Feb 01, 2006 at 02:14:24AM -0800, grant beattie wrote:
> both times were during heavy ZFS activity, the second time I was also
> modifying ZFS pools at the same time.
>
> are these crash dumps useful? if so I will upload them.

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
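
A non-interactive way to capture exactly the output Eric is asking for would
be to pipe the dcmd into mdb and redirect it to a file; this is a sketch that
assumes the dumps are still in the default /var/crash/zfstest directory, and
the output filenames are invented for illustration.

    # cd /var/crash/zfstest
    # echo "::spa -ve" | mdb unix.0 vmcore.0 > spa-vmcore0.txt   # first panic
    # echo "::spa -ve" | mdb unix.1 vmcore.1 > spa-vmcore1.txt   # second panic
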
On Wed, Feb 01, 2006 at 10:03:01AM -0800, Eric Schrock wrote:
> This looks to me like the ARC caching on I/O failure problem, but maybe
> Mark can comment?  We have this fixed in a project gate due to go back
> around build 35.  Can you run "::spa -ve" on the dumps and send the
> output?

sure, here they are..

vmcore.0:

> ::spa -ve
ADDR     STATE  NAME
cc273c40 ACTIVE export

    ADDR     STATE    AUX  DESCRIPTION
    caaa8940 HEALTHY  -    root
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0           0            0     0      0
        BYTES   0           0            0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6e33dc0 HEALTHY  -    /dev/dsk/c1t0d0s7
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0x2d38      0x6d0e       0     0      0
        BYTES   0x2de48200  0x23adf600   0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

ebaa9340 ACTIVE mailtank

    c6d7eb80 HEALTHY  -    root
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0           0            0     0      0
        BYTES   0           0            0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6b9cb80 HEALTHY  -    /dev/dsk/c0t0d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0x101       0x11b8c      0     0      0
        BYTES   0xf89000    0x1f50b7e00  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    e9971200 HEALTHY  -    /dev/dsk/c0t1d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0xc6        0x11781      0     0      0
        BYTES   0xc48000    0x1f4b89400  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6d7fdc0 HEALTHY  -    /dev/dsk/c0t2d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0xaf        0x10ead      0     0      0
        BYTES   0xa96000    0x1f4586c00  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    eca44000 HEALTHY  -    /dev/dsk/c0t3d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0xe5        0x1163a      0     0      0
        BYTES   0xde4000    0x1f4b2d800  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c9bdae00 HEALTHY  -    /dev/dsk/c0t4d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0x15d       0x1260f      0     0      0
        BYTES   0x150d000   0x1f572fa00  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    d30e7200 HEALTHY  -    /dev/dsk/c0t5d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0x205       0x12d74      0     0      0
        BYTES   0x1f54000   0x1f5b39400  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6c77dc0 HEALTHY  -    /dev/dsk/c1t1d0
                READ        WRITE        FREE  CLAIM  IOCTL
        OPS     0xb6        0x1202d      0     0      0
        BYTES   0xb24000    0x212b41600  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

vmcore.1:

> ::spa -ve
ADDR     STATE  NAME
ccd10940 ACTIVE export

    ADDR     STATE    AUX  DESCRIPTION
    c6b9f480 HEALTHY  -    root
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0            0            0     0      0
        BYTES   0            0            0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6ba0680 HEALTHY  -    /dev/dsk/c1t0d0s7
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x2016       0xa54        0     0      0
        BYTES   0x19f56c00   0x1530200    0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

c85163c0 ACTIVE mailtank

    c9af9b00 HEALTHY  -    root
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0            0            0     0      0
        BYTES   0            0            0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6da66c0 HEALTHY  -    /dev/dsk/c0t4d0
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x35bb7      0x1aa3e      0     0      0
        BYTES   0x6acd3b000  0x3243ee800  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6da1480 HEALTHY  -    /dev/dsk/c0t3d0
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x11a82      0x1842b      0     0      0
        BYTES   0x22fcd2e00  0x2bf8d5a00  0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0x3a

    c6d9bd80 HEALTHY  -    /dev/dsk/c0t0d0
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x391d       0x2d3d       0     0      0
        BYTES   0x72138000   0x551ac200   0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0

    c6b9d940 HEALTHY  -    /dev/dsk/c0t1d0
                READ         WRITE        FREE  CLAIM  IOCTL
        OPS     0x390d       0x2ce2       0     0      0
        BYTES   0x71f28800   0x55195400   0     0      0
        EREAD   0
        EWRITE  0
        ECKSUM  0
> Hmm, the stack trace looks a bit like 6341326 to me, but I'm not an
> expert here... Any developer types know for sure?

Yep, that's the one.  Fix coming in build 35.

Sorry about that,

Jeff
The interesting thing here is that you've had dozens of checksum errors
on one disk, /dev/dsk/c0t3d0.  I'd suggest replacing it as soon as you can.
FMI, were there any errors reported in /var/adm/messages?

Jeff
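
Roughly, the checks and the swap Jeff is suggesting might look like the
following sketch. The replacement device c0t6d0 is invented purely for
illustration; whether a spare slot exists on this box isn't known from the
thread.

    # grep -i c0t3d0 /var/adm/messages      # any driver errors logged for the suspect disk?
    # iostat -En c0t3d0                     # cumulative soft/hard/transport error counters
    # zpool status -v mailtank              # ZFS-level read/write/checksum error counts
    # zpool replace mailtank c0t3d0 c0t6d0  # resilver onto a replacement disk (c0t6d0 is hypothetical)
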
On Wed, Feb 01, 2006 at 03:36:37PM -0800, Jeff Bonwick wrote:
> The interesting thing here is that you've had dozens of checksum errors
> on one disk, /dev/dsk/c0t3d0.  I'd suggest replacing it as soon as you can.
> FMI, were there any errors reported in /var/adm/messages?

there were indeed a whole bunch of read errors logged for c0t2d0, but I
don't see any for c0t3d0, so it seems there is at least one disk having
problems. I did also hit the bad checksum panic a few times, which seems
to have been reported by others, too.

glad to know a fix for the previous panics will make its way in :)

thanks Jeff,
grant.
On Wed, Feb 01, 2006 at 03:32:18PM -0800, Jeff Bonwick wrote:
> > Hmm, the stack trace looks a bit like 6341326 to me, but I'm not an
> > expert here... Any developer types know for sure?
>
> Yep, that's the one.  Fix coming in build 35.

hey Jeff,

I didn't see this bug in the 20060228 changelog.. is this because
20060228 isn't quite build 35, or did it get delayed for some reason?

I'm eagerly awaiting this fix so my box will stop panic'ing all the time :)

grant.