Mark
2010-Mar-15 21:20 UTC
[zfs-discuss] pool causes kernel panic, recursive mutex enter, 134
Hi,

I've been using OpenSolaris for about 2 years with a mirrored rpool and a data pool with 3 x 2 (mirrored) drives. The data pool drives are connected to SIL PCI Express cards.

Yesterday I updated from build 130 to 134. Everything seemed to be fine, and I also replaced one pair of mirrored drives with larger disks. Still no problems: I did some tests, rebooted a few times and checked the logs, nothing special.

Today I started copying a large amount of data. At about 40 GB into the copy, OpenSolaris gave me the first kernel panic I have ever seen on this system. The system rebooted and, while mounting the data pool (you may guess it), panicked again.

What I have tried so far to get it up again:
- booted without the data drives attached, then tried to import the pool manually, both normally and with -F -n (non-destructive, as the manual says)
- tried a normal import with different combinations of mirror halves taken offline, so that only a single drive was left for each mirror; same panic
- booted the previous BE (130, from before the update, when the pools never had an error); same panic

I still have the drives I replaced with the newer ones, but I believe they are useless since the pool structure has changed?

The kernel panic I get is "cpu(0) recursive mutex enter", followed by several lines of SIL driver errors.

ANY ideas for rescuing the volume are welcome. If I missed some important information, please tell me.

Regards,
Mark

--
This message posted from opensolaris.org
Mark
2010-Mar-15 21:52 UTC
Some output that may help:
  pool: tank
    id: 5649976080828524375
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        data         ONLINE
          mirror-0   ONLINE
            c27t2d0  ONLINE
            c27t0d0  ONLINE
          mirror-1   ONLINE
            c27t3d0  ONLINE
            c29t1d0  ONLINE
          mirror-2   ONLINE
            c27t1d0  ONLINE
            c29t0d0  ONLINE
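The config section of a zpool import listing is a small tree: the root vdev, then the mirror-N vdevs, then their leaf disks. As a minimal sketch (parse_import_config is a hypothetical helper, not part of the ZFS tools), the Python below turns such a listing into a mapping from each mirror to its disks and states, so you can quickly confirm that every mirror still has both halves ONLINE before trying further imports. It assumes only mirror and raidz group vdevs appear, and it classifies rows by name rather than indentation, so it also copes with a listing whose indentation was lost in the mail.

```python
import re
from typing import Dict, List, Optional, Tuple

# A "<name> <state>" row such as "mirror-0 ONLINE" or "c27t2d0 ONLINE".
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s*$")
# Vdev types that group leaf disks (assumption: only mirrors/raidz appear).
GROUP_PREFIXES = ("mirror", "raidz")

Vdevs = Dict[str, List[Tuple[str, str]]]


def parse_import_config(text: str) -> Tuple[str, Vdevs]:
    """Return (root vdev name, {group vdev: [(disk, state), ...]})."""
    root = ""
    groups: Vdevs = {}
    current: Optional[str] = None
    in_config = False
    for line in text.splitlines():
        if line.strip() == "config:":
            in_config = True  # everything after this line is the vdev tree
            continue
        match = ROW.match(line) if in_config else None
        if match is None:
            continue  # header lines, blank lines, separators
        name, state = match.groups()
        if not root:
            root = name  # first row of the tree is the pool's root vdev
        elif name.startswith(GROUP_PREFIXES):
            current = name
            groups[current] = []
        elif current is not None:
            groups[current].append((name, state))
    return root, groups


SAMPLE = """\
  pool: tank
    id: 5649976080828524375
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        data         ONLINE
          mirror-0   ONLINE
            c27t2d0  ONLINE
            c27t0d0  ONLINE
          mirror-1   ONLINE
            c27t3d0  ONLINE
            c29t1d0  ONLINE
          mirror-2   ONLINE
            c27t1d0  ONLINE
            c29t0d0  ONLINE
"""

if __name__ == "__main__":
    root, groups = parse_import_config(SAMPLE)
    for vdev, disks in groups.items():
        online = sum(1 for _, state in disks if state == "ONLINE")
        print(f"{root}/{vdev}: {online} of {len(disks)} disks ONLINE")
```

For the listing above, every mirror reports 2 of 2 disks ONLINE. Note that the parser reports the root of the vdev tree ("data"), which is what appears under config: here, not the "pool:" header line.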
----
Mar 15 21:42:50 solaris1.local panic[cpu0]/thread=d6792f00:
Mar 15 21:42:50 solaris1.local genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=d76d3658 addr=34 occurred in module "zfs" due to a NULL pointer dereference
Mar 15 21:42:50 solaris1.local unix: [ID 100000 kern.notice]
Mar 15 21:42:50 solaris1.local unix: [ID 839527 kern.notice] syseventd:
Mar 15 21:42:50 solaris1.local unix: [ID 753105 kern.notice] #pf Page fault
Mar 15 21:42:50 solaris1.local unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x34
Mar 15 21:42:50 solaris1.local unix: [ID 243837 kern.notice] pid=93, pc=0xf924b97e, sp=0xd76d36c4, eflags=0x10282
Mar 15 21:42:50 solaris1.local unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
Mar 15 21:42:50 solaris1.local unix: [ID 624947 kern.notice] cr2: 34
Mar 15 21:42:50 solaris1.local unix: [ID 625075 kern.notice] cr3: 2ead020
Mar 15 21:42:50 solaris1.local unix: [ID 100000 kern.notice]
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] gs: d76d01b0 fs: 0 es: cb0160 ds: e31a0160
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] edi: 0 esi: de581350 ebp: d76d36a4 esp: d76d3690
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] ebx: 0 edx: b ecx: 0 eax: 0
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] trp: e err: 0 eip: f924b97e cs: 158
Mar 15 21:42:50 solaris1.local unix: [ID 717149 kern.notice] efl: 10282 usp: d76d36c4 ss: f924b9c6
Mar 15 21:42:50 solaris1.local unix: [ID 100000 kern.notice]
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3594 unix:die+93 (e, d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3644 unix:trap+1449 (d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3658 unix:cmntrap+7c (d76d01b0, 0, cb0160)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36a4 zfs:vdev_is_dead+6 (0, 0, cb36a7, e31ad)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36c4 zfs:vdev_readable+e (0, 1, 0, fe96c13d)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3704 zfs:vdev_mirror_child_select+55 (dedc6560, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3744 zfs:vdev_mirror_io_start+b3 (dedc6560, 10, dedc6)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3784 zfs:zio_vdev_io_start+1ea (dedc6560, 2, 1f8002)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d37a4 zfs:zio_execute+76 (dedc6560, d341c000,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d37c4 zfs:zio_nowait+42 (dedc6560, 40, d76d3)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3814 zfs:arc_read_nolock+85d (d360f598, d341c000,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3854 zfs:arc_read+39 (d360f598, d341c000,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d38d4 zfs:dbuf_read_impl+155 (de773450, d360f598,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3914 zfs:dbuf_read+1cc (de773450, 0, e, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3964 zfs:dbuf_findbp+114 (de3d6408, 0, 0, 0, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d39c4 zfs:dbuf_hold_impl+61 (de3d6408, 0, 0, 0, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3a04 zfs:dbuf_hold+20 (de3d6408, 0, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3a54 zfs:dnode_hold_impl+bb (da8cfe00, 1, 0, 1, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3a84 zfs:dnode_hold+1d (da8cfe00, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3ac4 zfs:dmu_buf_hold+20 (da8cfe00, 1, 0, 0, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3b14 zfs:zap_lockdir+33 (da8cfe00, 1, 0, 0, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3b64 zfs:zap_lookup_norm+24 (da8cfe00, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3bb4 zfs:zap_lookup+31 (da8cfe00, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3c14 zfs:dsl_pool_open+70 (d341c000, 43079, 0,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3c94 zfs:spa_load+507 (d341c000, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3ce4 zfs:spa_load_best+61 (d341c000, 1, 0, fff)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3d54 zfs:spa_open_common+12a (dd53e000, d76d3d88,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3da4 zfs:spa_get_stats+27 (dd53e000, d76d3dc8,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3dd4 zfs:zfs_ioc_pool_stats+21 (dd53e000, 0, 0, 100)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3e14 zfs:zfsdev_ioctl+14b (2d80000, 5a05, cf97)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3e44 genunix:cdev_ioctl+31 (2d80000, 5a05, cf97)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3e74 specfs:spec_ioctl+52 (d7659d40, 5a05, cf9)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3ec4 genunix:fop_ioctl+49 (d7659d40, 5a05, cf9)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3f84 genunix:ioctl+171 (5, 5a05, cf97d040, )
Mar 15 21:42:50 solaris1.local unix: [ID 100000 kern.notice]
Mar 15 21:42:50 solaris1.local genunix: [ID 672855 kern.notice] syncing file systems...
Mar 15 21:42:50 solaris1.local genunix: [ID 904073 kern.notice] done
Mar 15 21:42:51 solaris1.local genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Mar 15 21:43:00 solaris1.local genunix: [ID 100000 kern.notice]
Mar 15 21:43:00 solaris1.local genunix: [ID 665016 kern.notice] 100% done: 58723 pages dumped,
Mar 15 21:43:00 solaris1.local genunix: [ID 851671 kern.notice] dump succeeded
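Reading the trace from the bottom up: the pool-stats ioctl (zfs_ioc_pool_stats) opens the pool through spa_load and dsl_pool_open, a metadata read goes through arc_read into vdev_mirror_io_start, and vdev_mirror_child_select then calls vdev_readable and vdev_is_dead with what appears to be a 0 (NULL) first argument. The faulting address addr=0x34 (cr2: 34) is consistent with reading a field at offset 0x34 through that NULL pointer. As a minimal sketch (parse_frames, null_first_arg and looks_like_null_deref are hypothetical helpers, not part of any Solaris tooling), the Python below extracts module:function+offset frames from these genunix kern.notice lines and flags frames whose first printed argument is 0. Caveat: on 32-bit x86 the argument values in a panic stack trace are words read from the stack and may not be the real arguments, so treat the flag as a hint for closer analysis of the saved crash dump, not as a diagnosis. The 4096-byte page size is also an assumption.

```python
import re
from typing import List, NamedTuple

# A genunix stack-frame line: "[ID 353471 kern.notice] <fp> <module>:<func>+<off> (<args>)".
# \s+ also matches newlines, so frames wrapped across two lines still parse.
FRAME = re.compile(
    r"\[ID 353471 kern\.notice\]\s+(?P<fp>[0-9a-f]+)\s+"
    r"(?P<mod>\w+):(?P<func>\w+)\+(?P<off>[0-9a-f]+)\s+\((?P<args>[^)]*)\)"
)
PAGE_SIZE = 4096  # assumption: x86 base page size


class Frame(NamedTuple):
    module: str
    function: str
    offset: int       # byte offset into the function, parsed from hex
    args: List[str]   # printed argument words, possibly truncated


def parse_frames(log: str) -> List[Frame]:
    """Extract frames in logged order (innermost call first)."""
    frames = []
    for m in FRAME.finditer(log):
        args = [a.strip() for a in m["args"].split(",") if a.strip()]
        frames.append(Frame(m["mod"], m["func"], int(m["off"], 16), args))
    return frames


def null_first_arg(frames: List[Frame]) -> List[str]:
    """Frames whose first printed argument is 0, i.e. possibly a NULL pointer."""
    return [f"{f.module}:{f.function}" for f in frames if f.args and f.args[0] == "0"]


def looks_like_null_deref(fault_addr: int) -> bool:
    """Heuristic: a fault inside the first page suggests NULL plus a small field offset."""
    return 0 <= fault_addr < PAGE_SIZE


LOG = """\
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3658 unix:cmntrap+7c (d76d01b0, 0, cb0160)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36a4 zfs:vdev_is_dead+6 (0, 0, cb36a7, e31ad)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36c4 zfs:vdev_readable+e (0, 1, 0, fe96c13d)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3704 zfs:vdev_mirror_child_select+55 (dedc6560, 1, 0, f92)
"""

if __name__ == "__main__":
    frames = parse_frames(LOG)
    print(" <- ".join(f.function for f in frames))  # callee <- caller
    print("first argument 0:", null_first_arg(frames))
    print("cr2=0x34 inside first page:", looks_like_null_deref(0x34))
```

On these four frames copied from the trace, null_first_arg reports zfs:vdev_is_dead and zfs:vdev_readable, while the caller vdev_mirror_child_select was passed a non-zero pointer.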