Mark
2010-Mar-15 21:20 UTC
[zfs-discuss] pool causes kernel panic, recursive mutex enter, 134
Hi,

I've been using OpenSolaris for about 2 years with a mirrored rpool and a data pool of 3 x 2 (mirrored) drives. The data pool drives are connected to SIL PCI Express cards.

Yesterday I updated from 130 to 134. Everything seemed fine, and I also replaced one pair of mirrored drives with larger disks. Still no problems: I ran some tests, rebooted a few times, and checked the logs, nothing special.

Today I started copying a larger amount of data. While copying, at about 40 GB, OpenSolaris gave me the first kernel panic I have ever seen on this system. The system rebooted and, while mounting the data pool, you may guess it, panicked again.

What I have done so far to try to get it up again: booted without the data drives, then tried to mount manually, also with -F -n (non-destructive, as the manual says). I also tried to mount normally with different combinations of mirrors taken offline, so that there is only a single drive for each slice. Same panic.

I still have the drives that I replaced with the newer ones, but I believe they are useless since the structure changed?

The kernel panic I get is "cpu(0) recursive mutex enter", plus several lines of SIL driver errors. I also tried booting the previous BE (130), from before the update, where the pools never had an error: same panic.

ANY ideas for rescuing the volume are welcome. If I missed some important information, please tell me.

Regards,
Mark
-- This message posted from opensolaris.org
Mark
2010-Mar-15 21:52 UTC
[zfs-discuss] pool causes kernel panic, recursive mutex enter, 134
Some screenshots that may help:

  pool: tank
    id: 5649976080828524375
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        data         ONLINE
          mirror-0   ONLINE
            c27t2d0  ONLINE
            c27t0d0  ONLINE
          mirror-1   ONLINE
            c27t3d0  ONLINE
            c29t1d0  ONLINE
          mirror-2   ONLINE
            c27t1d0  ONLINE
            c29t0d0  ONLINE

----

Mar 15 21:42:50 solaris1.local ^Mpanic[cpu0]/thread=d6792f00:
Mar 15 21:42:50 solaris1.local genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=d76d3658 addr=34 occurred in module "zfs" due to a NULL pointer dereference
Mar 15 21:42:50 solaris1.local unix: [ID 100000 kern.notice]
Mar 15 21:42:50 solaris1.local unix: [ID 839527 kern.notice] syseventd:
Mar 15 21:42:50 solaris1.local unix: [ID 753105 kern.notice] #pf Page fault
Mar 15 21:42:50 solaris1.local unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x34
Mar 15 21:42:50 solaris1.local unix: [ID 243837 kern.notice] pid=93, pc=0xf924b97e, sp=0xd76d36c4, eflags=0x10282
Mar 15 21:42:50 solaris1.local unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
Mar 15 21:42:50 solaris1.local unix: [ID 624947 kern.notice] cr2: 34
Mar 15 21:42:50 solaris1.local unix: [ID 625075 kern.notice] cr3: 2ead020
Mar 15 21:42:50 solaris1.local unix: [ID 100000 kern.notice]
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] gs: d76d01b0 fs: 0 es: cb0160 ds: e31a0160
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] edi: 0 esi: de581350 ebp: d76d36a4 esp: d76d3690
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] ebx: 0 edx: b ecx: 0 eax: 0
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] trp: e err: 0 eip: f924b97e cs: 158
Mar 15 21:42:50 solaris1.local unix: [ID 717149 kern.notice] efl: 10282 usp: d76d36c4 ss: f924b9c6
Mar 15 21:42:50 solaris1.local unix: [ID 100000 kern.notice]
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3594 unix:die+93 (e, d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3644 unix:trap+1449 (d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3658 unix:cmntrap+7c (d76d01b0, 0, cb0160)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36a4 zfs:vdev_is_dead+6 (0, 0, cb36a7, e31ad)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36c4 zfs:vdev_readable+e (0, 1, 0, fe96c13d)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3704 zfs:vdev_mirror_child_select+55 (dedc6560, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3744 zfs:vdev_mirror_io_start+b3 (dedc6560, 10, dedc6)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3784 zfs:zio_vdev_io_start+1ea (dedc6560, 2, 1f8002)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d37a4 zfs:zio_execute+76 (dedc6560, d341c000,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d37c4 zfs:zio_nowait+42 (dedc6560, 40, d76d3)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3814 zfs:arc_read_nolock+85d (d360f598, d341c000,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3854 zfs:arc_read+39 (d360f598, d341c000,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d38d4 zfs:dbuf_read_impl+155 (de773450, d360f598,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3914 zfs:dbuf_read+1cc (de773450, 0, e, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3964 zfs:dbuf_findbp+114 (de3d6408, 0, 0, 0, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d39c4 zfs:dbuf_hold_impl+61 (de3d6408, 0, 0, 0, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3a04 zfs:dbuf_hold+20 (de3d6408, 0, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3a54 zfs:dnode_hold_impl+bb (da8cfe00, 1, 0, 1, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3a84 zfs:dnode_hold+1d (da8cfe00, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3ac4 zfs:dmu_buf_hold+20 (da8cfe00, 1, 0, 0, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3b14 zfs:zap_lockdir+33 (da8cfe00, 1, 0, 0, )
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3b64 zfs:zap_lookup_norm+24 (da8cfe00, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3bb4 zfs:zap_lookup+31 (da8cfe00, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3c14 zfs:dsl_pool_open+70 (d341c000, 43079, 0,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3c94 zfs:spa_load+507 (d341c000, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3ce4 zfs:spa_load_best+61 (d341c000, 1, 0, fff)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3d54 zfs:spa_open_common+12a (dd53e000, d76d3d88,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3da4 zfs:spa_get_stats+27 (dd53e000, d76d3dc8,)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3dd4 zfs:zfs_ioc_pool_stats+21 (dd53e000, 0, 0, 100)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3e14 zfs:zfsdev_ioctl+14b (2d80000, 5a05, cf97)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3e44 genunix:cdev_ioctl+31 (2d80000, 5a05, cf97)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3e74 specfs:spec_ioctl+52 (d7659d40, 5a05, cf9)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3ec4 genunix:fop_ioctl+49 (d7659d40, 5a05, cf9)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3f84 genunix:ioctl+171 (5, 5a05, cf97d040, )
Mar 15 21:42:50 solaris1.local unix: [ID 100000 kern.notice]
Mar 15 21:42:50 solaris1.local genunix: [ID 672855 kern.notice] syncing file systems...
Mar 15 21:42:50 solaris1.local genunix: [ID 904073 kern.notice] done
Mar 15 21:42:51 solaris1.local genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Mar 15 21:43:00 solaris1.local genunix: [ID 100000 kern.notice]
Mar 15 21:43:00 solaris1.local genunix: [ID 665016 kern.notice] ^M100% done: 58723 pages dumped,
Mar 15 21:43:00 solaris1.local genunix: [ID 851671 kern.notice] dump succeeded
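
For anyone who wants to read the stack without the syslog prefixes, here is a minimal sketch in Python that pulls the frames out of panic text like the above. The names FRAME_RE, parse_frames and first_frame_in are made up for this illustration, and the pattern only assumes the "frame-pointer module:function+offset (args)" layout seen in this log; it is not an official tool. In this trace the innermost zfs frame is vdev_is_dead+6 with a first argument of 0, which seems consistent with the reported NULL pointer dereference at addr=0x34, though that is only a reading of the log, not a diagnosis.

```python
import re

# One stack frame as printed in the panic messages above, for example:
#   d76d36a4 zfs:vdev_is_dead+6 (0, 0, cb36a7, e31ad)
# Assumption: frame pointer is 8 to 16 lowercase hex digits, offset is hex.
FRAME_RE = re.compile(
    r"\b([0-9a-f]{8,16})\s+(\w+):(\w+)\+([0-9a-f]+)\s+\(([^)]*)\)"
)


def parse_frames(log_text):
    """Return the stack frames found in log_text, innermost first."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        fp, module, function, offset, args = match.groups()
        frames.append({
            "fp": fp,
            "module": module,
            "function": function,
            "offset": int(offset, 16),
            # The log truncates argument lists, so empty fields are dropped.
            "args": [a.strip() for a in args.split(",") if a.strip()],
        })
    return frames


def first_frame_in(frames, module):
    """Return the innermost frame that belongs to the given module, or None."""
    for frame in frames:
        if frame["module"] == module:
            return frame
    return None


# A few lines copied from the panic output in this message.
SAMPLE = """
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3594 unix:die+93 (e, d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3644 unix:trap+1449 (d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3658 unix:cmntrap+7c (d76d01b0, 0, cb0160)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36a4 zfs:vdev_is_dead+6 (0, 0, cb36a7, e31ad)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36c4 zfs:vdev_readable+e (0, 1, 0, fe96c13d)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d37a4 zfs:zio_execute+76 (dedc6560, d341c000,)
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print("%s:%s+0x%x %s" % (frame["module"], frame["function"],
                                 frame["offset"], frame["args"]))
```

Running it on the full text of /var/adm/messages should list the same call chain as above, from unix:die down to genunix:ioctl, as long as the log keeps this frame layout.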