Today, suddenly, without any apparent reason that I can find, I'm getting panics during zpool import. The system panicked earlier today and has been suffering since. This is snv_43 on a Thumper. Here's the stack:

panic[cpu0]/thread=ffffffff99adbac0: assertion failed: ss != NULL, file:
../../common/fs/zfs/space_map.c, line: 145

fffffe8000a240a0 genunix:assfail+83 ()
fffffe8000a24130 zfs:space_map_remove+1d6 ()
fffffe8000a24180 zfs:space_map_claim+49 ()
fffffe8000a241e0 zfs:metaslab_claim_dva+130 ()
fffffe8000a24240 zfs:metaslab_claim+94 ()
fffffe8000a24270 zfs:zio_dva_claim+27 ()
fffffe8000a24290 zfs:zio_next_stage+6b ()
fffffe8000a242b0 zfs:zio_gang_pipeline+33 ()
fffffe8000a242d0 zfs:zio_next_stage+6b ()
fffffe8000a24320 zfs:zio_wait_for_children+67 ()
fffffe8000a24340 zfs:zio_wait_children_ready+22 ()
fffffe8000a24360 zfs:zio_next_stage_async+c9 ()
fffffe8000a243a0 zfs:zio_wait+33 ()
fffffe8000a243f0 zfs:zil_claim_log_block+69 ()
fffffe8000a24520 zfs:zil_parse+ec ()
fffffe8000a24570 zfs:zil_claim+9a ()
fffffe8000a24750 zfs:dmu_objset_find+2cc ()
fffffe8000a24930 zfs:dmu_objset_find+fc ()
fffffe8000a24b10 zfs:dmu_objset_find+fc ()
fffffe8000a24bb0 zfs:spa_load+67b ()
fffffe8000a24c20 zfs:spa_import+a0 ()
fffffe8000a24c60 zfs:zfs_ioc_pool_import+79 ()
fffffe8000a24ce0 zfs:zfsdev_ioctl+135 ()
fffffe8000a24d20 genunix:cdev_ioctl+55 ()
fffffe8000a24d60 specfs:spec_ioctl+99 ()
fffffe8000a24dc0 genunix:fop_ioctl+3b ()
fffffe8000a24ec0 genunix:ioctl+180 ()
fffffe8000a24f10 unix:sys_syscall32+101 ()

syncing file systems... done

This is almost identical to a post to this list over a year ago titled "ZFS Panic". There was follow-up on it, but the results didn't make it back to the list.

I spent time doing a full sweep for any hardware failures, pulled 2 drives that I suspected as problematic but weren't flagged as such, etc, etc, etc. Nothing helps.

Bill suggested a 'zpool import -o ro' on the other post, but that's not working either.

I _can_ use 'zpool import' to see the pool, but I have to force the import. A simple 'zpool import' returns output in about a minute. 'zpool import -f poolname' takes almost exactly 10 minutes every single time, like it hits some timeout and then panics.

I did notice that while the 'zpool import' is running, 'iostat' is useless; it just hangs. I still want to believe this is some device misbehaving, but I have no evidence to support that theory.

Any and all suggestions are greatly appreciated. I've put around 8 hours into this so far and I'm getting absolutely nowhere.

Thanks

benr.
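P.S. To be explicit about exactly what I've been running (the pool name below is a stand-in for the real one):

  # zpool import                  <- sees the pool, returns in about a minute
  # zpool import -f poolname      <- runs ~10 minutes, then panics
  # zpool import -o ro poolname   <- Bill's suggestion from the old thread; no luck either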
Your system seems to have hit bug 6458218:

http://bugs.opensolaris.org/view_bug.do?bug_id=6458218

It is fixed in snv_60. As far as ZFS goes, snv_43 is quite old.

--
Prabahar.

On Jan 12, 2008, at 11:15 PM, Ben Rockwood wrote:

> [...]
Hi Ben,

Not that I know much, but while monitoring the posts I read sometime long ago that there was a bug/race condition in the slab allocator which results in a panic on double free (ss != NULL). I think the zpool is fine, but your system is tripping on this bug.

Since it is snv_43, I'd suggest upgrading. Is LU or a fresh install possible? Can you quickly try importing it on a Belenix LiveCD/USB?

- Akhilesh

PS: I'll post the bug# if I find it.
Most probable culprit (close, but not identical stack trace):

http://bugs.opensolaris.org/view_bug.do?bug_id=6458218

Fixed since snv_60.
As it's been pointed out, it's likely 6458218, but a 'zdb -e poolname' will tell you a little more.

Rob
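That is, something along the lines of (substituting your actual pool name):

  # zdb -e poolname

Since zdb runs in userland against the on-disk state, it should be able to report a bit more detail without panicking the box the way the kernel-side import does.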
The solution here was to upgrade to snv_78. By "upgrade" I mean re-jumpstart the system.

I tested snv_67 via net-boot, but the pool panicked just as described in my original post. I also attempted using zfs_recover without success. I then tested snv_78 via net-boot, used both "aok=1" and "zfs:zfs_recover=1", and was able to (slowly) import the pool. Following that test I exported and then did a full re-install of the box.

A very important note to anyone upgrading a Thumper! Don't forget about the NCQ bug. After upgrading to a release more recent than snv_60, add the following to /etc/system:

  set sata:sata_max_queue_depth = 0x1

If you don't, life will be highly unpleasant and you'll believe that disks are failing everywhere when in fact they are not.

benr.
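P.S. For the archives, a rough sketch of what the relevant /etc/system entries look like in one place (the aok/zfs_recover tunables can also be set in other ways and are only meant for the recovery attempt; the NCQ workaround is the one to keep):

  * Recovery knobs: let ZFS try to recover instead of panicking on the failed assertion
  set aok=1
  set zfs:zfs_recover=1

  * Workaround for the NCQ issue on Thumpers (releases newer than snv_60)
  set sata:sata_max_queue_depth = 0x1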