Today, suddenly, without any apparent reason that I can find, I'm getting panics during zpool import. The system panicked earlier today and has been suffering since. This is snv_43 on a Thumper. Here's the stack:

panic[cpu0]/thread=ffffffff99adbac0: assertion failed: ss != NULL, file:
../../common/fs/zfs/space_map.c, line: 145

fffffe8000a240a0 genunix:assfail+83 ()
fffffe8000a24130 zfs:space_map_remove+1d6 ()
fffffe8000a24180 zfs:space_map_claim+49 ()
fffffe8000a241e0 zfs:metaslab_claim_dva+130 ()
fffffe8000a24240 zfs:metaslab_claim+94 ()
fffffe8000a24270 zfs:zio_dva_claim+27 ()
fffffe8000a24290 zfs:zio_next_stage+6b ()
fffffe8000a242b0 zfs:zio_gang_pipeline+33 ()
fffffe8000a242d0 zfs:zio_next_stage+6b ()
fffffe8000a24320 zfs:zio_wait_for_children+67 ()
fffffe8000a24340 zfs:zio_wait_children_ready+22 ()
fffffe8000a24360 zfs:zio_next_stage_async+c9 ()
fffffe8000a243a0 zfs:zio_wait+33 ()
fffffe8000a243f0 zfs:zil_claim_log_block+69 ()
fffffe8000a24520 zfs:zil_parse+ec ()
fffffe8000a24570 zfs:zil_claim+9a ()
fffffe8000a24750 zfs:dmu_objset_find+2cc ()
fffffe8000a24930 zfs:dmu_objset_find+fc ()
fffffe8000a24b10 zfs:dmu_objset_find+fc ()
fffffe8000a24bb0 zfs:spa_load+67b ()
fffffe8000a24c20 zfs:spa_import+a0 ()
fffffe8000a24c60 zfs:zfs_ioc_pool_import+79 ()
fffffe8000a24ce0 zfs:zfsdev_ioctl+135 ()
fffffe8000a24d20 genunix:cdev_ioctl+55 ()
fffffe8000a24d60 specfs:spec_ioctl+99 ()
fffffe8000a24dc0 genunix:fop_ioctl+3b ()
fffffe8000a24ec0 genunix:ioctl+180 ()
fffffe8000a24f10 unix:sys_syscall32+101 ()

syncing file systems... done

This is almost identical to a post to this list over a year ago titled "ZFS Panic". There was follow-up on it, but the results didn't make it back to the list.

I spent time doing a full sweep for any hardware failures, pulled 2 drives that I suspected as problematic but weren't flagged as such, etc, etc, etc. Nothing helps.

Bill suggested a 'zpool import -o ro' on the other post, but that's not working either.

I _can_ use 'zpool import' to see the pool, but I have to force the import. A simple 'zpool import' returns output in about a minute. 'zpool import -f poolname' takes almost exactly 10 minutes every single time, like it hits some timeout and then panics.

I did notice that while the 'zpool import' is running, 'iostat' is useless; it just hangs. I still want to believe this is some device misbehaving, but I have no evidence to support that theory.

Any and all suggestions are greatly appreciated. I've put around 8 hours into this so far and I'm getting absolutely nowhere.

Thanks

benr.
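P.S. To be explicit about exactly what I've been running (the pool name below is a stand-in for the real one):

  # zpool import                  <- sees the pool, returns in about a minute
  # zpool import -f poolname      <- runs ~10 minutes, then panics
  # zpool import -o ro poolname   <- Bill's suggestion from the old thread; no luck either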
Your system seems to have hit bug 6458218:

http://bugs.opensolaris.org/view_bug.do?bug_id=6458218

It is fixed in snv_60. As far as ZFS goes, snv_43 is quite old.

--
Prabahar.

On Jan 12, 2008, at 11:15 PM, Ben Rockwood wrote:

> [...]
Hi Ben,

Not that I know much, but while monitoring the posts I read sometime long ago that there was a bug/race condition in the slab allocator which results in a panic on double free (ss != NULL). I think the zpool is fine, but your system is tripping on this bug.

Since it is snv_43, I'd suggest upgrading. Is LU or a fresh install possible? Can you quickly try importing it on a Belenix LiveCD/USB?

- Akhilesh

PS: I'll post the bug# if I find it.
Most probable culprit (close, but not identical stack trace):

http://bugs.opensolaris.org/view_bug.do?bug_id=6458218

Fixed since snv_60.
As it's been pointed out, it's likely 6458218, but a 'zdb -e poolname' will tell you a little more.

Rob
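That is, something along the lines of (substituting your actual pool name):

  # zdb -e poolname

Since zdb runs in userland against the on-disk state, it should be able to report a bit more detail without panicking the box the way the kernel-side import does.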
The solution here was to upgrade to snv_78. By "upgrade" I mean re-jumpstart the system.

I tested snv_67 via net-boot, but the pool panicked just as described in my original post. I also attempted using zfs_recover without success. I then tested snv_78 via net-boot, used both "aok=1" and "zfs:zfs_recover=1", and was able to (slowly) import the pool. Following that test I exported and then did a full re-install of the box.

A very important note to anyone upgrading a Thumper! Don't forget about the NCQ bug. After upgrading to a release more recent than snv_60, add the following to /etc/system:

  set sata:sata_max_queue_depth = 0x1

If you don't, life will be highly unpleasant and you'll believe that disks are failing everywhere when in fact they are not.

benr.
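P.S. For the archives, a rough sketch of what the relevant /etc/system entries look like in one place (the aok/zfs_recover tunables can also be set in other ways and are only meant for the recovery attempt; the NCQ workaround is the one to keep):

  * Recovery knobs: let ZFS try to recover instead of panicking on the failed assertion
  set aok=1
  set zfs:zfs_recover=1

  * Workaround for the NCQ issue on Thumpers (releases newer than snv_60)
  set sata:sata_max_queue_depth = 0x1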