One of our Solaris 10 update 3 servers panicked today with the following error:

Sep 18 00:34:53 m2000ef savecore: [ID 570001 auth.error] reboot after
panic: assertion failed: ss != NULL, file:
../../common/fs/zfs/space_map.c, line: 125

The server saved a core file, and the resulting backtrace is listed below:

$ mdb unix.0 vmcore.0
> $c
vpanic()
0xfffffffffb9b49f3()
space_map_remove+0x239()
space_map_load+0x17d()
metaslab_activate+0x6f()
metaslab_group_alloc+0x187()
metaslab_alloc_dva+0xab()
metaslab_alloc+0x51()
zio_dva_allocate+0x3f()
zio_next_stage+0x72()
zio_checksum_generate+0x5f()
zio_next_stage+0x72()
zio_write_compress+0x136()
zio_next_stage+0x72()
zio_wait_for_children+0x49()
zio_wait_children_ready+0x15()
zio_next_stage_async+0xae()
zio_wait+0x2d()
arc_write+0xcc()
dmu_objset_sync+0x141()
dsl_dataset_sync+0x23()
dsl_pool_sync+0x7b()
spa_sync+0x116()
txg_sync_thread+0x115()
thread_start+8()

It appears ZFS is still able to read the labels from the drive:

$ zdb -lv /dev/rdsk/c3t50002AC00039040Bd0p0
--------------------------------------------
LABEL 0
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,sd@n50002ac00039040b/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464
--------------------------------------------
LABEL 1
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,sd@n50002ac00039040b/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464
--------------------------------------------
LABEL 2
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,sd@n50002ac00039040b/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464
--------------------------------------------
LABEL 3
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,sd@n50002ac00039040b/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464

But for some reason it is unable to open the pool:

$ zdb -c fpool0
zdb: can't open fpool0: error 2

I saw several bugs related to space_map.c, but the stack traces listed
in the bug reports were different from the one listed above. Has anyone
seen this bug before? Is there any way to recover from it?

Thanks for any insight,
- Ryan

--
UNIX Administrator
http://prefetch.net
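For anyone else who hits a similar space_map.c panic, a few more mdb dcmds run against the same saved dump can help confirm the panic string and pool state before opening a support case. This is only a sketch: the dcmds below exist in the mdb shipped with Solaris 10, but the exact output, and whether the zfs module's ::spa dcmd loads, varies by kernel build.

$ mdb unix.0 vmcore.0
> ::panicinfo
> ::status
> ::msgbuf
> ::stack
> ::spa -v

::panicinfo and ::status summarize the panic string and the release the dump came from, ::msgbuf shows the kernel messages leading up to the panic, ::stack should match the $c output above, and ::spa -v (if the zfs mdb module is available) prints per-pool and per-vdev state.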
Hi Matty,

From the stack I saw, this looks like 6454482. That defect has been marked
as "Not reproducible", so I have no idea how to recover from it, but it
looks like newer updates will not hit this issue.

Matty wrote:
> One of our Solaris 10 update 3 servers panicked today with the following error:
>
> Sep 18 00:34:53 m2000ef savecore: [ID 570001 auth.error] reboot after
> panic: assertion failed: ss != NULL, file:
> ../../common/fs/zfs/space_map.c, line: 125
>
> [...]
>
> But for some reason it is unable to open the pool:
>
> $ zdb -c fpool0
> zdb: can't open fpool0: error 2
>
> I saw several bugs related to space_map.c, but the stack traces listed
> in the bug reports were different from the one listed above. Has anyone
> seen this bug before? Is there any way to recover from it?
> Thanks for any insight,
> - Ryan

--
Regards,

Robin Guo, Xue-Bin Guo
Solaris Kernel and Data Service QE,
Sun China Engineering and Research Institute
Phone: +86 10 82618200 +82296
Email: robin.guo at sun.com
Hello,

Upgrade to snv_60 or later if you care about your data :)

Gino
On 18 September, 2007 - Gino sent me these 0,3K bytes:

> Hello,
> upgrade to snv_60 or later if you care about your data :)

If there are known serious data-loss bug fixes that have gone into
snv_60+ but not into s10u4, then I would like to tell Sun to "backport"
those into s10u4 if they care about keeping customers.

Any specific bug fixes you know about that one really wants? (So we can
poke support.)

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Tomas Ögren wrote:
> On 18 September, 2007 - Gino sent me these 0,3K bytes:
>
>> Hello,
>> upgrade to snv_60 or later if you care about your data :)
>
> If there are known serious data-loss bug fixes that have gone into
> snv_60+ but not into s10u4, then I would like to tell Sun to "backport"
> those into s10u4 if they care about keeping customers.
>
> Any specific bug fixes you know about that one really wants? (So we can
> poke support.)

I think it is bug

    6458218 assertion failed: ss == NULL

which is fixed in Solaris 10 8/07.

Hth,
victor
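For anyone deciding whether they still need to patch or reinstall, a quick sanity check is to see which release a host is actually running. This is just a sketch; the specific kernel patch that delivers the 6458218 fix is not listed in this thread, so check the bug report or SunSolve for the exact patch ID rather than relying on the grep below.

$ cat /etc/release          (release string; 8/07 is Update 4)
$ uname -v                  (running kernel patch level)
$ showrev -p | grep -i zfs  (installed patches touching the ZFS packages)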
> Tomas Ögren wrote:
>> On 18 September, 2007 - Gino sent me these 0,3K bytes:
>>
>>> Hello,
>>> upgrade to snv_60 or later if you care about your data :)
>>
>> If there are known serious data-loss bug fixes that have gone into
>> snv_60+ but not into s10u4, then I would like to tell Sun to "backport"
>> those into s10u4 if they care about keeping customers.
>>
>> Any specific bug fixes you know about that one really wants? (So we can
>> poke support.)
>
> I think it is bug
>
>     6458218 assertion failed: ss == NULL
>
> which is fixed in Solaris 10 8/07.

Yes, it was 6458218. Then Solaris 10 8/07 will be fine.

Gino