One of our Solaris 10 update 3 servers panicked today with the following error:

Sep 18 00:34:53 m2000ef savecore: [ID 570001 auth.error] reboot after
panic: assertion failed: ss != NULL, file:
../../common/fs/zfs/space_map.c, line: 125

The server saved a core file, and the resulting backtrace is listed below:

$ mdb unix.0 vmcore.0
> $c
vpanic()
0xfffffffffb9b49f3()
space_map_remove+0x239()
space_map_load+0x17d()
metaslab_activate+0x6f()
metaslab_group_alloc+0x187()
metaslab_alloc_dva+0xab()
metaslab_alloc+0x51()
zio_dva_allocate+0x3f()
zio_next_stage+0x72()
zio_checksum_generate+0x5f()
zio_next_stage+0x72()
zio_write_compress+0x136()
zio_next_stage+0x72()
zio_wait_for_children+0x49()
zio_wait_children_ready+0x15()
zio_next_stage_async+0xae()
zio_wait+0x2d()
arc_write+0xcc()
dmu_objset_sync+0x141()
dsl_dataset_sync+0x23()
dsl_pool_sync+0x7b()
spa_sync+0x116()
txg_sync_thread+0x115()
thread_start+8()

It appears ZFS is still able to read the labels from the drive:

$ zdb -lv /dev/rdsk/c3t50002AC00039040Bd0p0
--------------------------------------------
LABEL 0
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,sd@n50002ac00039040b/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464
--------------------------------------------
LABEL 1
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,sd@n50002ac00039040b/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464
--------------------------------------------
LABEL 2
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,sd@n50002ac00039040b/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464
--------------------------------------------
LABEL 3
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,sd@n50002ac00039040b/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464

But for some reason it is unable to open the pool:

$ zdb -c fpool0
zdb: can't open fpool0: error 2

I saw several bugs related to space_map.c, but the stack traces listed
in the bug reports were different from the one listed above. Has anyone
seen this bug before? Is there any way to recover from it?

Thanks for any insight,
- Ryan

--
UNIX Administrator
http://prefetch.net
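For anyone else who hits a similar space_map.c panic, a few more mdb dcmds run against the same saved dump can help confirm the panic string and pool state before opening a support case. This is only a sketch: the dcmds below exist in the mdb shipped with Solaris 10, but the exact output, and whether the zfs module's ::spa dcmd loads, varies by kernel build.

$ mdb unix.0 vmcore.0
> ::panicinfo
> ::status
> ::msgbuf
> ::stack
> ::spa -v

::panicinfo and ::status summarize the panic string and the release the dump came from, ::msgbuf shows the kernel messages leading up to the panic, ::stack should match the $c output above, and ::spa -v (if the zfs mdb module is available) prints per-pool and per-vdev state.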
Hi Matty,

From the stack I saw, this looks like 6454482. That defect has been marked
as "Not reproducible", so I have no idea how to recover from it, but it
looks like newer updates will not hit this issue.

Matty wrote:
> One of our Solaris 10 update 3 servers panicked today with the following error:
>
> Sep 18 00:34:53 m2000ef savecore: [ID 570001 auth.error] reboot after
> panic: assertion failed: ss != NULL, file:
> ../../common/fs/zfs/space_map.c, line: 125
>
> [...]
>
> But for some reason it is unable to open the pool:
>
> $ zdb -c fpool0
> zdb: can't open fpool0: error 2
>
> I saw several bugs related to space_map.c, but the stack traces listed
> in the bug reports were different from the one listed above. Has anyone
> seen this bug before? Is there any way to recover from it?
> Thanks for any insight,
> - Ryan

--
Regards,

Robin Guo, Xue-Bin Guo
Solaris Kernel and Data Service QE,
Sun China Engineering and Research Institute
Phone: +86 10 82618200 +82296
Email: robin.guo at sun.com
Hello,

Upgrade to snv_60 or later if you care about your data :)

Gino
On 18 September, 2007 - Gino sent me these 0,3K bytes:

> Hello,
> upgrade to snv_60 or later if you care about your data :)

If there are known serious data-loss bug fixes that have gone into
snv_60+ but not into s10u4, then I would like to tell Sun to "backport"
those into s10u4 if they care about keeping customers.

Any specific bug fixes you know about that one really wants? (So we can
poke support.)

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Tomas Ögren wrote:
> On 18 September, 2007 - Gino sent me these 0,3K bytes:
>
>> Hello,
>> upgrade to snv_60 or later if you care about your data :)
>
> If there are known serious data-loss bug fixes that have gone into
> snv_60+ but not into s10u4, then I would like to tell Sun to "backport"
> those into s10u4 if they care about keeping customers.
>
> Any specific bug fixes you know about that one really wants? (So we can
> poke support.)

I think it is bug

    6458218 assertion failed: ss == NULL

which is fixed in Solaris 10 8/07.

Hth,
victor
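For anyone deciding whether they still need to patch or reinstall, a quick sanity check is to see which release a host is actually running. This is just a sketch; the specific kernel patch that delivers the 6458218 fix is not listed in this thread, so check the bug report or SunSolve for the exact patch ID rather than relying on the grep below.

$ cat /etc/release          (release string; 8/07 is Update 4)
$ uname -v                  (running kernel patch level)
$ showrev -p | grep -i zfs  (installed patches touching the ZFS packages)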
> Tomas Ögren wrote:
>> On 18 September, 2007 - Gino sent me these 0,3K bytes:
>>
>>> Hello,
>>> upgrade to snv_60 or later if you care about your data :)
>>
>> If there are known serious data-loss bug fixes that have gone into
>> snv_60+ but not into s10u4, then I would like to tell Sun to "backport"
>> those into s10u4 if they care about keeping customers.
>>
>> Any specific bug fixes you know about that one really wants? (So we can
>> poke support.)
>
> I think it is bug
>
>     6458218 assertion failed: ss == NULL
>
> which is fixed in Solaris 10 8/07.

Yes, it was 6458218. Then Solaris 10 8/07 will be fine.

Gino