Hello!

I have a machine that started to panic on boot (see the panic message below). I think it panics when it imports the pool (5 x 2 mirror). Is there any way to recover from that?

Some history: the machine was upgraded a couple of days ago from snv78 to snv110. This morning the zpool was upgraded to v14 and a scrub was started to verify data health. After 3 or 4 hours the scrub was stopped (the I/O impact was considered too high for the moment). A short time after that, one person rebooted the machine because it felt sluggish (I hope that person will never get root access again!). On reboot the machine panicked. I had another boot disk with a fresh b110, so I booted from it, only to see it panic again on zpool import.

So, any ideas how to get this pool imported? This organization uses Linux everywhere except for the fileservers, because of ZFS. It would be a pity to let them lose their trust.

Here is the panic:

panic[cpu2]/thread=ffffff000c697c60: assertion failed: 0 == zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx), file: ../../common/fs/zfs/dsl_dataset.c, line: 1493

ffffff000c6978d0 genunix:assfail+7e ()
ffffff000c697a50 zfs:dsl_dataset_destroy_sync+84b ()
ffffff000c697aa0 zfs:dsl_sync_task_group_sync+eb ()
ffffff000c697b10 zfs:dsl_pool_sync+112 ()
ffffff000c697ba0 zfs:spa_sync+32a ()
ffffff000c697c40 zfs:txg_sync_thread+265 ()
ffffff000c697c50 unix:thread_start+8 ()

--
Regards,
Cyril
Assertion failures are bugs. Please file one at http://bugs.opensolaris.org

You may need to try another version of the OS, which may not have the bug.

-- richard
On Thu, Mar 26, 2009 at 8:45 PM, Richard Elling <richard.elling at gmail.com> wrote:
> assertion failures are bugs.

Yup, I know that.

> Please file one at http://bugs.opensolaris.org

Just did.

> You may need to try another version of the OS, which may not have
> the bug.

Well, I kind of guessed that. I had hoped, maybe wrongly, to hear something more concrete... Tough luck, I guess...

--
Regards,
Cyril
Cyril Plisko wrote:
> On Thu, Mar 26, 2009 at 8:45 PM, Richard Elling <richard.elling at gmail.com> wrote:
>> Please file one at http://bugs.opensolaris.org
>
> Just did.

Do you have a crash dump from this issue?

- George
On Tue, Mar 31, 2009 at 11:01 PM, George Wilson <George.Wilson at sun.com> wrote:
> Do you have a crash dump from this issue?

George,

Getting a crash dump turned out to be somewhat problematic. Apparently the machine panics before the dump volume is activated (or so it seems). Moreover, the machine's owners decided to put other disks in and get it working (they were planning to install bigger disks anyhow). I have kept aside the original disks with the pool that causes the panic, but I am looking for a machine to put them in. Since there are 10 disks, it is not trivial to find a suitable machine. I think, however, that I could try with only 5 disks (one half of each mirror). Do you think it is OK to try it that way?

--
Regards,
Cyril
Cyril Plisko wrote:
> I think, however, that I could try with only 5 disks (one half of
> each mirror). Do you think it is OK to try it that way?

Given that the panic was pretty reproducible, I would think that having one half of each mirror would be sufficient. BTW, did you ever try to import the root pool by booting failsafe or off the DVD/CD?

Thanks,
George