Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
snapshot a few days ago:
  # zfs snapshot a@b
  # zfs clone a@b tank/a
  # zfs clone a@b tank/b

The system started panicking after I tried:
  # zfs snapshot tank/b@backup

So, I destroyed tank/b:
  # zfs destroy tank/b
then tried to destroy tank/a:
  # zfs destroy tank/a

Now, the system is in an endless panic loop, unable to import the pool
at system startup or with "zpool import". The panic dump is:

panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
(0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512

ffffff00102468d0 genunix:assfail3+c1 ()
ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
ffffff0010246b10 zfs:dsl_pool_sync+196 ()
ffffff0010246ba0 zfs:spa_sync+32a ()
ffffff0010246c40 zfs:txg_sync_thread+265 ()
ffffff0010246c50 unix:thread_start+8 ()

We really need to import this pool. Is there a way around this? We do
have snv_114 source on the system if we need to make changes to
usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
destroy" transaction never completed and it is being replayed, causing
the panic. This cycle continues endlessly.

-- 
albert chin (china at thewrittenword.com)
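For readers decoding the panic string: the assfail3 frame and the "(0x0 == 0x2)"
detail suggest the failing check in dsl_dataset_destroy_sync() is a
VERIFY3U-style assertion on the return value of zap_remove_int(), with 0x2
being ENOENT. A rough sketch of that shape, reconstructed from the panic
message rather than copied from the snv_114 source:

    /*
     * Sketch of the failing check around dsl_dataset.c:1512 (snv_114),
     * reconstructed from the panic string; not a verbatim copy.
     * During "zfs destroy" of a clone, the sync task removes the clone's
     * object number from its origin snapshot's ds_next_clones_obj ZAP.
     * zap_remove_int() returning ENOENT (2) instead of 0 means the entry
     * was never there, the VERIFY fires, the sync thread panics, and the
     * same half-finished destroy is retried on every import.
     */
    VERIFY3U(0, ==, zap_remove_int(mos,
        ds_prev->ds_phys->ds_next_clones_obj, obj, tx));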
On Fri, Sep 25, 2009 at 05:21:23AM +0000, Albert Chin wrote:
> [[ snip snip ]]
>
> We really need to import this pool. Is there a way around this? We do
> have snv_114 source on the system if we need to make changes to
> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> destroy" transaction never completed and it is being replayed, causing
> the panic. This cycle continues endlessly.

What are the implications of adding the following to /etc/system:

  set zfs:zfs_recover=1
  set aok=1

and importing the pool with:

  # zpool import -o ro

-- 
albert chin (china at thewrittenword.com)
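For context on those two settings: they act at different points. On a DEBUG
kernel, a failed ASSERT/VERIFY funnels through assfail()/assfail3(), and the
global aok downgrades that from a panic to a console warning; zfs_recover, by
contrast, is only consulted by code that explicitly calls zfs_panic_recover().
The sketch below is a simplified approximation of that mechanism, not the
actual OpenSolaris source:

    /*
     * Simplified sketch (an approximation, not the actual source):
     * with aok set, the generic assertion handler warns and returns
     * instead of panicking, so the failed VERIFY in the destroy sync
     * task would be skipped rather than fatal.
     */
    extern int aok;                 /* set aok=1 in /etc/system */

    int
    assfail(const char *expr, const char *file, int line)
    {
            if (aok) {
                    printf("ASSERTION CAUGHT: %s, file: %s, line: %d\n",
                        expr, file, line);
                    return (0);     /* caller continues as if it passed */
            }
            panic("assertion failed: %s, file: %s, line: %d",
                expr, file, line);
            /* NOTREACHED */
            return (0);
    }

The usual caveat applies: skipping the assertion lets the import step past the
bad state rather than repairing it, which is why pairing it with a read-only
import ("zpool import -o ro") is the conservative experiment.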
Assertion failures indicate bugs. You might try another version of the OS.
In general, they are easy to search for in the bugs database. A quick
search reveals
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6822816
but that doesn't look like it will help you. I suggest filing a new bug
at the very least.

http://en.wikipedia.org/wiki/Assertion_(computing)

-- richard

On Sep 24, 2009, at 10:21 PM, Albert Chin wrote:

> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
> snapshot a few days ago:
>   # zfs snapshot a@b
>   # zfs clone a@b tank/a
>   # zfs clone a@b tank/b
>
> The system started panicking after I tried:
>   # zfs snapshot tank/b@backup
>
> So, I destroyed tank/b:
>   # zfs destroy tank/b
> then tried to destroy tank/a:
>   # zfs destroy tank/a
>
> Now, the system is in an endless panic loop, unable to import the pool
> at system startup or with "zpool import". The panic dump is:
>
> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
>
> ffffff00102468d0 genunix:assfail3+c1 ()
> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
> ffffff0010246ba0 zfs:spa_sync+32a ()
> ffffff0010246c40 zfs:txg_sync_thread+265 ()
> ffffff0010246c50 unix:thread_start+8 ()
>
> We really need to import this pool. Is there a way around this? We do
> have snv_114 source on the system if we need to make changes to
> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> destroy" transaction never completed and it is being replayed, causing
> the panic. This cycle continues endlessly.
>
> -- 
> albert chin (china at thewrittenword.com)
Victor Latushkin
2009-Sep-26 11:37 UTC
[zfs-discuss] Help! System panic when pool imported
Richard Elling wrote:
> Assertion failures indicate bugs. You might try another version of the
> OS. In general, they are easy to search for in the bugs database. A
> quick search reveals
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6822816
> but that doesn't look like it will help you. I suggest filing a new bug
> at the very least.

I have redispatched 6822816, so it needs to be reevaluated since more
information is available now.

victor

> On Sep 24, 2009, at 10:21 PM, Albert Chin wrote:
>
>> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
>> snapshot a few days ago:
>>   # zfs snapshot a@b
>>   # zfs clone a@b tank/a
>>   # zfs clone a@b tank/b
>>
>> The system started panicking after I tried:
>>   # zfs snapshot tank/b@backup
>>
>> So, I destroyed tank/b:
>>   # zfs destroy tank/b
>> then tried to destroy tank/a:
>>   # zfs destroy tank/a
>>
>> Now, the system is in an endless panic loop, unable to import the pool
>> at system startup or with "zpool import". The panic dump is:
>>
>> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
>> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
>> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
>>
>> ffffff00102468d0 genunix:assfail3+c1 ()
>> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
>> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
>> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
>> ffffff0010246ba0 zfs:spa_sync+32a ()
>> ffffff0010246c40 zfs:txg_sync_thread+265 ()
>> ffffff0010246c50 unix:thread_start+8 ()
>>
>> We really need to import this pool. Is there a way around this? We do
>> have snv_114 source on the system if we need to make changes to
>> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
>> destroy" transaction never completed and it is being replayed, causing
>> the panic. This cycle continues endlessly.
>>
>> -- 
>> albert chin (china at thewrittenword.com)
I'm getting the same thing now.

I tried moving my 5-disk raidz and 2-disk mirror over to another
machine, but that machine would keep panicking (not ZFS-related
panics). When I brought the array back over, I started getting this as
well. My mirror array is unaffected.

snv_111b (2009.06 release)
On Sun, Sep 27, 2009 at 12:25:28AM -0700, Andrew wrote:
> I'm getting the same thing now.
>
> I tried moving my 5-disk raidz and 2-disk mirror over to another
> machine, but that machine would keep panicking (not ZFS-related
> panics). When I brought the array back over, I started getting this as
> well. My mirror array is unaffected.
>
> snv_111b (2009.06 release)

What does the panic dump look like?

-- 
albert chin (china at thewrittenword.com)
This is what my /var/adm/messages looks like:

Sep 27 12:46:29 solaria genunix: [ID 403854 kern.notice] assertion failed: ss == NULL, file: ../../common/fs/zfs/space_map.c, line: 109
Sep 27 12:46:29 solaria unix: [ID 100000 kern.notice]
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a97a0 genunix:assfail+7e ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9830 zfs:space_map_add+292 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a98e0 zfs:space_map_load+3a7 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9920 zfs:metaslab_activate+64 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a99e0 zfs:metaslab_group_alloc+2b7 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9ac0 zfs:metaslab_alloc_dva+295 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9b60 zfs:metaslab_alloc+9b ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9b90 zfs:zio_dva_allocate+3e ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9bc0 zfs:zio_execute+a0 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9c40 genunix:taskq_thread+193 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9c50 unix:thread_start+8 ()
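For anyone comparing traces: this is a different failure from the one that
started the thread. The "ss == NULL" assertion fires while a metaslab's space
map is being loaded for allocation, and it means a segment being added
overlaps one already present, i.e. the on-disk space map contradicts itself.
A rough sketch of the shape of that check (an approximation of
space_map_add(), not the snv_111 source):

    /*
     * Approximate shape of the overlap check in space_map_add()
     * (space_map.c), sketched from the panic message; not verbatim.
     * Replaying the on-disk space map inserts each [start, start+size)
     * segment into an AVL tree; finding an existing overlapping segment
     * means the space map is corrupt, which this path treats as fatal.
     */
    space_seg_t ssearch, *ss;
    avl_index_t where;

    ssearch.ss_start = start;
    ssearch.ss_end = start + size;
    ss = avl_find(&sm->sm_root, &ssearch, &where);
    ASSERT(ss == NULL);         /* overlap => corrupted space map */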
On Sun, Sep 27, 2009 at 10:06:16AM -0700, Andrew wrote:
> This is what my /var/adm/messages looks like:
>
> Sep 27 12:46:29 solaria genunix: [ID 403854 kern.notice] assertion failed: ss == NULL, file: ../../common/fs/zfs/space_map.c, line: 109
> Sep 27 12:46:29 solaria unix: [ID 100000 kern.notice]
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a97a0 genunix:assfail+7e ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9830 zfs:space_map_add+292 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a98e0 zfs:space_map_load+3a7 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9920 zfs:metaslab_activate+64 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a99e0 zfs:metaslab_group_alloc+2b7 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9ac0 zfs:metaslab_alloc_dva+295 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9b60 zfs:metaslab_alloc+9b ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9b90 zfs:zio_dva_allocate+3e ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9bc0 zfs:zio_execute+a0 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9c40 genunix:taskq_thread+193 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9c50 unix:thread_start+8 ()

I'm not sure that aok=1/zfs:zfs_recover=1 would help you because
zfs_panic_recover isn't in the backtrace (see
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6638754).
Sometimes a Sun ZFS engineer shows up on the freenode #zfs channel; I'd
pop in there and ask. There are somewhat similar bug reports at
bugs.opensolaris.org. I'd post a bug report just in case.

-- 
albert chin (china at thewrittenword.com)
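To expand on that point: zfs_recover only matters on code paths that report a
problem through zfs_panic_recover(); a plain ASSERT/VERIFY calls assfail()
directly and ignores it (only aok changes that behavior, as sketched earlier).
The function is roughly this shape; treat it as a sketch rather than the exact
spa_misc.c source:

    /*
     * Sketch of zfs_panic_recover(): inconsistencies reported through
     * this function become console warnings when zfs_recover is set,
     * and panics otherwise.  Code that asserts directly, like the
     * space_map_add() trace above, never consults zfs_recover.
     */
    int zfs_recover = 0;        /* set zfs:zfs_recover=1 in /etc/system */

    void
    zfs_panic_recover(const char *fmt, ...)
    {
            va_list adx;

            va_start(adx, fmt);
            vcmn_err(zfs_recover ? CE_WARN : CE_PANIC, fmt, adx);
            va_end(adx);
    }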
Thanks for reporting this. I have fixed this bug (6822816) in build 127.
Here is the evaluation from the bug report:

The problem is that the clone's dsobj does not appear in the origin's
ds_next_clones_obj. This can occur under certain circumstances if there
was a "botched upgrade" when doing "zpool upgrade" from pool version 10
or earlier to version 11 or later while there was a clone in the pool.
The problem is caused because upgrade_clones_cb() failed to call
dmu_buf_will_dirty(origin->ds_dbuf).

This bug can have several effects:
1. assertion failure from dsl_dataset_destroy_sync()
2. assertion failure from dsl_dataset_snapshot_sync()
3. assertion failure from dsl_dataset_promote_sync()
4. incomplete scrub or resilver, potentially leading to data loss

The fix will address the root cause and also work around all of these
issues on pools that have already experienced the botched upgrade,
whether or not they have encountered any of the above effects. Anyone
who may have had a botched upgrade should run "zpool scrub" after
upgrading to bits with the fix in place (build 127 or later).

--matt

Albert Chin wrote:
> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
> snapshot a few days ago:
>   # zfs snapshot a@b
>   # zfs clone a@b tank/a
>   # zfs clone a@b tank/b
>
> The system started panicking after I tried:
>   # zfs snapshot tank/b@backup
>
> So, I destroyed tank/b:
>   # zfs destroy tank/b
> then tried to destroy tank/a:
>   # zfs destroy tank/a
>
> Now, the system is in an endless panic loop, unable to import the pool
> at system startup or with "zpool import". The panic dump is:
>
> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
>
> ffffff00102468d0 genunix:assfail3+c1 ()
> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
> ffffff0010246ba0 zfs:spa_sync+32a ()
> ffffff0010246c40 zfs:txg_sync_thread+265 ()
> ffffff0010246c50 unix:thread_start+8 ()
>
> We really need to import this pool. Is there a way around this? We do
> have snv_114 source on the system if we need to make changes to
> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> destroy" transaction never completed and it is being replayed, causing
> the panic. This cycle continues endlessly.
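For readers curious what a fix of that description amounts to: in syncing
context, any change to a dataset's on-disk ds_phys structure has to be
preceded by dmu_buf_will_dirty() on the dbuf backing it, or the change never
reaches disk. A rough illustration of the pattern Matt describes for
upgrade_clones_cb(), not the actual build 127 diff:

    /*
     * Illustrative sketch only, not the actual build 127 change: when
     * the pool-upgrade callback records each clone in its origin
     * snapshot's ds_next_clones_obj, the origin's backing dbuf must be
     * marked dirty in this transaction first.  Without that call the
     * updated ds_phys is silently dropped (the "botched upgrade"), and
     * later destroy/snapshot/promote operations trip over the missing
     * ZAP entry.
     */
    dmu_buf_will_dirty(origin->ds_dbuf, tx);        /* the missing call */
    if (origin->ds_phys->ds_next_clones_obj == 0) {
            origin->ds_phys->ds_next_clones_obj =
                zap_create(mos, DMU_OT_NEXT_CLONES, DMU_OT_NONE, 0, tx);
    }
    VERIFY3U(0, ==, zap_add_int(mos,
        origin->ds_phys->ds_next_clones_obj, ds->ds_object, tx));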
On Mon, Oct 19, 2009 at 03:31:46PM -0700, Matthew Ahrens wrote:
> Thanks for reporting this. I have fixed this bug (6822816) in build
> 127.

Thanks. I just installed the OpenSolaris Preview based on build 125 and
will attempt to apply the patch you made to this release and import the
pool.

> --matt
>
> Albert Chin wrote:
>> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
>> snapshot a few days ago:
>>   # zfs snapshot a@b
>>   # zfs clone a@b tank/a
>>   # zfs clone a@b tank/b
>>
>> The system started panicking after I tried:
>>   # zfs snapshot tank/b@backup
>>
>> So, I destroyed tank/b:
>>   # zfs destroy tank/b
>> then tried to destroy tank/a:
>>   # zfs destroy tank/a
>>
>> Now, the system is in an endless panic loop, unable to import the pool
>> at system startup or with "zpool import". The panic dump is:
>>
>> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
>> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
>> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
>>
>> ffffff00102468d0 genunix:assfail3+c1 ()
>> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
>> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
>> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
>> ffffff0010246ba0 zfs:spa_sync+32a ()
>> ffffff0010246c40 zfs:txg_sync_thread+265 ()
>> ffffff0010246c50 unix:thread_start+8 ()
>>
>> We really need to import this pool. Is there a way around this? We do
>> have snv_114 source on the system if we need to make changes to
>> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
>> destroy" transaction never completed and it is being replayed, causing
>> the panic. This cycle continues endlessly.

-- 
albert chin (china at thewrittenword.com)
On Mon, Oct 19, 2009 at 09:02:20PM -0500, Albert Chin wrote:
> On Mon, Oct 19, 2009 at 03:31:46PM -0700, Matthew Ahrens wrote:
> > Thanks for reporting this. I have fixed this bug (6822816) in build
> > 127.
>
> Thanks. I just installed the OpenSolaris Preview based on build 125 and
> will attempt to apply the patch you made to this release and import the
> pool.

I did the above and the zpool import worked. Thanks!

> > --matt
> >
> > Albert Chin wrote:
> >> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
> >> snapshot a few days ago:
> >>   # zfs snapshot a@b
> >>   # zfs clone a@b tank/a
> >>   # zfs clone a@b tank/b
> >>
> >> The system started panicking after I tried:
> >>   # zfs snapshot tank/b@backup
> >>
> >> So, I destroyed tank/b:
> >>   # zfs destroy tank/b
> >> then tried to destroy tank/a:
> >>   # zfs destroy tank/a
> >>
> >> Now, the system is in an endless panic loop, unable to import the pool
> >> at system startup or with "zpool import". The panic dump is:
> >>
> >> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
> >> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
> >> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
> >>
> >> ffffff00102468d0 genunix:assfail3+c1 ()
> >> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
> >> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
> >> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
> >> ffffff0010246ba0 zfs:spa_sync+32a ()
> >> ffffff0010246c40 zfs:txg_sync_thread+265 ()
> >> ffffff0010246c50 unix:thread_start+8 ()
> >>
> >> We really need to import this pool. Is there a way around this? We do
> >> have snv_114 source on the system if we need to make changes to
> >> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> >> destroy" transaction never completed and it is being replayed, causing
> >> the panic. This cycle continues endlessly.
>
> -- 
> albert chin (china at thewrittenword.com)

-- 
albert chin (china at thewrittenword.com)