Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
snapshot a few days ago:
  # zfs snapshot a@b
  # zfs clone a@b tank/a
  # zfs clone a@b tank/b

The system started panicking after I tried:
  # zfs snapshot tank/b@backup

So, I destroyed tank/b:
  # zfs destroy tank/b
then tried to destroy tank/a:
  # zfs destroy tank/a

Now, the system is in an endless panic loop, unable to import the pool
at system startup or with "zpool import". The panic dump is:

panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
(0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512

ffffff00102468d0 genunix:assfail3+c1 ()
ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
ffffff0010246b10 zfs:dsl_pool_sync+196 ()
ffffff0010246ba0 zfs:spa_sync+32a ()
ffffff0010246c40 zfs:txg_sync_thread+265 ()
ffffff0010246c50 unix:thread_start+8 ()

We really need to import this pool. Is there a way around this? We do
have snv_114 source on the system if we need to make changes to
usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
destroy" transaction never completed and it is being replayed, causing
the panic. This cycle continues endlessly.

-- 
albert chin (china at thewrittenword.com)
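For readers decoding the panic string: the assfail3 frame and the "(0x0 == 0x2)"
detail suggest the failing check in dsl_dataset_destroy_sync() is a
VERIFY3U-style assertion on the return value of zap_remove_int(), with 0x2
being ENOENT. A rough sketch of that shape, reconstructed from the panic
message rather than copied from the snv_114 source:

    /*
     * Sketch of the failing check around dsl_dataset.c:1512 (snv_114),
     * reconstructed from the panic string; not a verbatim copy.
     * During "zfs destroy" of a clone, the sync task removes the clone's
     * object number from its origin snapshot's ds_next_clones_obj ZAP.
     * zap_remove_int() returning ENOENT (2) instead of 0 means the entry
     * was never there, the VERIFY fires, the sync thread panics, and the
     * same half-finished destroy is retried on every import.
     */
    VERIFY3U(0, ==, zap_remove_int(mos,
        ds_prev->ds_phys->ds_next_clones_obj, obj, tx));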
On Fri, Sep 25, 2009 at 05:21:23AM +0000, Albert Chin wrote:
> [[ snip snip ]]
>
> We really need to import this pool. Is there a way around this? We do
> have snv_114 source on the system if we need to make changes to
> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> destroy" transaction never completed and it is being replayed, causing
> the panic. This cycle continues endlessly.

What are the implications of adding the following to /etc/system:

  set zfs:zfs_recover=1
  set aok=1

and importing the pool with:

  # zpool import -o ro

-- 
albert chin (china at thewrittenword.com)
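For context on those two settings: they act at different points. On a DEBUG
kernel, a failed ASSERT/VERIFY funnels through assfail()/assfail3(), and the
global aok downgrades that from a panic to a console warning; zfs_recover, by
contrast, is only consulted by code that explicitly calls zfs_panic_recover().
The sketch below is a simplified approximation of that mechanism, not the
actual OpenSolaris source:

    /*
     * Simplified sketch (an approximation, not the actual source):
     * with aok set, the generic assertion handler warns and returns
     * instead of panicking, so the failed VERIFY in the destroy sync
     * task would be skipped rather than fatal.
     */
    extern int aok;                 /* set aok=1 in /etc/system */

    int
    assfail(const char *expr, const char *file, int line)
    {
            if (aok) {
                    printf("ASSERTION CAUGHT: %s, file: %s, line: %d\n",
                        expr, file, line);
                    return (0);     /* caller continues as if it passed */
            }
            panic("assertion failed: %s, file: %s, line: %d",
                expr, file, line);
            /* NOTREACHED */
            return (0);
    }

The usual caveat applies: skipping the assertion lets the import step past the
bad state rather than repairing it, which is why pairing it with a read-only
import ("zpool import -o ro") is the conservative experiment.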
Assertion failures indicate bugs. You might try another version of the OS.
In general, they are easy to search for in the bugs database. A quick
search reveals
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6822816
but that doesn't look like it will help you. I suggest filing a new bug
at the very least.

http://en.wikipedia.org/wiki/Assertion_(computing)

-- richard

On Sep 24, 2009, at 10:21 PM, Albert Chin wrote:

> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
> snapshot a few days ago:
>   # zfs snapshot a@b
>   # zfs clone a@b tank/a
>   # zfs clone a@b tank/b
>
> The system started panicking after I tried:
>   # zfs snapshot tank/b@backup
>
> So, I destroyed tank/b:
>   # zfs destroy tank/b
> then tried to destroy tank/a:
>   # zfs destroy tank/a
>
> Now, the system is in an endless panic loop, unable to import the pool
> at system startup or with "zpool import". The panic dump is:
>
> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
>
> ffffff00102468d0 genunix:assfail3+c1 ()
> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
> ffffff0010246ba0 zfs:spa_sync+32a ()
> ffffff0010246c40 zfs:txg_sync_thread+265 ()
> ffffff0010246c50 unix:thread_start+8 ()
>
> We really need to import this pool. Is there a way around this? We do
> have snv_114 source on the system if we need to make changes to
> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> destroy" transaction never completed and it is being replayed, causing
> the panic. This cycle continues endlessly.
>
> -- 
> albert chin (china at thewrittenword.com)
Victor Latushkin
2009-Sep-26 11:37 UTC
[zfs-discuss] Help! System panic when pool imported
Richard Elling wrote:
> Assertion failures indicate bugs. You might try another version of the
> OS. In general, they are easy to search for in the bugs database. A
> quick search reveals
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6822816
> but that doesn't look like it will help you. I suggest filing a new bug
> at the very least.

I have redispatched 6822816, so it needs to be reevaluated since more
information is available now.

victor

> On Sep 24, 2009, at 10:21 PM, Albert Chin wrote:
>
>> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
>> snapshot a few days ago:
>>   # zfs snapshot a@b
>>   # zfs clone a@b tank/a
>>   # zfs clone a@b tank/b
>>
>> The system started panicking after I tried:
>>   # zfs snapshot tank/b@backup
>>
>> So, I destroyed tank/b:
>>   # zfs destroy tank/b
>> then tried to destroy tank/a:
>>   # zfs destroy tank/a
>>
>> Now, the system is in an endless panic loop, unable to import the pool
>> at system startup or with "zpool import". The panic dump is:
>>
>> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
>> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
>> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
>>
>> ffffff00102468d0 genunix:assfail3+c1 ()
>> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
>> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
>> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
>> ffffff0010246ba0 zfs:spa_sync+32a ()
>> ffffff0010246c40 zfs:txg_sync_thread+265 ()
>> ffffff0010246c50 unix:thread_start+8 ()
>>
>> We really need to import this pool. Is there a way around this? We do
>> have snv_114 source on the system if we need to make changes to
>> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
>> destroy" transaction never completed and it is being replayed, causing
>> the panic. This cycle continues endlessly.
>>
>> -- 
>> albert chin (china at thewrittenword.com)
I'm getting the same thing now.

I tried moving my 5-disk raidz and 2-disk mirror over to another
machine, but that machine would keep panicking (not ZFS-related
panics). When I brought the array back over, I started getting this as
well. My mirror array is unaffected.

snv_111b (2009.06 release)
On Sun, Sep 27, 2009 at 12:25:28AM -0700, Andrew wrote:
> I'm getting the same thing now.
>
> I tried moving my 5-disk raidz and 2-disk mirror over to another
> machine, but that machine would keep panicking (not ZFS-related
> panics). When I brought the array back over, I started getting this as
> well. My mirror array is unaffected.
>
> snv_111b (2009.06 release)

What does the panic dump look like?

-- 
albert chin (china at thewrittenword.com)
This is what my /var/adm/messages looks like:

Sep 27 12:46:29 solaria genunix: [ID 403854 kern.notice] assertion failed: ss == NULL, file: ../../common/fs/zfs/space_map.c, line: 109
Sep 27 12:46:29 solaria unix: [ID 100000 kern.notice]
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a97a0 genunix:assfail+7e ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9830 zfs:space_map_add+292 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a98e0 zfs:space_map_load+3a7 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9920 zfs:metaslab_activate+64 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a99e0 zfs:metaslab_group_alloc+2b7 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9ac0 zfs:metaslab_alloc_dva+295 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9b60 zfs:metaslab_alloc+9b ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9b90 zfs:zio_dva_allocate+3e ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9bc0 zfs:zio_execute+a0 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9c40 genunix:taskq_thread+193 ()
Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9c50 unix:thread_start+8 ()
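For anyone comparing traces: this is a different failure from the one that
started the thread. The "ss == NULL" assertion fires while a metaslab's space
map is being loaded for allocation, and it means a segment being added
overlaps one already present, i.e. the on-disk space map contradicts itself.
A rough sketch of the shape of that check (an approximation of
space_map_add(), not the snv_111 source):

    /*
     * Approximate shape of the overlap check in space_map_add()
     * (space_map.c), sketched from the panic message; not verbatim.
     * Replaying the on-disk space map inserts each [start, start+size)
     * segment into an AVL tree; finding an existing overlapping segment
     * means the space map is corrupt, which this path treats as fatal.
     */
    space_seg_t ssearch, *ss;
    avl_index_t where;

    ssearch.ss_start = start;
    ssearch.ss_end = start + size;
    ss = avl_find(&sm->sm_root, &ssearch, &where);
    ASSERT(ss == NULL);         /* overlap => corrupted space map */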
On Sun, Sep 27, 2009 at 10:06:16AM -0700, Andrew wrote:
> This is what my /var/adm/messages looks like:
>
> Sep 27 12:46:29 solaria genunix: [ID 403854 kern.notice] assertion failed: ss == NULL, file: ../../common/fs/zfs/space_map.c, line: 109
> Sep 27 12:46:29 solaria unix: [ID 100000 kern.notice]
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a97a0 genunix:assfail+7e ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9830 zfs:space_map_add+292 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a98e0 zfs:space_map_load+3a7 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9920 zfs:metaslab_activate+64 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a99e0 zfs:metaslab_group_alloc+2b7 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9ac0 zfs:metaslab_alloc_dva+295 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9b60 zfs:metaslab_alloc+9b ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9b90 zfs:zio_dva_allocate+3e ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9bc0 zfs:zio_execute+a0 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9c40 genunix:taskq_thread+193 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ffffff00089a9c50 unix:thread_start+8 ()

I'm not sure that aok=1/zfs:zfs_recover=1 would help you because
zfs_panic_recover isn't in the backtrace (see
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6638754).
Sometimes a Sun ZFS engineer shows up on the freenode #zfs channel; I'd
pop in there and ask. There are somewhat similar bug reports at
bugs.opensolaris.org. I'd post a bug report just in case.

-- 
albert chin (china at thewrittenword.com)
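To expand on that point: zfs_recover only matters on code paths that report a
problem through zfs_panic_recover(); a plain ASSERT/VERIFY calls assfail()
directly and ignores it (only aok changes that behavior, as sketched earlier).
The function is roughly this shape; treat it as a sketch rather than the exact
spa_misc.c source:

    /*
     * Sketch of zfs_panic_recover(): inconsistencies reported through
     * this function become console warnings when zfs_recover is set,
     * and panics otherwise.  Code that asserts directly, like the
     * space_map_add() trace above, never consults zfs_recover.
     */
    int zfs_recover = 0;        /* set zfs:zfs_recover=1 in /etc/system */

    void
    zfs_panic_recover(const char *fmt, ...)
    {
            va_list adx;

            va_start(adx, fmt);
            vcmn_err(zfs_recover ? CE_WARN : CE_PANIC, fmt, adx);
            va_end(adx);
    }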
Thanks for reporting this. I have fixed this bug (6822816) in build 127.
Here is the evaluation from the bug report:

The problem is that the clone's dsobj does not appear in the origin's
ds_next_clones_obj. This can occur under certain circumstances if there
was a "botched upgrade" when doing "zpool upgrade" from pool version 10
or earlier to version 11 or later while there was a clone in the pool.
The problem is caused because upgrade_clones_cb() failed to call
dmu_buf_will_dirty(origin->ds_dbuf).

This bug can have several effects:
1. assertion failure from dsl_dataset_destroy_sync()
2. assertion failure from dsl_dataset_snapshot_sync()
3. assertion failure from dsl_dataset_promote_sync()
4. incomplete scrub or resilver, potentially leading to data loss

The fix will address the root cause and also work around all of these
issues on pools that have already experienced the botched upgrade,
whether or not they have encountered any of the above effects. Anyone
who may have had a botched upgrade should run "zpool scrub" after
upgrading to bits with the fix in place (build 127 or later).

--matt

Albert Chin wrote:
> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
> snapshot a few days ago:
>   # zfs snapshot a@b
>   # zfs clone a@b tank/a
>   # zfs clone a@b tank/b
>
> The system started panicking after I tried:
>   # zfs snapshot tank/b@backup
>
> So, I destroyed tank/b:
>   # zfs destroy tank/b
> then tried to destroy tank/a:
>   # zfs destroy tank/a
>
> Now, the system is in an endless panic loop, unable to import the pool
> at system startup or with "zpool import". The panic dump is:
>
> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
>
> ffffff00102468d0 genunix:assfail3+c1 ()
> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
> ffffff0010246ba0 zfs:spa_sync+32a ()
> ffffff0010246c40 zfs:txg_sync_thread+265 ()
> ffffff0010246c50 unix:thread_start+8 ()
>
> We really need to import this pool. Is there a way around this? We do
> have snv_114 source on the system if we need to make changes to
> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> destroy" transaction never completed and it is being replayed, causing
> the panic. This cycle continues endlessly.
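For readers curious what a fix of that description amounts to: in syncing
context, any change to a dataset's on-disk ds_phys structure has to be
preceded by dmu_buf_will_dirty() on the dbuf backing it, or the change never
reaches disk. A rough illustration of the pattern Matt describes for
upgrade_clones_cb(), not the actual build 127 diff:

    /*
     * Illustrative sketch only, not the actual build 127 change: when
     * the pool-upgrade callback records each clone in its origin
     * snapshot's ds_next_clones_obj, the origin's backing dbuf must be
     * marked dirty in this transaction first.  Without that call the
     * updated ds_phys is silently dropped (the "botched upgrade"), and
     * later destroy/snapshot/promote operations trip over the missing
     * ZAP entry.
     */
    dmu_buf_will_dirty(origin->ds_dbuf, tx);        /* the missing call */
    if (origin->ds_phys->ds_next_clones_obj == 0) {
            origin->ds_phys->ds_next_clones_obj =
                zap_create(mos, DMU_OT_NEXT_CLONES, DMU_OT_NONE, 0, tx);
    }
    VERIFY3U(0, ==, zap_add_int(mos,
        origin->ds_phys->ds_next_clones_obj, ds->ds_object, tx));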
On Mon, Oct 19, 2009 at 03:31:46PM -0700, Matthew Ahrens wrote:
> Thanks for reporting this. I have fixed this bug (6822816) in build
> 127.

Thanks. I just installed the OpenSolaris Preview based on build 125 and
will attempt to apply the patch you made to this release and import the
pool.

> --matt
>
> Albert Chin wrote:
>> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
>> snapshot a few days ago:
>>   # zfs snapshot a@b
>>   # zfs clone a@b tank/a
>>   # zfs clone a@b tank/b
>>
>> The system started panicking after I tried:
>>   # zfs snapshot tank/b@backup
>>
>> So, I destroyed tank/b:
>>   # zfs destroy tank/b
>> then tried to destroy tank/a:
>>   # zfs destroy tank/a
>>
>> Now, the system is in an endless panic loop, unable to import the pool
>> at system startup or with "zpool import". The panic dump is:
>>
>> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
>> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
>> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
>>
>> ffffff00102468d0 genunix:assfail3+c1 ()
>> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
>> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
>> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
>> ffffff0010246ba0 zfs:spa_sync+32a ()
>> ffffff0010246c40 zfs:txg_sync_thread+265 ()
>> ffffff0010246c50 unix:thread_start+8 ()
>>
>> We really need to import this pool. Is there a way around this? We do
>> have snv_114 source on the system if we need to make changes to
>> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
>> destroy" transaction never completed and it is being replayed, causing
>> the panic. This cycle continues endlessly.

-- 
albert chin (china at thewrittenword.com)
On Mon, Oct 19, 2009 at 09:02:20PM -0500, Albert Chin wrote:
> On Mon, Oct 19, 2009 at 03:31:46PM -0700, Matthew Ahrens wrote:
> > Thanks for reporting this. I have fixed this bug (6822816) in build
> > 127.
>
> Thanks. I just installed the OpenSolaris Preview based on build 125 and
> will attempt to apply the patch you made to this release and import the
> pool.

I did the above and the zpool import worked. Thanks!

> > --matt
> >
> > Albert Chin wrote:
> >> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
> >> snapshot a few days ago:
> >>   # zfs snapshot a@b
> >>   # zfs clone a@b tank/a
> >>   # zfs clone a@b tank/b
> >>
> >> The system started panicking after I tried:
> >>   # zfs snapshot tank/b@backup
> >>
> >> So, I destroyed tank/b:
> >>   # zfs destroy tank/b
> >> then tried to destroy tank/a:
> >>   # zfs destroy tank/a
> >>
> >> Now, the system is in an endless panic loop, unable to import the pool
> >> at system startup or with "zpool import". The panic dump is:
> >>
> >> panic[cpu1]/thread=ffffff0010246c60: assertion failed: 0 ==
> >> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
> >> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
> >>
> >> ffffff00102468d0 genunix:assfail3+c1 ()
> >> ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
> >> ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
> >> ffffff0010246b10 zfs:dsl_pool_sync+196 ()
> >> ffffff0010246ba0 zfs:spa_sync+32a ()
> >> ffffff0010246c40 zfs:txg_sync_thread+265 ()
> >> ffffff0010246c50 unix:thread_start+8 ()
> >>
> >> We really need to import this pool. Is there a way around this? We do
> >> have snv_114 source on the system if we need to make changes to
> >> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> >> destroy" transaction never completed and it is being replayed, causing
> >> the panic. This cycle continues endlessly.
>
> -- 
> albert chin (china at thewrittenword.com)

-- 
albert chin (china at thewrittenword.com)