on 11/04/2014 11:02 Phil Murray said the following:
> Hi there,
>
> I've recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and both around the same time of day oddly) with ZFS. Sorry no dump available, but panic below.
>
> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?

By chance, could the system be running zfs recv at the times when the panics
happened?

> (Sorry for any errors, hand transcribed from a screenshot)
>
> panic: solaris assert: sa.sa_magic == 0x2F505A (0x5112fb3d == 0x2f505a), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c, line: 596
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffffff8066cb96 at kdb_backtrace+0x66
> #1 0xffffffffff8063925e at panic+0x1ce
> #2 0xffffffffff81292109 at assfail3+0x29
> #3 0xffffffffff811157fe at zfs_space_delta_cb+0xbe
> #4 0xffffffffff8109ee12 at dmu_objset_userquota_get_ids+0x142
> #5 0xffffffffff810a5e35 at dnode_sync+0xc5
> #6 0xffffffffff8109cf3d at dmu_objset_sync_dnodes+0x5d
> #7 0xfffffffff8109d0c9 at dmu_objset_sync+0x169
> #8 0xffffffffff810b446a at dsl_pool_sync+0xca
> #9 0xffffffffff810c4c3a at spa_sync+0x34a
> #10 0xffffffffff810d6959 at txg_sync_thread+0x139
> #11 0xffffffffff8060dc7f at fork_exit+0x11f
> #12 0xffffffffff809a9c1e at fork_trampoline+0xe
> Uptime: 37d8h9m39s
> Cannot dump. Device not defined or unavailable
> Automatic reboot in 15 seconds - press a key on the console to abort
>
> (Never actually reboots)
>
>
> ZFS setup:
>
> [root at bellagio ~]# zpool status
>   pool: spool
>  state: ONLINE
>   scan: scrub repaired 0 in 13h18m with 0 errors on Wed Mar 12 19:14:08 2014
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         spool       ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>
> errors: No known data errors
>
> [root at bellagio ~]# zfs get all spool
> NAME   PROPERTY              VALUE                  SOURCE
> spool  type                  filesystem             -
> spool  creation              Fri Feb 25 19:45 2011  -
> spool  used                  692G                   -
> spool  available             222G                   -
> spool  referenced            692G                   -
> spool  compressratio         1.00x                  -
> spool  mounted               yes                    -
> spool  quota                 none                   default
> spool  reservation           none                   default
> spool  recordsize            128K                   default
> spool  mountpoint            /var/spool/imap        local
> spool  sharenfs              off                    default
> spool  checksum              on                     default
> spool  compression           off                    local
> spool  atime                 off                    local
> spool  devices               on                     default
> spool  exec                  on                     default
> spool  setuid                on                     default
> spool  readonly              off                    default
> spool  jailed                off                    default
> spool  snapdir               hidden                 default
> spool  aclmode               discard                default
> spool  aclinherit            restricted             default
> spool  canmount              on                     default
> spool  xattr                 off                    temporary
> spool  copies                1                      default
> spool  version               5                      -
> spool  utf8only              off                    -
> spool  normalization         none                   -
> spool  casesensitivity       sensitive              -
> spool  vscan                 off                    default
> spool  nbmand                off                    default
> spool  sharesmb              off                    default
> spool  refquota              none                   default
> spool  refreservation        none                   default
> spool  primarycache          all                    default
> spool  secondarycache        all                    default
> spool  usedbysnapshots       0                      -
> spool  usedbydataset         692G                   -
> spool  usedbychildren        114M                   -
> spool  usedbyrefreservation  0                      -
> spool  logbias               latency                default
> spool  dedup                 off                    default
> spool  mlslabel              -
> spool  sync                  standard               default
> spool  refcompressratio      1.00x                  -
> spool  written               692G                   -
>
> Cheers
>
> Phil
-- 
Andriy Gapon
On 11/04/2014, at 10:36 pm, Andriy Gapon <avg at FreeBSD.org> wrote:
> on 11/04/2014 11:02 Phil Murray said the following:
>> Hi there,
>>
>> I've recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and both around the same time of day oddly) with ZFS. Sorry no dump available, but panic below.
>>
>> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?
>
> By chance, could the system be running zfs recv at the times when the panics
> happened?

No send/recv was running, no snapshots... a completely boring ZFS pool (apart from the panics)
On 11/04/2014, at 10:36 pm, Andriy Gapon <avg at FreeBSD.org> wrote:
> on 11/04/2014 11:02 Phil Murray said the following:
>> Hi there,
>>
>> I've recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and both around the same time of day oddly) with ZFS. Sorry no dump available, but panic below.
>>
>> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?
>
> By chance, could the system be running zfs recv at the times when the panics
> happened?

I think it might be related to this bug reported against ZFS-on-Linux when upgrading from v3 -> v5, which is exactly what I've done on this machine:

https://github.com/zfsonlinux/zfs/issues/2025

In my case, the bogus sa.sa_magic value looks like this:

panic: solaris assert: sa.sa_magic == 0x2F505A (0x5112fb3d == 0x2f505a), file:

$ date -r 0x5112fb3d
Thu Feb 7 13:54:21 NZDT 2013

Cheers

Phil
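[Editor's note: the check above can also be done programmatically. The following small C sketch is illustrative only (the SA_MAGIC_EXPECTED macro is a stand-in, not the ZFS constant); it performs the same decode as the `date -r` command quoted in the message. A "bogus" sa_magic that decodes to a plausible recent date suggests old-format znode data (a timestamp) was read where the SA header was expected, which is the pattern described in the ZFS-on-Linux report.]

	/*
	 * Diagnostic sketch (not part of ZFS): decode a suspect sa_magic
	 * value as a Unix timestamp, mirroring `date -r 0x5112fb3d`.
	 */
	#include <stdio.h>
	#include <time.h>

	#define SA_MAGIC_EXPECTED 0x2F505AUL	/* value the assertion checks for */

	int
	main(void)
	{
		unsigned long bogus = 0x5112fb3d;	/* value seen in the panic */
		time_t t = (time_t)bogus;
		char buf[64];

		if (bogus == SA_MAGIC_EXPECTED) {
			printf("value is the expected SA magic\n");
			return (0);
		}

		/* Not the magic: show what it looks like as a timestamp. */
		strftime(buf, sizeof (buf), "%a %b %e %T %Y", localtime(&t));
		printf("0x%lx is not the SA magic; as a timestamp it is: %s\n",
		    bogus, buf);
		/* Expected output: a date in Feb 2013, matching date -r. */
		return (0);
	}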
Andriy Gapon
2014-Apr-15 09:15 UTC
Re: Panic in ZFS, solaris assert: sa.sa_magic == 0x2F505A
on 15/04/2014 08:39 Phil Murray said the following:
>
> On 11/04/2014, at 10:36 pm, Andriy Gapon <avg@FreeBSD.org> wrote:
>
>> on 11/04/2014 11:02 Phil Murray said the following:
>>> Hi there,
>>>
>>> I've recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and both around the same time of day oddly) with ZFS. Sorry no dump available, but panic below.
>>>
>>> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?
>>
>> By chance, could the system be running zfs recv at the times when the panics
>> happened?
>
> I think it might be related to this bug reported against ZFS-on-Linux when upgrading from v3 -> v5, which is exactly what I've done on this machine:
>
> https://github.com/zfsonlinux/zfs/issues/2025
>
> In my case, the bogus sa.sa_magic value looks like this:
>
> panic: solaris assert: sa.sa_magic == 0x2F505A (0x5112fb3d == 0x2f505a), file:
>
> $ date -r 0x5112fb3d
> Thu Feb 7 13:54:21 NZDT 2013

Great job finding that ZoL bug report! And very good job done by the people who analyzed the problem.
Below is my guess about what could be wrong.

A thread changing file attributes can end up calling zfs_sa_upgrade() to convert a file's bonus from DMU_OT_ZNODE to DMU_OT_SA. The conversion is done in two steps:
- dmu_set_bonustype() to change the bonus type in the dnode
- sa_replace_all_by_template_locked() to re-populate the bonus data

dmu_set_bonustype() calls dnode_setbonus_type(), which does the following:

	dn->dn_bonustype = newtype;
	dn->dn_next_bonustype[tx->tx_txg & TXG_MASK] = dn->dn_bonustype;

Concurrently, the sync thread can run into the dnode if it was dirtied in an earlier txg. The sync thread calls dmu_objset_userquota_get_ids() via dnode_sync(). dmu_objset_userquota_get_ids() uses dn_bonustype, which already holds the new value, but the data corresponding to the txg being synced is still in the old format.

As I understand it, dmu_objset_userquota_get_ids() already uses dmu_objset_userquota_find_data() when before == B_FALSE to find the proper copy of the data corresponding to the txg being synced. So I think that in that case dmu_objset_userquota_get_ids() should also use values of dn_bonustype and dn_bonuslen that correspond to the txg. If I am not mistaken, those values could be deduced from dn_next_bonustype[tx->tx_txg & TXG_MASK] plus dn_phys->dn_bonustype, and from dn_next_bonuslen[tx->tx_txg & TXG_MASK] plus dn_phys->dn_bonuslen.

-- 
Andriy Gapon
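[Editor's note: to make the per-txg lookup Andriy proposes concrete, here is a minimal, self-contained C sketch. It is not actual ZFS source: mock_dnode, dn_phys_bonustype, and bonustype_for_txg() are illustrative stand-ins for the real dnode fields named above. It models choosing the bonus type that matches the txg being synced, falling back to the on-disk value when no change is pending in that txg. The same fallback pattern would apply to dn_bonuslen via dn_next_bonuslen[] and dn_phys->dn_bonuslen.]

	#include <stdio.h>
	#include <stdint.h>

	#define TXG_SIZE	4
	#define TXG_MASK	(TXG_SIZE - 1)

	#define DMU_OT_NONE	0
	#define DMU_OT_ZNODE	1	/* old-style bonus */
	#define DMU_OT_SA	2	/* new system-attribute bonus */

	struct mock_dnode {
		uint8_t dn_bonustype;			/* in-memory, may be "too new" */
		uint8_t dn_next_bonustype[TXG_SIZE];	/* per-txg pending change, 0 = none */
		uint8_t dn_phys_bonustype;		/* what is on disk */
	};

	/*
	 * Return the bonus type that corresponds to the txg being synced:
	 * a pending per-txg change wins; otherwise use the on-disk value.
	 */
	static uint8_t
	bonustype_for_txg(const struct mock_dnode *dn, uint64_t txg)
	{
		uint8_t t = dn->dn_next_bonustype[txg & TXG_MASK];

		return (t != DMU_OT_NONE ? t : dn->dn_phys_bonustype);
	}

	int
	main(void)
	{
		/*
		 * Model the race: the dnode was dirtied in txg 10 with the old
		 * ZNODE-format bonus, then a setattr in txg 11 upgraded it to
		 * SA and bumped the in-memory dn_bonustype.
		 */
		struct mock_dnode dn = { 0 };

		dn.dn_phys_bonustype = DMU_OT_ZNODE;
		dn.dn_bonustype = DMU_OT_SA;			/* already "new" */
		dn.dn_next_bonustype[11 & TXG_MASK] = DMU_OT_SA; /* lands in txg 11 */

		/* Sync thread working on txg 10: */
		printf("in-memory dn_bonustype:     %d (SA, wrong for the txg 10 data)\n",
		    dn.dn_bonustype);
		printf("bonustype_for_txg(dn, 10):  %d (ZNODE, matches the data)\n",
		    bonustype_for_txg(&dn, 10));
		printf("bonustype_for_txg(dn, 11):  %d (SA)\n",
		    bonustype_for_txg(&dn, 11));
		return (0);
	}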