on 11/04/2014 11:02 Phil Murray said the following:
> Hi there,
>
> I've recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and both around the same time of day oddly) with ZFS. Sorry no dump available, but panic below.
>
> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?

By chance, could the system be running zfs recv at the times when the panics
happened?

> (Sorry for any errors, hand transcribed from a screenshot)
>
> panic: solaris assert: sa.sa_magic == 0x2F505A (0x5112fb3d == 0x2f505a), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c, line: 596
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffffff8066cb96 at kdb_backtrace+0x66
> #1 0xffffffffff8063925e at panic+0x1ce
> #2 0xffffffffff81292109 at assfail3+0x29
> #3 0xffffffffff811157fe at zfs_space_delta_cb+0xbe
> #4 0xffffffffff8109ee12 at dmu_objset_userquota_get_ids+0x142
> #5 0xffffffffff810a5e35 at dnode_sync+0xc5
> #6 0xffffffffff8109cf3d at dmu_objset_sync_dnodes+0x5d
> #7 0xfffffffff8109d0c9 at dmu_objset_sync+0x169
> #8 0xffffffffff810b446a at dsl_pool_sync+0xca
> #9 0xffffffffff810c4c3a at spa_sync+0x34a
> #10 0xffffffffff810d6959 at txg_sync_thread+0x139
> #11 0xffffffffff8060dc7f at fork_exit+0x11f
> #12 0xffffffffff809a9c1e at fork_trampoline+0xe
> Uptime: 37d8h9m39s
> Cannot dump. Device not defined or unavailable
> Automatic reboot in 15 seconds - press a key on the console to abort
>
> (Never actually reboots)
>
>
> ZFS setup:
>
> [root at bellagio ~]# zpool status
>   pool: spool
>  state: ONLINE
>   scan: scrub repaired 0 in 13h18m with 0 errors on Wed Mar 12 19:14:08 2014
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         spool       ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>
> errors: No known data errors
>
> [root at bellagio ~]# zfs get all spool
> NAME   PROPERTY              VALUE                  SOURCE
> spool  type                  filesystem             -
> spool  creation              Fri Feb 25 19:45 2011  -
> spool  used                  692G                   -
> spool  available             222G                   -
> spool  referenced            692G                   -
> spool  compressratio         1.00x                  -
> spool  mounted               yes                    -
> spool  quota                 none                   default
> spool  reservation           none                   default
> spool  recordsize            128K                   default
> spool  mountpoint            /var/spool/imap        local
> spool  sharenfs              off                    default
> spool  checksum              on                     default
> spool  compression           off                    local
> spool  atime                 off                    local
> spool  devices               on                     default
> spool  exec                  on                     default
> spool  setuid                on                     default
> spool  readonly              off                    default
> spool  jailed                off                    default
> spool  snapdir               hidden                 default
> spool  aclmode               discard                default
> spool  aclinherit            restricted             default
> spool  canmount              on                     default
> spool  xattr                 off                    temporary
> spool  copies                1                      default
> spool  version               5                      -
> spool  utf8only              off                    -
> spool  normalization         none                   -
> spool  casesensitivity       sensitive              -
> spool  vscan                 off                    default
> spool  nbmand                off                    default
> spool  sharesmb              off                    default
> spool  refquota              none                   default
> spool  refreservation        none                   default
> spool  primarycache          all                    default
> spool  secondarycache        all                    default
> spool  usedbysnapshots       0                      -
> spool  usedbydataset         692G                   -
> spool  usedbychildren        114M                   -
> spool  usedbyrefreservation  0                      -
> spool  logbias               latency                default
> spool  dedup                 off                    default
> spool  mlslabel              -
> spool  sync                  standard               default
> spool  refcompressratio      1.00x                  -
> spool  written               692G                   -
>
> Cheers
>
> Phil
-- 
Andriy Gapon
On 11/04/2014, at 10:36 pm, Andriy Gapon <avg at FreeBSD.org> wrote:
> on 11/04/2014 11:02 Phil Murray said the following:
>> Hi there,
>>
>> I've recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and both around the same time of day oddly) with ZFS. Sorry no dump available, but panic below.
>>
>> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?
>
> By chance, could the system be running zfs recv at the times when the panics
> happened?

No send/recv was running, no snapshots... a completely boring ZFS pool (apart from the panics)
On 11/04/2014, at 10:36 pm, Andriy Gapon <avg at FreeBSD.org> wrote:
> on 11/04/2014 11:02 Phil Murray said the following:
>> Hi there,
>>
>> I've recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and both around the same time of day oddly) with ZFS. Sorry no dump available, but panic below.
>>
>> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?
>
> By chance, could the system be running zfs recv at the times when the panics
> happened?

I think it might be related to this bug reported against ZFS-on-Linux when upgrading from v3 -> v5, which is exactly what I've done on this machine:

https://github.com/zfsonlinux/zfs/issues/2025

In my case, the bogus sa.sa_magic value looks like this:

panic: solaris assert: sa.sa_magic == 0x2F505A (0x5112fb3d == 0x2f505a), file:

$ date -r 0x5112fb3d
Thu Feb 7 13:54:21 NZDT 2013

Cheers

Phil
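[Editor's note: the check above can also be done programmatically. The following small C sketch is illustrative only (the SA_MAGIC_EXPECTED macro is a stand-in, not the ZFS constant); it performs the same decode as the `date -r` command quoted in the message. A "bogus" sa_magic that decodes to a plausible recent date suggests old-format znode data (a timestamp) was read where the SA header was expected, which is the pattern described in the ZFS-on-Linux report.]

	/*
	 * Diagnostic sketch (not part of ZFS): decode a suspect sa_magic
	 * value as a Unix timestamp, mirroring `date -r 0x5112fb3d`.
	 */
	#include <stdio.h>
	#include <time.h>

	#define SA_MAGIC_EXPECTED 0x2F505AUL	/* value the assertion checks for */

	int
	main(void)
	{
		unsigned long bogus = 0x5112fb3d;	/* value seen in the panic */
		time_t t = (time_t)bogus;
		char buf[64];

		if (bogus == SA_MAGIC_EXPECTED) {
			printf("value is the expected SA magic\n");
			return (0);
		}

		/* Not the magic: show what it looks like as a timestamp. */
		strftime(buf, sizeof (buf), "%a %b %e %T %Y", localtime(&t));
		printf("0x%lx is not the SA magic; as a timestamp it is: %s\n",
		    bogus, buf);
		/* Expected output: a date in Feb 2013, matching date -r. */
		return (0);
	}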
Andriy Gapon
2014-Apr-15 09:15 UTC
Re: Panic in ZFS, solaris assert: sa.sa_magic == 0x2F505A
on 15/04/2014 08:39 Phil Murray said the following:
>
> On 11/04/2014, at 10:36 pm, Andriy Gapon <avg@FreeBSD.org> wrote:
>
>> on 11/04/2014 11:02 Phil Murray said the following:
>>> Hi there,
>>>
>>> I've recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and both around the same time of day oddly) with ZFS. Sorry no dump available, but panic below.
>>>
>>> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?
>>
>> By chance, could the system be running zfs recv at the times when the panics
>> happened?
>
> I think it might be related to this bug reported against ZFS-on-Linux when upgrading from v3 -> v5, which is exactly what I've done on this machine:
>
> https://github.com/zfsonlinux/zfs/issues/2025
>
> In my case, the bogus sa.sa_magic value looks like this:
>
> panic: solaris assert: sa.sa_magic == 0x2F505A (0x5112fb3d == 0x2f505a), file:
>
> $ date -r 0x5112fb3d
> Thu Feb 7 13:54:21 NZDT 2013

Great job finding that ZoL bug report! And very good job done by the people who analyzed the problem.
Below is my guess about what could be wrong.

A thread changing file attributes can end up calling zfs_sa_upgrade() to convert a file's bonus from DMU_OT_ZNODE to DMU_OT_SA. The conversion is done in two steps:
- dmu_set_bonustype() to change the bonus type in the dnode
- sa_replace_all_by_template_locked() to re-populate the bonus data

dmu_set_bonustype() calls dnode_setbonus_type(), which does the following:

	dn->dn_bonustype = newtype;
	dn->dn_next_bonustype[tx->tx_txg & TXG_MASK] = dn->dn_bonustype;

Concurrently, the sync thread can run into the dnode if it was dirtied in an earlier txg. The sync thread calls dmu_objset_userquota_get_ids() via dnode_sync(). dmu_objset_userquota_get_ids() uses dn_bonustype, which already holds the new value, but the data corresponding to the txg being synced is still in the old format.

As I understand it, dmu_objset_userquota_get_ids() already uses dmu_objset_userquota_find_data() when before == B_FALSE to find the proper copy of the data corresponding to the txg being synced. So I think that in that case dmu_objset_userquota_get_ids() should also use values of dn_bonustype and dn_bonuslen that correspond to the txg. If I am not mistaken, those values could be deduced from dn_next_bonustype[tx->tx_txg & TXG_MASK] plus dn_phys->dn_bonustype, and from dn_next_bonuslen[tx->tx_txg & TXG_MASK] plus dn_phys->dn_bonuslen.

-- 
Andriy Gapon
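[Editor's note: to make the per-txg lookup Andriy proposes concrete, here is a minimal, self-contained C sketch. It is not actual ZFS source: mock_dnode, dn_phys_bonustype, and bonustype_for_txg() are illustrative stand-ins for the real dnode fields named above. It models choosing the bonus type that matches the txg being synced, falling back to the on-disk value when no change is pending in that txg. The same fallback pattern would apply to dn_bonuslen via dn_next_bonuslen[] and dn_phys->dn_bonuslen.]

	#include <stdio.h>
	#include <stdint.h>

	#define TXG_SIZE	4
	#define TXG_MASK	(TXG_SIZE - 1)

	#define DMU_OT_NONE	0
	#define DMU_OT_ZNODE	1	/* old-style bonus */
	#define DMU_OT_SA	2	/* new system-attribute bonus */

	struct mock_dnode {
		uint8_t dn_bonustype;			/* in-memory, may be "too new" */
		uint8_t dn_next_bonustype[TXG_SIZE];	/* per-txg pending change, 0 = none */
		uint8_t dn_phys_bonustype;		/* what is on disk */
	};

	/*
	 * Return the bonus type that corresponds to the txg being synced:
	 * a pending per-txg change wins; otherwise use the on-disk value.
	 */
	static uint8_t
	bonustype_for_txg(const struct mock_dnode *dn, uint64_t txg)
	{
		uint8_t t = dn->dn_next_bonustype[txg & TXG_MASK];

		return (t != DMU_OT_NONE ? t : dn->dn_phys_bonustype);
	}

	int
	main(void)
	{
		/*
		 * Model the race: the dnode was dirtied in txg 10 with the old
		 * ZNODE-format bonus, then a setattr in txg 11 upgraded it to
		 * SA and bumped the in-memory dn_bonustype.
		 */
		struct mock_dnode dn = { 0 };

		dn.dn_phys_bonustype = DMU_OT_ZNODE;
		dn.dn_bonustype = DMU_OT_SA;			/* already "new" */
		dn.dn_next_bonustype[11 & TXG_MASK] = DMU_OT_SA; /* lands in txg 11 */

		/* Sync thread working on txg 10: */
		printf("in-memory dn_bonustype:     %d (SA, wrong for the txg 10 data)\n",
		    dn.dn_bonustype);
		printf("bonustype_for_txg(dn, 10):  %d (ZNODE, matches the data)\n",
		    bonustype_for_txg(&dn, 10));
		printf("bonustype_for_txg(dn, 11):  %d (SA)\n",
		    bonustype_for_txg(&dn, 11));
		return (0);
	}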