Panic in ZFS, solaris assert: sa.sa_magic == 0x2F505A

Andriy Gapon avg at FreeBSD.org
Tue Apr 15 09:16:57 UTC 2014


on 15/04/2014 08:39 Phil Murray said the following:
> 
> On 11/04/2014, at 10:36 pm, Andriy Gapon <avg at FreeBSD.org> wrote:
> 
>> on 11/04/2014 11:02 Phil Murray said the following:
>>> Hi there,
>>>
>>> I’ve recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and both around the same time of day oddly) with ZFS. Sorry no dump available, but panic below.
>>>
>>> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?
>>
>> By chance, could the system be running zfs recv at the times when the panics
>> happened?
> 
> I think it might be related to this bug reported on ZFS-on-linux when upgrading from v3 -> v5, which is exactly what I’ve done on this machine:
> 
>    https://github.com/zfsonlinux/zfs/issues/2025
> 
> In my case, the bogus sa.sa_magic value looks like this:
> 
>    panic: solaris assert: sa.sa_magic == 0x2F505A (0x5112fb3d == 0x2f505a), file: 
> 
>    $ date -r 0x5112fb3d
>    Thu Feb  7 13:54:21 NZDT 2013

Great job finding that ZoL bug report!  And very good work by the people who
analyzed the problem.

Below is my guess about what could be wrong.

A thread that is changing file attributes can end up calling
zfs_sa_upgrade() to convert the file's bonus buffer from DMU_OT_ZNODE to
DMU_OT_SA.  The conversion is done in two steps:
- dmu_set_bonustype() to change the bonus type in the dnode
- sa_replace_all_by_template_locked() to re-populate the bonus data

dmu_set_bonustype() calls dnode_setbonus_type() which does the following:
        dn->dn_bonustype = newtype;
        dn->dn_next_bonustype[tx->tx_txg & TXG_MASK] = dn->dn_bonustype;
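
To illustrate the effect of those two assignments, here is a small userland
model, not the kernel code.  TXG_SIZE and TXG_MASK are assumed to match the
ZFS definitions; the struct, the type values and the model_* names are made up
for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match ZFS: four per-txg slots, indexed by txg & TXG_MASK. */
#define TXG_SIZE 4
#define TXG_MASK (TXG_SIZE - 1)

/* Illustrative values only; the real ones live in dmu_object_type_t. */
enum model_bonustype { MODEL_OT_NONE = 0, MODEL_OT_ZNODE = 1, MODEL_OT_SA = 2 };

/* Hypothetical, heavily reduced stand-in for dnode_t. */
struct model_dnode {
    enum model_bonustype dn_bonustype;                /* in-core, current */
    enum model_bonustype dn_next_bonustype[TXG_SIZE]; /* pending, per txg */
};

/* Mirrors the two assignments quoted above. */
void
model_setbonus_type(struct model_dnode *dn, enum model_bonustype newtype,
    uint64_t txg)
{
    assert(newtype != MODEL_OT_NONE);
    dn->dn_bonustype = newtype;
    dn->dn_next_bonustype[txg & TXG_MASK] = dn->dn_bonustype;
}
```

If the change is made in txg 1001, the in-core dn_bonustype is new right away,
while the slot for txg 1000 (which may still be syncing) has no record of any
change.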

Concurrently, the sync thread can be working on the same dnode if it was dirtied
in an earlier txg.  The sync thread calls dmu_objset_userquota_get_ids() via
dnode_sync().  dmu_objset_userquota_get_ids() uses dn_bonustype, which already
has the new value, but the data corresponding to the txg being synced is still
in the old format.  So the old-format data would be interpreted as SA data,
which would explain the bogus sa_magic.
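
A rough model of that mismatch, again in userland and with made-up model_*
names.  It assumes that the old-format bonus data starts with a timestamp
(which is what the date output quoted above suggests) and reduces the bonus
buffer to its first 32-bit word.

```c
#include <stdint.h>

#define MODEL_SA_MAGIC 0x2F505AU    /* the value the assertion expects */

enum model_bonustype { MODEL_OT_ZNODE = 1, MODEL_OT_SA = 2 };

/* Hypothetical bonus buffer, reduced to its first 32-bit word. */
union model_bonus {
    uint32_t legacy_first_word;     /* old format: assumed to be a timestamp */
    uint32_t sa_magic;              /* SA format: header magic */
};

struct model_dnode {
    enum model_bonustype dn_bonustype;  /* in-core type, set in open context */
    union model_bonus syncing_data;     /* data of the txg being synced */
};

/* Step 1 of the upgrade only: the type changes, the data does not (yet). */
void
model_set_bonustype(struct model_dnode *dn, enum model_bonustype newtype)
{
    dn->dn_bonustype = newtype;
}

/*
 * Sync-side check as modeled here: trust dn_bonustype to interpret the data.
 * Returns 1 if the data looks sane, 0 if the magic check would fail.
 */
int
model_sync_magic_ok(const struct model_dnode *dn, uint32_t *seen_magic)
{
    if (dn->dn_bonustype != MODEL_OT_SA)
        return (1);                 /* legacy path, no magic to check */
    *seen_magic = dn->syncing_data.sa_magic;
    return (*seen_magic == MODEL_SA_MAGIC);
}
```

With legacy_first_word set to 0x5112fb3d the check passes while the type is
still MODEL_OT_ZNODE, and fails with seen_magic == 0x5112fb3d as soon as only
the type has been switched.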

As I understand it, dmu_objset_userquota_get_ids() already uses
dmu_objset_userquota_find_data() when before == B_FALSE to find the proper copy
of the data corresponding to the txg being synced.
So, I think that in that case dmu_objset_userquota_get_ids() should also use
the values of dn_bonustype and dn_bonuslen that correspond to that txg.
If I am not mistaken, those values could be deduced from
dn_next_bonustype[tx->tx_txg & TXG_MASK] plus dn_phys->dn_bonustype and
dn_next_bonuslen[tx->tx_txg & TXG_MASK] plus dn_phys->dn_bonuslen.
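
A sketch of what such a deduction could look like, as a userland model and not
a patch.  The model_* names, the type values and the rule that a zero slot
means "no change recorded for this txg" are assumptions of the sketch; I have
not checked how the real code encodes a zero length, or whether dn_phys still
holds the old values at the point where the lookup would happen.

```c
#include <stdint.h>

/* Assumed to match ZFS: four per-txg slots, indexed by txg & TXG_MASK. */
#define TXG_SIZE 4
#define TXG_MASK (TXG_SIZE - 1)

/* Illustrative values only. */
enum model_bonustype { MODEL_OT_NONE = 0, MODEL_OT_ZNODE = 1, MODEL_OT_SA = 2 };

/* Hypothetical, reduced stand-ins for dnode_phys_t and dnode_t. */
struct model_dnode_phys {
    enum model_bonustype dn_bonustype;
    uint16_t dn_bonuslen;
};

struct model_dnode {
    enum model_bonustype dn_next_bonustype[TXG_SIZE];
    uint16_t dn_next_bonuslen[TXG_SIZE];
    struct model_dnode_phys *dn_phys;
};

/* Type for the given txg: the per-txg slot if set, else the on-disk value. */
enum model_bonustype
model_bonustype_for_txg(const struct model_dnode *dn, uint64_t txg)
{
    enum model_bonustype next = dn->dn_next_bonustype[txg & TXG_MASK];

    return (next != MODEL_OT_NONE ? next : dn->dn_phys->dn_bonustype);
}

/* Length for the given txg, same rule (zero slot treated as "no change"). */
uint16_t
model_bonuslen_for_txg(const struct model_dnode *dn, uint64_t txg)
{
    uint16_t next = dn->dn_next_bonuslen[txg & TXG_MASK];

    return (next != 0 ? next : dn->dn_phys->dn_bonuslen);
}
```

With an upgrade recorded only in the slot for txg 1001, a lookup for txg 1000
still returns the old type and length from dn_phys, and a lookup for txg 1001
returns the new ones.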

-- 
Andriy Gapon

