Panic in ZFS, solaris assert: sa.sa_magic == 0x2F505A
Andriy Gapon
avg at FreeBSD.org
Tue Apr 15 09:16:57 UTC 2014
on 15/04/2014 08:39 Phil Murray said the following:
>
> On 11/04/2014, at 10:36 pm, Andriy Gapon <avg at FreeBSD.org> wrote:
>
>> on 11/04/2014 11:02 Phil Murray said the following:
>>> Hi there,
>>>
>>> I’ve recently experienced two kernel panics on 8.4-RELEASE (within 2 days of each other, and oddly both around the same time of day) with ZFS. Sorry, no dump is available, but the panic is below.
>>>
>>> Any ideas where to start solving this? Will upgrading to 9 (or 10) solve it?
>>
>> By chance, could the system be running zfs recv at the times when the panics
>> happened?
>
> I think it might be related to this bug reported on ZFS-on-linux when upgrading from v3 -> v5, which is exactly what I’ve done on this machine:
>
> https://github.com/zfsonlinux/zfs/issues/2025
>
> In my case, the bogus sa.sa_magic value looks like this:
>
> panic: solaris assert: sa.sa_magic == 0x2F505A (0x5112fb3d == 0x2f505a), file:
>
> $ date -r 0x5112fb3d
> Thu Feb 7 13:54:21 NZDT 2013
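Assuming the usual on-disk layouts, the timestamp is probably not a coincidence. As far as I know, the legacy DMU_OT_ZNODE bonus (znode_phys_t) starts with zp_atime[2] (seconds, nanoseconds), while an SA bonus starts with the 32-bit sa_magic. If old-format bytes are parsed as an SA header, the low 32 bits of the access time would appear where the magic should be. The C sketch below models only those leading bytes. The helper names are hypothetical, and it assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define SA_MAGIC 0x2F505AU  /* value the failed assertion expects */

/* Store a 64-bit value little-endian, as an x86 host would lay it out. */
static void put_le64(uint8_t *buf, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (8 * i));
}

/* Read a 32-bit little-endian value from the start of a buffer. */
static uint32_t get_le32(const uint8_t *buf)
{
	return (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
	    (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
}

/*
 * Hypothetical legacy bonus: only the leading zp_atime[2] pair
 * (seconds, nanoseconds) of a znode_phys_t is modeled.
 */
static void make_legacy_bonus(uint8_t bonus[16], uint64_t sec, uint64_t nsec)
{
	put_le64(bonus, sec);
	put_le64(bonus + 8, nsec);
}

/* Parse the same bytes as if they began with an SA header's sa_magic. */
static uint32_t misread_sa_magic(const uint8_t bonus[16])
{
	return get_le32(bonus);
}

/* Render seconds since the epoch in UTC, roughly what `date -u -r` prints. */
static void format_utc(uint32_t secs, char *out, size_t outlen)
{
	time_t t = (time_t)secs;
	strftime(out, outlen, "%Y-%m-%d %H:%M:%S UTC", gmtime(&t));
}
```

With sec = 0x5112fb3d, misread_sa_magic() returns 0x5112fb3d, and format_utc() gives 2013-02-07 00:54:21 UTC. That is the same instant as the 13:54:21 NZDT printed by date.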
Great job finding that ZoL bug report! And a very good job by the people who
analyzed the problem.
Below is my guess about what could be wrong.
A thread that is changing file attributes could end up calling
zfs_sa_upgrade() to convert the file's bonus buffer from DMU_OT_ZNODE to
DMU_OT_SA. The conversion is done in two steps:
- dmu_set_bonustype() to change the bonus type in the dnode
- sa_replace_all_by_template_locked() to re-populate the bonus data
dmu_set_bonustype() calls dnode_setbonus_type(), which does the following:
    dn->dn_bonustype = newtype;
    dn->dn_next_bonustype[tx->tx_txg & TXG_MASK] = dn->dn_bonustype;
Concurrently, the sync thread may process the same dnode if the dnode was
dirtied in an earlier txg. The sync thread calls dmu_objset_userquota_get_ids()
via dnode_sync(). dmu_objset_userquota_get_ids() uses dn_bonustype, which
already holds the new value. However, the data for the txg being synced is
still in the old format.
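The interleaving above can be modeled in a few lines of C. This is a toy model under stated assumptions, not ZFS code. toy_dnode_t, the TOY_OT_* codes (placeholders, not the real dmu_object_type_t values) and the helpers are all hypothetical, and each txg's copy of the bonus data is reduced to a tag recording its format. TXG_SIZE = 4 and TXG_MASK = TXG_SIZE - 1 follow my understanding of the ZFS definitions.

```c
#include <assert.h>
#include <stdint.h>

#define TXG_SIZE 4
#define TXG_MASK (TXG_SIZE - 1)

/* Placeholder type codes; not the real dmu_object_type_t values. */
enum { TOY_OT_ZNODE = 1, TOY_OT_SA = 2 };

/* Hypothetical reduced dnode: in-core type plus per-txg bookkeeping. */
typedef struct toy_dnode {
	uint8_t dn_bonustype;                 /* open-context (newest) value */
	uint8_t dn_next_bonustype[TXG_SIZE];  /* pending change, 0 = none */
	uint8_t copy_format[TXG_SIZE];        /* real format of each txg's copy */
} toy_dnode_t;

/* Open context: dirty the bonus in txg using whatever format is current. */
static void toy_dirty_bonus(toy_dnode_t *dn, uint64_t txg)
{
	dn->copy_format[txg & TXG_MASK] = dn->dn_bonustype;
}

/* Open context: zfs_sa_upgrade() in miniature. */
static void toy_upgrade(toy_dnode_t *dn, uint64_t txg)
{
	/* step 1, as in dnode_setbonus_type() */
	dn->dn_bonustype = TOY_OT_SA;
	dn->dn_next_bonustype[txg & TXG_MASK] = dn->dn_bonustype;
	/* step 2: data rewritten in the new format for this txg only */
	toy_dirty_bonus(dn, txg);
}

/*
 * Syncing context as described above: trusts dn_bonustype. Returns nonzero
 * when the type it would parse with matches the syncing txg's data.
 */
static int toy_sync_uses_right_type(const toy_dnode_t *dn, uint64_t syncing_txg)
{
	return dn->dn_bonustype == dn->copy_format[syncing_txg & TXG_MASK];
}
```

For example, suppose the dnode is dirtied in txg 9 in the legacy format and then upgraded in txg 10. Syncing txg 9 then parses legacy data as SA, while syncing txg 10 is fine.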
As I understand it, when before == B_FALSE, dmu_objset_userquota_get_ids()
already uses dmu_objset_userquota_find_data() to find the copy of the data
that corresponds to the txg being synced.
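The idea of picking the copy for the syncing txg can be sketched as follows. This is a simplified model, not the real function. The dirty-record layout is reduced to a txg, a data pointer and a next link, and I am assuming the records are kept newest first. I also believe that, when nothing is dirty, the current buffer contents are what get written. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced dirty record: one per txg that dirtied the buffer. */
typedef struct toy_dirty_record {
	uint64_t dr_txg;                   /* txg this record belongs to */
	const void *dr_data;               /* that txg's copy of the data */
	struct toy_dirty_record *dr_next;  /* next older record */
} toy_dirty_record_t;

/*
 * In the spirit of dmu_objset_userquota_find_data(): with no dirty records
 * return the current data; otherwise walk the records (newest first) and
 * return the copy belonging to the syncing txg, or NULL if there is none.
 */
static const void *toy_find_data(const toy_dirty_record_t *last_dirty,
    const void *current, uint64_t syncing_txg)
{
	if (last_dirty == NULL)
		return current;
	for (const toy_dirty_record_t *dr = last_dirty; dr != NULL;
	    dr = dr->dr_next)
		if (dr->dr_txg == syncing_txg)
			return dr->dr_data;
	return NULL;
}
```

The point for this bug is that the data lookup is already keyed by txg. The bonus type used to interpret that data is not.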
So I think that in that case dmu_objset_userquota_get_ids() should also use
the values of dn_bonustype and dn_bonuslen that correspond to the txg.
If I am not mistaken, those values could be deduced from
dn_next_bonustype[tx->tx_txg & TXG_MASK] together with dn_phys->dn_bonustype,
and from dn_next_bonuslen[tx->tx_txg & TXG_MASK] together with
dn_phys->dn_bonuslen.
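One reading of "together with" is this: use the pending value recorded for that txg if there is one, and otherwise fall back to the on-disk value in dn_phys. The sketch below follows that reading. It assumes a slot value of 0 means "no change in that txg", and that dnode_sync() has not yet cleared the syncing txg's slot when the quota code runs. If I remember correctly, the real code uses a separate marker (DN_ZERO_BONUSLEN) for a length that shrinks to zero. TOY_ZERO_BONUSLEN is a hypothetical stand-in for it, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TXG_SIZE 4
#define TXG_MASK (TXG_SIZE - 1)
#define TOY_ZERO_BONUSLEN 0xFFFFu  /* hypothetical "length becomes 0" marker */

enum { TOY_OT_ZNODE = 1, TOY_OT_SA = 2 };  /* placeholder type codes */

typedef struct toy_dnode {
	uint8_t  dn_next_bonustype[TXG_SIZE];  /* 0 = no change in that txg */
	uint16_t dn_next_bonuslen[TXG_SIZE];   /* 0 = no change in that txg */
	uint8_t  dn_phys_bonustype;            /* stands in for dn_phys->dn_bonustype */
	uint16_t dn_phys_bonuslen;             /* stands in for dn_phys->dn_bonuslen */
} toy_dnode_t;

/* Bonus type for the data of txg: the pending change, else the on-disk type. */
static uint8_t toy_bonustype_for_txg(const toy_dnode_t *dn, uint64_t txg)
{
	uint8_t next = dn->dn_next_bonustype[txg & TXG_MASK];
	return next != 0 ? next : dn->dn_phys_bonustype;
}

/* Same rule for the length, decoding the zero-length marker. */
static uint16_t toy_bonuslen_for_txg(const toy_dnode_t *dn, uint64_t txg)
{
	uint16_t next = dn->dn_next_bonuslen[txg & TXG_MASK];
	if (next == TOY_ZERO_BONUSLEN)
		return 0;
	return next != 0 ? next : dn->dn_phys_bonuslen;
}
```

Suppose the upgrade happened in txg 10 and txg 9 is being synced. The helpers then report the legacy type and length for txg 9 and the SA values for txg 10.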
--
Andriy Gapon
More information about the freebsd-fs mailing list