kern/180060: [zfs] [panic] ZFS kernel panic, solaris assert on dsl_prop_unregister

Andreas Longwitz longwitz at incore.de
Mon Aug 12 10:35:03 UTC 2013


In the meantime I did some more analysis of the problem and can explain
why the panic happens. Two threads are involved: one runs the zfs
command and wants to do a rollback using an ioctl() on /dev/zfs, the
other runs mountd and tries to do an "unmount exports" with
"export.ex_flags = MNT_DELEXPORT". Both threads are working on the same
dataset ds=0xffffff0126912c00.

The rollback thread wants to unregister its aclinherit property and
panics, because this property no longer exists in the list of
properties:

(kgdb) f 12
#12 0xffffffff80cbd0b8 in zfs_unregister_callbacks
(zfsvfs=0xffffff01f5664000) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1278
1278      VERIFY(dsl_prop_unregister(ds, "aclinherit",
(kgdb) l
1273            zfsvfs) == 0);
1274
1275      VERIFY(dsl_prop_unregister(ds, "aclmode", acl_mode_changed_cb,
1276            zfsvfs) == 0);
1277
1278      VERIFY(dsl_prop_unregister(ds, "aclinherit",
1279            acl_inherit_changed_cb, zfsvfs) == 0);
1280
1281      VERIFY(dsl_prop_unregister(ds, "vscan",
1282            vscan_changed_cb, zfsvfs) == 0);

The mountd thread wants to register some properties, but it first
unregisters everything (see the comment in the source):

(kgdb) f 14
#14 0xffffffff80cc0048 in zfs_mount (vfsp=0xffffff03e49398d0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1706
1706                    error = zfs_register_callbacks(vfsp);
(kgdb) list
1700  * When doing a remount, we simply refresh our temporary properties
1701  * according to those options set in the current VFS options.
1702  */
1703       if (vfsp->vfs_flag & MS_REMOUNT) {
1704               /* refresh mount options */
1705               zfs_unregister_callbacks(vfsp->vfs_data);
1706               error = zfs_register_callbacks(vfsp);
1707               goto out;
1708       }
1709

There is a short window between lines 1705 and 1706 in which the
property aclinherit is not registered. If another thread tries to
unregister this property during that window, its VERIFY fails and the
kernel panics.
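
To make the interleaving concrete, here is a small user-space model in
C. It is not ZFS code: struct prop_registry, model_register(),
model_unregister() and the two replay functions are hypothetical names
invented for this illustration, and the error codes are assumptions.
It only replays the ordering described above in a single thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dataset's list of registered callbacks. */
struct prop_registry {
    bool aclinherit_registered;
};

/* Models an unregister call: 0 on success, nonzero if nothing is
 * registered. The error value is an assumption, not the real one. */
static int model_unregister(struct prop_registry *r)
{
    if (!r->aclinherit_registered)
        return ENOENT;
    r->aclinherit_registered = false;
    return 0;
}

/* Models a register call: 0 on success, nonzero if already registered. */
static int model_register(struct prop_registry *r)
{
    if (r->aclinherit_registered)
        return EEXIST;
    r->aclinherit_registered = true;
    return 0;
}

/* Rollback runs inside the window between lines 1705 and 1706.
 * Returns what the rollback thread's unregister call would see. */
static int replay_race(void)
{
    struct prop_registry ds = { .aclinherit_registered = true };
    int rollback_result;

    (void)model_unregister(&ds);             /* mountd, line 1705 */
    rollback_result = model_unregister(&ds); /* rollback, line 1278 */
    (void)model_register(&ds);               /* mountd, line 1706 */
    return rollback_result;
}

/* Rollback runs only after the remount refresh has completed. */
static int replay_serialized(void)
{
    struct prop_registry ds = { .aclinherit_registered = true };

    (void)model_unregister(&ds);             /* mountd, line 1705 */
    (void)model_register(&ds);               /* mountd, line 1706 */
    return model_unregister(&ds);            /* rollback, line 1278 */
}
```

In this model replay_race() returns a nonzero value, which is the case
where the VERIFY in the rollback thread would fire, while
replay_serialized() returns 0.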

I don't know how to fix this properly. Simply removing the VERIFY on
dsl_prop_unregister() would be easy, but I hope that one of the ZFS
gurus will have a look at this and we will get a better solution.
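
One direction that might be worth discussing, instead of dropping the
VERIFY, is to make the unregister/register pair atomic with respect to
other threads touching the same dataset. The following is only a
user-space sketch with pthreads, not a patch: struct dataset_model,
refresh_cycle() and run_model() are hypothetical names, and it assumes
that the rollback path also registers its callbacks again after it has
unregistered them. Which kernel lock, if any, would be the right one
for this is exactly the question for the ZFS gurus.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define ITERATIONS 100000

/* Hypothetical model of one dataset with a single callback. */
struct dataset_model {
    pthread_mutex_t lock;     /* serializes unregister/register pairs */
    bool registered;          /* is the aclinherit callback registered? */
    int failed_unregisters;   /* places where a VERIFY would have fired */
};

/* One unregister/register cycle, done entirely under the lock so that
 * no other thread can observe the "not registered" state. */
static void refresh_cycle(struct dataset_model *ds)
{
    pthread_mutex_lock(&ds->lock);
    if (!ds->registered)
        ds->failed_unregisters++;   /* would be the panic */
    ds->registered = false;         /* unregister */
    ds->registered = true;          /* register again */
    pthread_mutex_unlock(&ds->lock);
}

/* Models mountd doing remount refreshes (lines 1705 and 1706). */
static void *remount_worker(void *arg)
{
    for (int i = 0; i < ITERATIONS; i++)
        refresh_cycle(arg);
    return NULL;
}

/* Models the rollback path: unregister, then (by assumption) register
 * again once the rollback itself is done. */
static void *rollback_worker(void *arg)
{
    for (int i = 0; i < ITERATIONS; i++)
        refresh_cycle(arg);
    return NULL;
}

/* Runs both workers concurrently and returns the number of unregister
 * calls that found nothing registered. */
static int run_model(void)
{
    struct dataset_model ds = { .registered = true, .failed_unregisters = 0 };
    pthread_t remount, rollback;

    pthread_mutex_init(&ds.lock, NULL);
    pthread_create(&remount, NULL, remount_worker, &ds);
    pthread_create(&rollback, NULL, rollback_worker, &ds);
    pthread_join(remount, NULL);
    pthread_join(rollback, NULL);
    pthread_mutex_destroy(&ds.lock);
    return ds.failed_unregisters;
}
```

With the lock held across each pair, run_model() returns 0 in this
model, because no thread ever sees the callback missing. Without the
lock, the check in refresh_cycle() could trip in the same way as the
VERIFY at line 1278.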

-- 
Andreas Longwitz

