Experiences with ZFS v28 - including deadlock

Martin Matuska mm at FreeBSD.org
Fri Jul 15 22:02:28 UTC 2011


Hi Luke,

Regarding the incremental receive: does the mount still happen if you
use the "-u" option of the zfs receive command?

The manpage for zfs (receive section) says:

-u
File system that is associated with the received stream is not mounted.
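
A minimal sketch of driving an incremental send/receive with "-u" from a
script, in case that is easier to test than the raw pipeline. The dataset
and snapshot names are hypothetical placeholders, and the helper function
names are mine, not part of any ZFS tooling; only "zfs send -i" and
"zfs receive -u" are taken from the manpage.

```python
import subprocess


def send_argv(from_snap, to_snap):
    """Build the argv for an incremental send between two snapshots."""
    return ["zfs", "send", "-i", from_snap, to_snap]


def receive_argv(dataset, keep_unmounted=True):
    """Build the argv for zfs receive; -u asks that the result not be mounted."""
    argv = ["zfs", "receive"]
    if keep_unmounted:
        argv.append("-u")
    argv.append(dataset)
    return argv


def send_incremental(from_snap, to_snap, target):
    """Pipe 'zfs send -i' into 'zfs receive -u'. Returns True if both succeed."""
    send = subprocess.Popen(send_argv(from_snap, to_snap),
                            stdout=subprocess.PIPE)
    recv = subprocess.run(receive_argv(target), stdin=send.stdout)
    send.stdout.close()
    send.wait()
    return send.returncode == 0 and recv.returncode == 0


# Hypothetical usage (names are placeholders):
# send_incremental("tank/src@snap1", "tank/src@snap2", "backup/dst")
```

If the filesystem still ends up mounted after a receive done this way,
that would point at the v28 code rather than at the calling script.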

Cheers,
mm

On 15. 7. 2011 14:30, Luke Marsden wrote:
> Hi all,
>
> Having just quite extensively tested the v28 patchset contained within
> http://mfsbsd.vx.sk/iso/mfsbsd-se-8.2-zfsv28-amd64.iso (updated
> 19.06.2011) I wanted to share my experiences in the hope that the issues
> I encountered can be fixed before 8.3 ;-)
>
> The biggest issue was a DEADLOCK which occurs quite reliably with a
> given sequence of events in short succession, on a chroot filesystem
> with many snapshots and a MySQL socket and nullfs mounts inside it:
>
>      1. Force unmount the nullfs mounts which are mounted on top of it
>      2. Close the MySQL socket in /tmp
>      3. Force unmount the actual filesystem (even if there are open FDs)
>      4. 'zfs rename' the filesystem into our 'trash' filesystem (which I
>         understand consists of a clone, promote and destroy)
>
> The entire ZFS subsystem then hangs on any new I/O.
>
> Here is a procstat of the zfs rename process which hangs after the force
> unmount:
>
> 25674 100871 zfs              initial thread   mi_switch+0x176
> sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x85
> dsl_sync_task_group_wait+0x128 dsl_sync_task_do+0x54 dsl_dir_rename+0x8f
> dsl_dataset_rename+0x272 zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b
> kern_ioctl+0x102 ioctl+0xfd syscallenter+0x1e5 syscall+0x4b
> Xfast_syscall+0xe2
>
> Unfortunately it's not easy to reproduce: it only seems to happen in an
> environment which is under load, with a lot of datasets and a lot of zfs
> operations happening concurrently on other datasets.  I spent two days
> trying to reproduce it in self-contained test environments but had no
> luck, so I'm now reporting it anyway.
>
> There were two other issues which came up:
>
>      1. http://www.freebsd.org/cgi/query-pr.cgi?pr=157728 - we worked
>         around this with a semaphore on 'zfs list' and 'zfs recv' so
>         they never ran simultaneously.
>      2. After an incremental receive, v28 seems to like to mount the
>         filesystem even if it was unmounted at the start of the receive.
>         (Notably, on previous versions of ZFS, this only happened for
>         non-incremental receives where the filesystem was being created
>         by the receive -- incremental receives correctly left the
>         filesystem in the mount state it started in). This plays very
>         badly when the filesystem then gets modified before we can force
>         unmount it (which we do immediately), because in this case the
>         next receive operation will fail with "filesystem has
>         modifications" - which we handle, but it's expensive to do so on
>         every incremental receive.
>
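
For anyone wanting to try the same workaround for the first issue, the
"semaphore" idea can be sketched roughly as below. This is only an
illustration, not Luke's actual implementation: it assumes an advisory
file lock via flock(2), and the lock path and function names are
hypothetical.

```python
import fcntl
import subprocess
from contextlib import contextmanager

LOCK_PATH = "/tmp/zfs-serialize.lock"  # hypothetical lock file location


@contextmanager
def exclusive_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until any other holder releases
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def run_serialized(argv, lock_path=LOCK_PATH):
    """Run a command while holding the lock, so wrapped commands never overlap."""
    with exclusive_lock(lock_path):
        return subprocess.run(argv, check=False)


# Hypothetical usage: route both commands through the same lock.
# run_serialized(["zfs", "list"])
# run_serialized(["zfs", "recv", "-u", "backup/dst"])  # stream on stdin
```

This only helps if every caller of 'zfs list' and 'zfs recv' on the box
goes through the same wrapper.
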
> I had a conversation with jhell on IRC about this and he had this to
> say:
>
> <jhell> its happened twice before with ZFS basically a lock being held
> and never free'd
> <jhell> something there is happening between the snapshots and datasets
> though. seems that it for some reason is able to destroy the dataset
> before it destroys all the snapshots properly
> <jhell> then tries to do the renaming of the snapshots and leads to a
> lock not being free()'d or similar
>
> Maybe this can offer a hint for someone to go looking in the right
> direction to solve this?
>
> Thank you for working on ZFS in FreeBSD!  v15 is working very well for
> us.
>


-- 
Martin Matuska
FreeBSD committer
http://blog.vx.sk
