[Bug 219972] Unable to zpool export following some zfs recv

bugzilla-noreply at freebsd.org
Thu Jun 22 00:33:40 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219972

--- Comment #3 from pfribeiro at gmail.com ---
Reproducing this bug seems to depend on the system having more than one CPU
core, with at least one of them mostly idle. On my original system, if I run
'dd if=/dev/zero of=/dev/null' while following the steps in comment #1, the
bug is not reproducible. However, as soon as this command is killed, the bug
is reproducible again.
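
As a rough sketch of the comparison described above, the following shell
function runs an arbitrary command while 'dd if=/dev/zero of=/dev/null' keeps
one core busy, and then stops the load again. The names with_busy_core and
BUSY_PID are made up for illustration, and this is not the test script linked
from this comment.

```shell
#!/bin/sh
# Sketch only: with_busy_core and BUSY_PID are illustrative names.

with_busy_core() {
    # Start the CPU-bound load in the background and remember its PID.
    dd if=/dev/zero of=/dev/null 2>/dev/null &
    BUSY_PID=$!

    # Run the caller's command while the load is active.
    status=0
    "$@" || status=$?

    # Stop the load and reap it so the core becomes idle again.
    kill "$BUSY_PID" 2>/dev/null || true
    wait "$BUSY_PID" 2>/dev/null || true

    return "$status"
}

# Example use (not executed here; 'slave' is the pool name from this report):
#   with_busy_core zpool export slave
```

Running the same command once through with_busy_core and once without it
gives the two cases compared above: other core busy versus other core idle.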

On the forum thread linked to in comment #2, another user tried to reproduce
the problem, but without success. I then configured two FreeBSD VMs on ESXi
(host is a Xeon-D 1518, 4 core, with HyperThreading enabled), one running the
VMDK image provided on the official FreeBSD download page, and another
installed from the ISO image (with ZFS as root filesystem) to better mirror my
installation on my original system, an Intel NUC5CPYH. Initially, I also could
not reproduce the bug in the VMs, no matter how many times I tried.

However, having observed the behaviour on the NUC installation, I then
changed the CPU affinity on ESXi so that the VM is allocated logical cores 0
and 2, with a minimum of 1000 MHz reserved. By running the import/export of
'slave' multiple times (after the respective zfs send/recv), I was eventually
able to trigger the bug on the 56th run of import/export.
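
A repeated import/export run of this kind can be sketched as a small loop
that stops at the first failing iteration and reports its number. This
assumes the failure shows up as a non-zero exit status; if the export hangs
instead, the loop simply stalls on that iteration. The names
run_until_failure, RUNS and cycle are made up for illustration, and the
zfs send/recv step from comment #1 is left out because its exact arguments
are not given here.

```shell
#!/bin/sh
# Sketch only: run_until_failure, RUNS and cycle are illustrative names.

# run_until_failure MAX CMD...: run CMD up to MAX times, stop on first failure.
run_until_failure() {
    max=$1
    shift
    RUNS=0
    while [ "$RUNS" -lt "$max" ]; do
        RUNS=$((RUNS + 1))
        if ! "$@"; then
            echo "command failed on run $RUNS"
            return 1
        fi
    done
    echo "no failure in $RUNS runs"
    return 0
}

# One import/export cycle of the pool named in this report.
# Defined here but not invoked; it needs root and an existing pool.
cycle() {
    zpool import slave && zpool export slave
}

# Example use (not executed here):
#   run_until_failure 100 cycle
```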

Regarding the NUC installation, I can see that killing 'dd if=/dev/zero
of=/dev/null' (i.e., freeing up the other core) is only relevant for the
import/export of 'slave' after the 'zfs send | recv' has taken place, which
suggests that there is some kind of race condition in the zpool export ioctl
code (which somehow depends on a previous recv).

I would be happy to provide more diagnostics; however, I would need further
guidance from you, as I am not very familiar with the synchronization
primitives of FreeBSD. I believe this bug will be hard to track down.

I would also like to add that I have successfully (subject to the above
caveats) reproduced this bug on more than one platform, with the following
versions, all on amd64:

11.0-RELEASE-p1 #0 r306420
11.0-RELEASE-p9 #0: Tue Apr 11 08:48:40 UTC 2017
11.1-BETA2 #0 r320072: Sun Jun 18 18:45:14 BST 2017 (I compiled my own kernel
with debugging on)

Finally, for completeness, my rudimentary bash test script is available at:
https://pastebin.com/YcKSU1LA. You're welcome to check the related forum post
from comment #2 as well.

-- 
You are receiving this mail because:
You are the assignee for the bug.

