[Bug 227740] concurrent zfs management operations may lead to a race/subsystem locking
bugzilla-noreply at freebsd.org
Tue Apr 24 13:27:31 UTC 2018
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227740
Bug ID: 227740
Summary: concurrent zfs management operations may lead to a race/subsystem locking
Product: Base System
Version: 11.1-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: bugs at FreeBSD.org
Reporter: emz at norma.perm.ru
Concurrent zfs management operations may lead to a race or a subsystem-wide
lockup. For instance, this is the current state, which has not changed for at
least 30 minutes (the system got into it after issuing concurrent zfs
commands; a hypothetical reproduction sketch follows the listing below):
===Cut===
[root@san1:~]# ps ax | grep zfs
9 - DL 7:41,34 [zfskern]
57922 - Is 0:00,01 sshd: zfsreplica [priv] (sshd)
57924 - I 0:00,00 sshd: zfsreplica at notty (sshd)
57925 - Is 0:00,00 csh -c zfs list -t snapshot
57927 - D 0:00,00 zfs list -t snapshot
58694 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
58695 - D 0:00,00 /sbin/zfs list -t all
59512 - Is 0:00,02 sshd: zfsreplica [priv] (sshd)
59516 - I 0:00,00 sshd: zfsreplica at notty (sshd)
59517 - Is 0:00,00 csh -c zfs list -t snapshot
59520 - D 0:00,00 zfs list -t snapshot
59552 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59553 - D 0:00,00 /sbin/zfs list -t all
59554 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59555 - D 0:00,00 /sbin/zfs list -t all
59556 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59557 - D 0:00,00 /sbin/zfs list -t all
59558 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59559 - D 0:00,00 /sbin/zfs list -t all
59560 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59561 - D 0:00,00 /sbin/zfs list -t all
59564 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59565 - D 0:00,00 /sbin/zfs list -t all
59570 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59571 - D 0:00,00 /sbin/zfs list -t all
59572 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59573 - D 0:00,00 /sbin/zfs list -t all
59574 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59575 - D 0:00,00 /sbin/zfs list -t all
59878 - Is 0:00,02 sshd: zfsreplica [priv] (sshd)
59880 - I 0:00,00 sshd: zfsreplica at notty (sshd)
59881 - Is 0:00,00 csh -c zfs list -t snapshot
59883 - D 0:00,00 zfs list -t snapshot
60800 - Is 0:00,01 sshd: zfsreplica [priv] (sshd)
60806 - I 0:00,00 sshd: zfsreplica at notty (sshd)
60807 - Is 0:00,00 csh -c zfs list -t snapshot
60809 - D 0:00,00 zfs list -t snapshot
60917 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
60918 - D 0:00,00 /sbin/zfs list -t all
60950 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
60951 - D 0:00,00 /sbin/zfs list -t all
60966 - Is 0:00,02 sshd: zfsreplica [priv] (sshd)
60968 - I 0:00,00 sshd: zfsreplica at notty (sshd)
60969 - Is 0:00,00 csh -c zfs list -t snapshot
60971 - D 0:00,00 zfs list -t snapshot
61432 - Is 0:00,03 sshd: zfsreplica [priv] (sshd)
61434 - I 0:00,00 sshd: zfsreplica at notty (sshd)
61435 - Is 0:00,00 csh -c zfs list -t snapshot
61437 - D 0:00,00 zfs list -t snapshot
61502 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61503 - D 0:00,00 /sbin/zfs list -t all
61504 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61505 - D 0:00,00 /sbin/zfs list -t all
61506 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61507 - D 0:00,00 /sbin/zfs list -t all
61508 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61509 - D 0:00,00 /sbin/zfs list -t all
61510 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61511 - D 0:00,00 /sbin/zfs list -t all
61512 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61513 - D 0:00,00 /sbin/zfs list -t all
61569 - I 0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61570 - D 0:00,00 /sbin/zfs list -t all
61851 - Is 0:00,02 sshd: zfsreplica [priv] (sshd)
61853 - I 0:00,00 sshd: zfsreplica at notty (sshd)
61854 - Is 0:00,00 csh -c zfs list -t snapshot
61856 - D 0:00,00 zfs list -t snapshot
57332 7 D+ 0:00,04 zfs rename data/esx/boot-esx03 data/esx/boot-esx03_orig
58945 8 D+ 0:00,00 zfs list
62119 3 S+ 0:00,00 grep zfs
[root@san1:~]# ps ax | grep ctladm
62146 3 S+ 0:00,00 grep ctladm
[root@san1:~]#
===Cut===
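For what it's worth, here is a minimal sketch of the kind of concurrent
workload that led to this state (the dataset names are the ones from this
report; the number of parallel readers is an arbitrary assumption, and it
must be run as root on a host with that dataset):
===Cut===
#!/bin/sh
# Hypothetical reproduction sketch, not a verified test case:
# fire off a burst of read-only zfs invocations concurrently
# with a rename, then wait for everything to finish (or hang).
i=0
while [ "$i" -lt 10 ]; do
    zfs list -t all > /dev/null 2>&1 &
    zfs list -t snapshot > /dev/null 2>&1 &
    i=$((i + 1))
done
zfs rename data/esx/boot-esx03 data/esx/boot-esx03_orig &
wait
===Cut===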
This seems to be the operation that locks the system:
zfs rename data/esx/boot-esx03 data/esx/boot-esx03_orig
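The kernel stacks of the stuck processes should show which lock the rename
and the readers are sleeping on; a sketch, with the PIDs taken from the
listing above:
===Cut===
# procstat -kk 57332   # kernel stack of the hung 'zfs rename'
# procstat -kk 57927   # kernel stack of one hung 'zfs list -t snapshot'
# ps -ax -o pid,state,mwchan,command | grep zfs   # wait channels at a glance
===Cut===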
The dataset info:
===Cut===
# zfs get all data/esx/boot-esx03
NAME PROPERTY VALUE SOURCE
data/esx/boot-esx03 type volume -
data/esx/boot-esx03 creation Wed Aug 2 15:48 2017 -
data/esx/boot-esx03 used 8,25G -
data/esx/boot-esx03 available 9,53T -
data/esx/boot-esx03 referenced 555M -
data/esx/boot-esx03 compressratio 1.06x -
data/esx/boot-esx03 reservation none default
data/esx/boot-esx03 volsize 8G local
data/esx/boot-esx03 volblocksize 8K default
data/esx/boot-esx03 checksum on default
data/esx/boot-esx03 compression lz4 inherited from data
data/esx/boot-esx03 readonly off default
data/esx/boot-esx03 copies 1 default
data/esx/boot-esx03 refreservation 8,25G local
data/esx/boot-esx03 primarycache all default
data/esx/boot-esx03 secondarycache all default
data/esx/boot-esx03 usedbysnapshots 0 -
data/esx/boot-esx03 usedbydataset 555M -
data/esx/boot-esx03 usedbychildren 0 -
data/esx/boot-esx03 usedbyrefreservation 7,71G -
data/esx/boot-esx03 logbias latency default
data/esx/boot-esx03 dedup off inherited from data/esx
data/esx/boot-esx03 mlslabel -
data/esx/boot-esx03 sync standard default
data/esx/boot-esx03 refcompressratio 1.06x -
data/esx/boot-esx03 written 555M -
data/esx/boot-esx03 logicalused 586M -
data/esx/boot-esx03 logicalreferenced 586M -
data/esx/boot-esx03 volmode dev inherited from data
data/esx/boot-esx03 snapshot_limit none default
data/esx/boot-esx03 snapshot_count none default
data/esx/boot-esx03 redundant_metadata all default
===Cut===
Since the dataset is only 8G, it's unlikely that the rename should take that
long, considering the disks are idle. This happened twice in a row, and as a
result all zfs/zpool commands stopped working.
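Even while the zfs/zpool commands hang, the claim that the disks are idle can
be cross-checked below the ZFS layer, for example:
===Cut===
# gstat -p        # per-disk I/O load, queried via GEOM, not the ZFS command path
# iostat -x 1 5   # five one-second samples of device-level I/O statistics
===Cut===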
I manually panicked the system to obtain crash dumps.
The crash dumps are located here:
http://san1.linx.playkey.net/r332096M/
along with a brief description and the full kernel/module binaries.
Please note that vmcore.0 is from another panic; the crash dumps for this
lockup are 1 (unfortunately, no txt files saved) and 2.
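For anyone examining the dumps, a typical starting point would be something
like the following (kgdb comes from the devel/gdb port; the path assumes the
default /var/crash dump directory):
===Cut===
# kgdb /boot/kernel/kernel /var/crash/vmcore.2
(kgdb) thread apply all bt   # stacks of all threads, to spot the lock cycle
===Cut===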