[Bug 228087] F_SETLK randomly fails on NFS4 in threaded operation in MySQL

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Wed May 9 04:25:40 UTC 2018


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=228087

            Bug ID: 228087
           Summary: F_SETLK randomly fails on NFS4 in threaded operation
                    in MySQL
           Product: Base System
           Version: 11.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: barry.boes at acciodata.com

Reproduced on 10.4, 11.1-RELEASE, 11.1-STABLE, and 11.2-PRERELEASE, on both
client and server.  Currently client and server are 11.2-PRERELEASE.

ktrace shows the following:

 66181 mysqld   CALL  close(0x30)
 66181 mysqld   RET   openat 48/0x30
 66181 mysqld   CALL  fcntl(0x30,F_SETLK,0x7fffdd3e5cc0)
 66181 mysqld   RET   close 0
 66181 mysqld   RET   fcntl -1 errno 13 Permission denied


A full trace shows that the files being locked are never locked twice by MySQL,
nor are they locked by another process.  The file closed in the first line is a
different file from the one opened in the second line.  MySQL performs this same
sequence tens or hundreds of thousands of times successfully, then fails on
one.  From all of the trace data that I've been able to gather, the fcntl()
succeeds 100% of the time if the close() returns before another thread calls
open() and F_SETLK, and fails 100% of the time when the F_SETLK completes
before the close() in the other thread returns.

Observation affects the results: failure occurs tens to hundreds of times
more rapidly when the process is not being traced.
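
The access pattern described above can be approximated outside MySQL.  The
following is only a sketch, not the reporter's test program and not MySQL
code: the names run_lock_stress, worker, worker_args, and the lockstress.N
file names are all made up for illustration.  Each thread repeatedly opens
its own private file, takes a whole-file write lock with F_SETLK, and closes
it, so no file is ever locked twice and descriptor numbers get reused across
threads much as in the trace.  On a filesystem that behaves correctly, the
expectation is a failure count of 0.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_THREADS 64

/* Hypothetical per-thread state; none of these names come from MySQL. */
struct worker_args {
    const char *dir;   /* directory on the filesystem under test */
    int id;            /* thread index, used to build a private file name */
    int iterations;    /* number of open/lock/close cycles to run */
    int failures;      /* cycles in which open() or F_SETLK failed */
    int last_errno;    /* errno of the most recent failure, 0 if none */
};

static void *worker(void *p)
{
    struct worker_args *a = p;
    char path[1024];

    /* One file per thread, so no two threads ever lock the same file. */
    snprintf(path, sizeof(path), "%s/lockstress.%d", a->dir, a->id);

    for (int i = 0; i < a->iterations; i++) {
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0) {
            a->failures++;
            a->last_errno = errno;
            continue;
        }

        struct flock fl;
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;      /* exclusive lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;             /* 0 means "through end of file" */

        if (fcntl(fd, F_SETLK, &fl) == -1) {
            a->failures++;
            a->last_errno = errno;   /* the report shows errno 13 (EACCES) */
        }
        close(fd);                /* closing the file also drops the lock */
    }
    unlink(path);
    return NULL;
}

/* Runs the workers; returns the total failure count, or -1 on setup error. */
int run_lock_stress(const char *dir, int nthreads, int iterations,
                    int *last_errno)
{
    pthread_t tids[MAX_THREADS];
    struct worker_args args[MAX_THREADS];
    int started = 0, total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct worker_args){ dir, i, iterations, 0, 0 };
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].failures;
        if (args[i].last_errno != 0 && last_errno != NULL)
            *last_errno = args[i].last_errno;
    }
    return started == nthreads ? total : -1;
}
```

To try it, call run_lock_stress() from a small main() with a directory on the
NFS4 mount, a thread count, and an iteration count, and print the returned
failure count along with strerror() of the reported errno.  Whether this
simplified loop actually triggers the failure described here is untested; it
only mirrors the open, F_SETLK, close sequence seen in the trace.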

The higher the network latency, the more likely the failure.  With a latency
of 200 us, it happens within seconds on a loaded server.  With a latency of
100 us, it happens within tens of seconds.  With a latency of 20 us it happens
rarely, and below 15 us I have yet to see this failure.

No kernel messages are logged.  I have duplicated the problem on a variety of
hardware, from 28-core Supermicro motherboards with ECC memory and E5-2XXX v4
CPUs to laptops with i3, i5, or i7 CPUs.

The filesystem setup is as follows:

Server: ZFS on 11.2-PRERELEASE, configured for very low latency (optimized
SSDs and persistent write caches, or sync=disabled).

The filesystem is either a base ZFS filesystem or a clone of a snapshot (for
easy testing, it happens on either).

Client: mounts the server filesystem via NFS4 and also runs 11.2-PRERELEASE.
Tested with 100 Mb, gigabit, 50 gigabit, and 100 gigabit NICs.


