more weird bugs with mmap-ing via NFS
Matthew Dillon
dillon at apollo.backplane.com
Tue Mar 21 22:56:25 UTC 2006
:When the client is in this state it remains quite usable except for the
:following:
:
: 1) Trying to start `systat 1 -vm' stalls ALL access to local disks,
: apparently -- no new programs can start, and the running ones
: can not access any data either; attempts to Ctrl-C the starting
: systat succeed only after several minutes.
:
: 2) The writing process is stuck unkillable in the following state:
:
: CPU PRI NI VSZ RSS MWCHAN STAT TT TIME
: 27 -4 0 1351368 137764 nfs DL p4 1:05,52
:
: Sending it any signal has no effect. (Large sizes are explained
: by it mmap-ing its large input and output.)
:
: 3) Forceful umount of the share, that the program is writing to,
: paralyzes the system for several minutes -- unlike in 1), not
: even the mouse is moving. It would seem, the process is dumping
: core, but it is not -- when the system unfreezes, the only
: message from the kernel is:
:
: vm_fault: pager read error, pid XXXX (mzip)
:
:Again, this is on 6.1/i386 from today, which we are about to release into the
:cruel world.
:
:Yours,
:
: -mi
There are a number of problems with using a block size of 65536. First of
all, I think you can only do it safely with a TCP mount, and only if
the TCP buffer size is large enough to hold an entire packet. For UDP
mounts, 65536 is too large: the UDP length field is 16 bits, so a UDP
datagram, header included, cannot exceed 65535 bytes. For that matter,
the *IP* packet itself cannot exceed 65535 bytes, which over IPv4 leaves
at most 65507 bytes of UDP payload before the RPC and NFS headers take
their share. So 65536 will not work with a UDP mount.
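The size limit above is simple arithmetic. Here is a minimal Python sketch of it. It assumes IPv4 without header options. `rpc_overhead` is a hypothetical parameter standing in for the RPC and NFS headers, whose exact size is not given here. It defaults to 0, which is the most optimistic case.

```python
# Header sizes in bytes (assumption: IPv4 without options, standard UDP header).
IPV4_HEADER = 20
UDP_HEADER = 8
# The IPv4 total-length field is 16 bits wide.
MAX_IPV4_PACKET = 2**16 - 1  # 65535


def max_udp_payload() -> int:
    """Largest UDP payload that fits in a single IPv4 datagram."""
    return MAX_IPV4_PACKET - IPV4_HEADER - UDP_HEADER


def fits_in_udp(nfs_block_size: int, rpc_overhead: int = 0) -> bool:
    """True if an NFS block plus (hypothetical) RPC/NFS header bytes fits in one datagram."""
    return nfs_block_size + rpc_overhead <= max_udp_payload()


for size in (8192, 16384, 32768, 65536):
    print(size, fits_in_udp(size))
```

Even with zero protocol overhead, a 65536-byte block is larger than the 65507-byte ceiling. Real RPC headers only make the margin worse.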
The second problem is related to the network driver. The packet MTU
is 1500, which typically means a limit of around 1460-1480 payload
bytes per packet. A large UDP packet of, say, 48KB will be broken down
into more than 33 IP fragments. The network stack could very well drop
some of these fragments, and losing any single fragment loses the whole
UDP packet, which makes delivery of large UDP packets unreliable.
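The fragment count and its effect on reliability can be estimated with a short Python sketch. It assumes IPv4 with a 20-byte header and a 1500-byte MTU. It also assumes, purely for illustration, that each fragment is dropped independently at a fixed rate. Real loss tends to be burstier than that.

```python
import math

MTU = 1500
IPV4_HEADER = 20
UDP_HEADER = 8


def fragment_count(udp_payload: int, mtu: int = MTU) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8  # 1480 for a 1500-byte MTU
    # The UDP header travels inside the first fragment's data.
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)


def delivery_probability(fragments: int, drop_rate: float) -> float:
    """Chance the whole datagram arrives if each fragment is dropped independently."""
    return (1.0 - drop_rate) ** fragments


n = fragment_count(48 * 1024)  # 34 fragments for a 48KB datagram
print(n, delivery_probability(n, 0.01))
```

Under these assumptions, a 1% per-fragment drop rate lets a 34-fragment datagram through only about 71% of the time. For comparison, an 8K block needs 6 fragments and a 16K block needs 12.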
The NFS protocol itself does allow read and write packets to be
truncated, provided that the read or write operation is either bounded
by the file EOF or (for a read) the remaining data is all zeros.
Typically the all-zeros case is only optimized by the NFS server when
the underlying filesystem block itself is unallocated (i.e. a 'hole'
in the file). In all other cases the full NFS block size is passed
between client and server.
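The truncation rule can be modeled as a small function. This is a hypothetical Python sketch, not the NFS server's actual code. `reply_length` and its `trailing_hole` parameter are illustrative names. The model assumes the server omits only an unallocated region that runs to the end of the requested range.

```python
def reply_length(offset: int, block_size: int, file_size: int,
                 trailing_hole: int = 0) -> int:
    """Bytes a hypothetical server puts in one NFS read reply."""
    # A read is bounded by EOF: nothing past the end of the file is sent.
    length = max(0, min(block_size, file_size - offset))
    # A trailing unallocated region reads as zeros and may be omitted.
    return length - min(trailing_hole, length)


print(reply_length(0, 16384, 100_000))                          # full block
print(reply_length(98_304, 16384, 100_000))                     # bounded by EOF
print(reply_length(16384, 16384, 100_000, trailing_hole=16384)) # block is a hole
```

Outside the EOF and hole cases, every reply carries the full block size. A larger block size therefore means a larger UDP packet on every transfer.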
I would stick to an NFS block size of 8K or 16K. Frankly, there is
no real reason to use a larger block size.
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the freebsd-stable mailing list