more weird bugs with mmap-ing via NFS
Matthew Dillon
dillon at apollo.backplane.com
Wed Mar 22 00:26:02 UTC 2006
:I don't specify either, but the default is UDP, is not it?
Yes, the default is UDP.
:> Now imagine a client that experiences this problem only
:> sometimes. Modern hardware, but for some reason (network
:> congestion?) some frames are still lost if sent back-to-back.
:> (Realtek chipset on the receiving side?)
:
:No, both sides have em-cards and are only separated by a rather decent large
:switch.
:
:I'll try the TCP-mount workaround. If it helps, we can assume our UDP NFS is
:broken for sustained high bandwidth writes :-(
:
:Thanks!
:
: -mi
I can't speak for FreeBSD's current implementation, but it should be
possible to determine whether there is an issue with packet drops or
not by observing the network statistics via netstat -s. Generally
speaking, however, I know of no problems with a UDP NFS mount per se,
at least as long as reasonable values are chosen for the block size.
The mmap() call in your mzip.c program looks ok to me with the exception
of the use of PROT_WRITE. Try using PROT_READ|PROT_WRITE. The
ftruncate() looks ok as well. If the program works over a local
filesystem but fails to produce data in the output file on an NFS
mount (but completes otherwise), then there is a bug in NFS somewhere.
If the problem is simply that the program stalls and never completes,
then it could be a problem with dropped packets in the network stack.
If the problem is that the program simply runs
very inefficiently over NFS, with excessive network bandwidth for the
data being written (as you also reported), this is probably an artifact
of attempting to use mmap() to write out the data, for reasons previously
discussed.
I would again caution against using mmap() to populate a file in this
manner. Even with MADV_SEQUENTIAL there is no guarantee that the system
will actually flush the pages to the actual file on the server
sequentially, and you could end up with a very badly fragmented file.
When a file is truncated to a larger size the underlying filesystem
does not allocate the actual backing store on disk for the data hole
created. Allocation winds up being based on the order in which the
operating system flushes the VM pages. The VM system does its best, but
it is really designed more as a random-access system rather than a
sequential system. Pages are flushed based on memory availability and
a thousand other factors and may not necessarily be flushed to the file
in the order you think they should be. write() is really a much better
way to write out a sequential file (on any operating system, not
just BSD).
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the freebsd-stable
mailing list