weird bugs with mmap-ing via NFS
dillon at apollo.backplane.com
Tue Mar 21 22:48:37 UTC 2006
: [Moved from -current to -stable]
:On Tuesday 21 March 2006 16:23, Matthew Dillon wrote:
:> You might be doing just writes to the mmap()'d memory, but the system
:> doesn't know that.
:Actually, it does. The program tells it that I don't care to read what's
:currently there, by specifying the PROT_WRITE flag only.
That's an architectural flag. Very few architectures actually support
write-only memory maps. IA32 does not. It does not change the
fact that the operating system must validate the memory underlying
the page, nor does it imply that the system shouldn't.
:Sounds like a missed optimization opportunity :-(
Even on architectures that did support write-only memory maps, the
system would still have to fault in the rest of the data on the page,
because the system would have no way of knowing which bytes in the
page you wrote to (that is, whether you wrote to all the bytes in the
page or whether you left gaps). The system does not take a fault for
every write you issue to the page, only for the first one. So, no
matter how you twist it, the system *MUST* validate the entire page
when it takes the page fault.
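A minimal sketch of why the page has to be validated, assuming a POSIX
system. The helper name partial_page_write() and the byte offset 10 are
made up for illustration. It stores one byte through a PROT_WRITE-only
shared mapping and then checks that every other byte of the page still
holds the file's old contents, which is only possible if the system
brought in the whole page first.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the program under discussion): fill a
 * one-page file with 'A', store a single 'B' through a PROT_WRITE-only
 * shared mapping, then read the file back with read().
 * Returns 0 if exactly that one byte changed, -1 on error or mismatch.
 */
int partial_page_write(const char *path)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = malloc(len);
    char *p;
    int fd, rc = -1;

    if (buf == NULL)
        return -1;
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    memset(buf, 'A', len);
    if (write(fd, buf, len) != (ssize_t)len)
        goto out;

    /* Ask for write access only; the page must still be populated. */
    p = mmap(NULL, len, PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    p[10] = 'B';                        /* a single store into the page */
    if (msync(p, len, MS_SYNC) != 0) {
        munmap(p, len);
        goto out;
    }
    munmap(p, len);

    /* Every byte we did not touch must still be 'A'. */
    if (lseek(fd, 0, SEEK_SET) != 0 || read(fd, buf, len) != (ssize_t)len)
        goto out;
    rc = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != (i == 10 ? 'B' : 'A'))
            rc = -1;
out:
    close(fd);
    free(buf);
    unlink(path);
    return rc;
}
```

Calling partial_page_write() on a scratch path should return 0: the one
store went through, and the rest of the page was preserved.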
:> It kinda sounds like the buffer cache is getting blown out, but not
:> having seen the program I can't really analyze it.
I can't access this URL, it says 'not found'.
:> It will always be more efficient to write to a file using write() than
:> using mmap()
:I understand, that write() is much better optimized at the moment, but the
:mmap interface carries some advantages, which may allow future OSes to
:optimize their ways. The application can hint at its planned usage of the
:data via madvise, for example.
Yes, but those advantages are limited by the way memory mapping hardware
works. There are some things that simply cannot be optimized through
lack of sufficient information.
Reading via mmap() is very well optimized. Making modifications via
mmap() is optimized on the expectation that the data will be read,
modified, and written back. It is not possible to
optimize with the expectation that the data would only be written to
the mmap, for the reasons described above. The hardware simply does not
provide sufficient information to the operating system to optimize
the write-only case.
:Unfortunately, my problem, so far, is with it not writing _at all_...
Not sure what is going on since I can't access the program yet, but
I'd be happy to take a look at the code.
The most common mistake people make when trying to write to a file via
mmap() is that they forget to ftruncate() the file to the proper length
first. Within the file's last page, data stored to mapped memory beyond
EOF is silently discarded, and the program will take a fatal fault
(typically SIGBUS) if it touches mapped pages that lie entirely beyond
the file's current EOF. Writing
to mapped memory does *not* extend the size of a file. Only
ftruncate() or write() can extend the size of a file.
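The usual sequence looks roughly like the sketch below, assuming a
POSIX system. The function name write_via_mmap() is hypothetical, and
error handling is kept to the bare minimum.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or truncate) `path` and fill it with
 * `len` bytes from `data` through a shared mapping.
 * Returns 0 on success, -1 on failure.
 */
int write_via_mmap(const char *path, const char *data, size_t len)
{
    void *p;
    int fd, rc;

    if (len == 0)                       /* mmap() rejects zero-length maps */
        return -1;
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Size the file FIRST; stores into the mapping never extend it. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* MAP_SHARED so that the stores reach the file itself. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, data, len);
    rc = msync(p, len, MS_SYNC);        /* push the dirty pages out now */
    munmap(p, len);
    close(fd);
    return rc;
}
```

Without the ftruncate() call the file would still be zero bytes long
when it is mapped, and the memcpy() would fault instead of writing.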
The second most common mistake is to forget to specify MAP_SHARED
in the mmap() call.
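To see the difference the flag makes, here is a small sketch, assuming
a POSIX system with a unified buffer cache. The helper name
store_reaches_file() is made up. It stores one byte through a mapping
created with the given flag and reports whether read() on the file
sees the change afterwards.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "old" to a fresh file, map it with
 * `map_flags` (MAP_SHARED or MAP_PRIVATE), store 'N' over the first
 * byte, unmap, and read the first byte back from the file.
 * Returns 1 if the file changed, 0 if it did not, -1 on error.
 */
int store_reaches_file(const char *path, int map_flags)
{
    char c = 0;
    char *p;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "old", 3) != 3)
        goto fail;

    p = mmap(NULL, 3, PROT_READ | PROT_WRITE, map_flags, fd, 0);
    if (p == MAP_FAILED)
        goto fail;
    p[0] = 'N';
    if (map_flags & MAP_SHARED)
        msync(p, 3, MS_SYNC);           /* only meaningful for shared maps */
    munmap(p, 3);

    if (lseek(fd, 0, SEEK_SET) != 0 || read(fd, &c, 1) != 1)
        goto fail;
    close(fd);
    unlink(path);
    return c == 'N';
fail:
    close(fd);
    unlink(path);
    return -1;
}
```

With MAP_SHARED the store lands in the file. With MAP_PRIVATE the
store goes to a private copy-on-write page and the file keeps its old
contents.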
:Yes, this is an example of how a well-implemented mmap can be better than
:write. Without explicit writes by the application and without doubling the
:memory requirements, the data can be written in the most optimal way.
:Thanks for your help. Yours,
I don't think mmap()-based writing will EVER be more efficient than
write() except in the case where the entire data set fits into memory
and has been entirely cached by the system. In that one case writing via
mmap will be faster. In all other cases the system will be taking as
many VM faults on the pages as it would be taking system call faults
to execute the write()'s.
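A rough way to watch those faults, as a sketch only: it assumes a
POSIX system where getrusage() reports fault counts in ru_minflt and
ru_majflt (some environments report zeros), and the names
fault_count() and faults_for_mmap_write() are made up. It touches one
byte in each page of a fresh shared mapping and returns how many
faults the process took while doing so.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Total page faults (minor + major) taken by this process so far. */
static long fault_count(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/*
 * Hypothetical helper: map a new file of `npages` pages MAP_SHARED,
 * store one byte into each page, and return the number of faults
 * counted across those stores (-1 on error).
 */
long faults_for_mmap_write(const char *path, size_t npages)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * pagesz;
    volatile char *p;
    long before, after;
    int fd;

    if (npages == 0)
        return -1;
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping survives the close */
    if (p == MAP_FAILED) {
        unlink(path);
        return -1;
    }

    before = fault_count();
    for (size_t i = 0; i < npages; i++)
        p[i * pagesz] = 1;              /* first touch of each page */
    after = fault_count();

    munmap((void *)p, len);
    unlink(path);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

The exact number depends on the kernel and filesystem, so treat it as
an indication rather than a measurement. Comparing it against the
number of write() calls needed for the same data gives a feel for the
trade-off described above.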
You are making a classic mistake by assuming that the copying overhead
of a write() into the file's backing store, versus directly mmap()ing
the file's backing store, represents a large chunk of the overhead for
the operation. In fact, the copying overhead represents only a small
chunk of the related overhead. The vast majority of the overhead is
always going to be the disk I/O itself.
I/O must occur even in the cached/delayed-write case so on a busy system
it still represents the greatest overhead from the point of view of
system load. On a lightly loaded system nobody is going to care about
a few milliseconds of improved performance here and there since, by
definition, the system is lightly loaded and thus has plenty of idle
cpu and I/O cycles to spare.
<dillon at backplane.com>