weird bugs with mmap-ing via NFS

Tue Mar 21 23:15:22 UTC 2006

On Tuesday 21 March 2006 17:48, Matthew Dillon wrote:
> :Actually, it does. The program tells it that I don't care to read what's
> :currently there, by specifying the PROT_WRITE flag only.
>
>     That's an architectural flag.  Very few architectures actually support
>     write-only memory maps.

Why does it matter that the flag is architectural? The application tells the OS
that it only plans to write...

>     It does not change the fact that the operating system must validate the
>     memory underlying the page, nor does it imply that the system shouldn't.

> :Sounds like a missed optimization opportunity :-(

>    Even on architectures that did support write-only memory maps, the
>    system would still have to fault in the rest of the data on the page,
>    because the system would have no way of knowing which bytes in the 
>    page you wrote to (that is, whether you wrote to all the bytes in the
>    page or whether you left gaps).

Indeed, but in my case there is no data in the target file to begin with -- it 
is "created" by ftruncate() prior to mmap-ing.
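
For illustration, the sequence described above (create the file, size it with ftruncate(), then map it for writing only) might look like the sketch below. This is not the code from mzip.c; the helper name map_new_output and its parameters are hypothetical, and whether PROT_WRITE alone actually avoids reading the underlying pages depends on the OS and the hardware, which is the very point under discussion.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, size it to `len` bytes with
 * ftruncate(), and map it shared with PROT_WRITE only.
 * Returns MAP_FAILED on error; on success the open descriptor is
 * stored in *fd_out and the caller owns both mapping and descriptor. */
static void *map_new_output(const char *path, size_t len, int *fd_out)
{
    assert(len > 0 && fd_out != NULL);  /* mmap() rejects zero-length maps */

    /* O_RDWR rather than O_WRONLY: mmap() needs a readable descriptor
     * even when the mapping itself is write-only. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return MAP_FAILED;

    /* The file has no data yet; ftruncate() merely sets its size. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *p = mmap(NULL, len, PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return MAP_FAILED;
    }

    *fd_out = fd;
    return p;
}
```

A caller would fill the mapping, then call msync() and munmap() before closing the descriptor.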

> :See http://aldan.algebra.com/~mi/mzip.c

>    I can't access this URL, it says 'not found'.

Uh, sorry, the newest Apache is quite restrictive. I just tweaked it; please
try again.

> :The application can hint at its planned usage of the 
> :data via madvise, for example.

>    Yes, but those advantages are limited by the way memory mapping hardware
>    works.  There are some things that simply cannot be optimized through
>    lack of sufficient information.

No additional information from the hardware is needed when deciding -- based
on the information supplied by madvise -- which parts of the file (if any) to
keep in cache.
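
As a rough sketch of the kind of hint meant here, a program reading its input front to back could say so with MADV_SEQUENTIAL. The helper name map_input_sequential is hypothetical, and the hint is purely advisory: how (or whether) a given kernel acts on it is an assumption, not something this example demonstrates.

```c
#define _DEFAULT_SOURCE   /* expose madvise() under strict glibc modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and tell the kernel the
 * mapping will be read sequentially. Returns MAP_FAILED on error
 * (including an empty file); stores the mapped length in *len_out. */
static const void *map_input_sequential(const char *path, size_t *len_out)
{
    assert(len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return MAP_FAILED;

    /* Advisory only: a failure here is ignored, the mapping still works. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    *len_out = len;
    return p;
}
```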

>    I don't think mmap()-based writing will EVER be more efficient than
>    write() except in the case where the entire data set fits into memory
>    and has been entirely cached by the system.

My custom compressor is intended to operate on database and filesystem
dumps as they arrive (uncompressed) from the computers being backed up via
NFS. It is intended to pick up most of the input data while it is still in
the RAM cache.

It was more convenient for me to implement the output via mmap as well. The
bulk of the time is spent reading and compressing anyway -- the output is
many times smaller than the input. So the write performance never bothered me
until I tried to do it via NFS and encountered all of these bugs :-( ...

	-mi