Reading via mmap stinks (Re: weird bugs with mmap-ing via NFS)
Matthew Dillon
dillon at apollo.backplane.com
Thu Mar 23 20:48:21 UTC 2006
:Actually, I cannot agree here -- quite the opposite seems true. When running
:my compressor locally (no NFS involved) with the `-1' flag (fast, least
:effective compression), the program easily compresses faster than it can
:read.
:
:The Opteron CPU is about 50% idle, *and so is the disk* producing only 15Mb/s.
:I guess, despite the noise I raised on this subject a year ago, reading via
:mmap continues to ignore MADV_SEQUENTIAL and has no other adaptability.
:
:Unlike read, which uses buffering, mmap-reading still does not pre-fault the
:file's pages in efficiently :-(
:
:Although the program was written to compress files that are _likely_ still in
:memory, when used with regular files it exposes the lack of mmap
:optimization.
:
:This should be even more obvious, if you time searching for a string in a
:large file using grep vs. 'grep --mmap'.
:
:Yours,
:
: -mi
:
:http://aldan.algebra.com/~mi/mzip.c
Well, I don't know about FreeBSD, but both grep cases work just fine on
DragonFly. I can't test mzip.c because I don't see the compression
library you are calling (maybe that's a FreeBSD thing). The results
of the grep test ought to be similar for FreeBSD since the heuristic
used by both OS's is the same. If they aren't, something might have
gotten nerfed accidentally in the FreeBSD tree.
Here is the cache case test. mmap is clearly faster (though I would
again caution that this should not be an implicit assumption since
VM fault overheads can rival read() overheads, depending on the
situation).
The 'x1' file in all tests below is simply /usr/share/dict/words
concatenated over and over again to produce a large file.
crater# ls -la x1
-rw-r--r-- 1 root wheel 638228992 Mar 23 11:36 x1
[ machine has 1GB of ram ]
crater# time grep --mmap asdfasf x1
1.000u 0.117s 0:01.11 100.0% 10+40k 0+0io 0pf+0w
crater# time grep --mmap asdfasf x1
0.976u 0.132s 0:01.13 97.3% 10+40k 0+0io 0pf+0w
crater# time grep --mmap asdfasf x1
0.984u 0.140s 0:01.11 100.9% 10+41k 0+0io 0pf+0w
crater# time grep asdfasf x1
0.601u 0.781s 0:01.40 98.5% 10+42k 0+0io 0pf+0w
crater# time grep asdfasf x1
0.507u 0.867s 0:01.39 97.8% 10+40k 0+0io 0pf+0w
crater# time grep asdfasf x1
0.562u 0.812s 0:01.43 95.8% 10+41k 0+0io 0pf+0w
crater# iostat 1
[ while grep is running, in order to test the cache case and verify that
no I/O is occurring once the data has been cached ]
The disk I/O case, which I can test by unmounting and remounting the
partition containing the file in question, then running grep, seems
to be well optimized on DragonFly. It should be similarly optimized
on FreeBSD since the code that does this optimization is nearly the
same. In my test, it is clear that the page-fault overhead in the
uncached case is greater than the copying overhead of a read(),
though not by much. And I would expect that, too.
test28# umount /home
test28# mount /home
test28# time grep asdfasdf /home/x1
0.382u 0.351s 0:10.23 7.1% 55+141k 42+0io 4pf+0w
test28# umount /home
test28# mount /home
test28# time grep asdfasdf /home/x1
0.390u 0.367s 0:10.16 7.3% 48+123k 42+0io 0pf+0w
test28# umount /home
test28# mount /home
test28# time grep --mmap asdfasdf /home/x1
0.539u 0.265s 0:10.53 7.5% 36+93k 42+0io 19518pf+0w
test28# umount /home
test28# mount /home
test28# time grep --mmap asdfasdf /home/x1
0.617u 0.289s 0:10.47 8.5% 41+105k 42+0io 19518pf+0w
test28#
test28# iostat 1 during the test showed ~60MBytes/sec for all four tests
Perhaps you should post specifics of the test you are running, as well
as specifics of the results you are getting, such as the actual timing
output instead of a human interpretation of the results. For that
matter, since this is an Opteron system, were you running the tests on
a UP system or an SMP system? grep is single-threaded, so on a 2-cpu
system it will show 50% cpu utilization, with one cpu saturated and
the other idle. With specifics, a FreeBSD person can
try to reproduce your test results.
A grep vs grep --mmap test is pretty straightforward and should be
a good test of the VM read-ahead code, but there might always be some
unknown circumstance specific to a machine configuration that is
the cause of the problem. Repeatability and reproducibility by
third parties are important when diagnosing any problem.
Insofar as MADV_SEQUENTIAL goes... you shouldn't need it on FreeBSD.
Unless someone ripped it out since I committed it many years ago, which
I doubt, FreeBSD's VM heuristic will figure out that the accesses
are sequential and start issuing read-aheads. It should pre-fault, and
it should do read-ahead. That isn't to say that there isn't a bug, just
that everyone interested in the problem has to be able to reproduce it
and help each other track down the source. Just making assumptions
and accusations about the cause of the problem doesn't solve it.
The VM system is rather fragile when it comes to read-ahead because
the only way to do read-ahead on mapped memory is to issue the
read-ahead and then mark some prior (already cached) page as
inaccessible in order to be able to take a VM fault and issue the
NEXT read-ahead before the program exhausts the current cached data.
It is, in fact, rather complex code, not as straightforward as you
might expect.
But I can only caution you, again, on making the assumption that the
operating system should optimize your particular test case intuitively,
like a human would. Operating systems generally optimize the most
common cases, but it would be pretty dumb to actually try to make
them optimize every conceivable case. You would wind up with hundreds
of thousands of lines of barely exercised and likely buggy code.
-Matt