Reading via mmap stinks (Re: weird bugs with mmap-ing via NFS)
dillon at apollo.backplane.com
Thu Mar 23 23:16:34 UTC 2006
:Yes, they both do work fine, but time gives very different stats for each. In
:my experiments, the total CPU time is noticeably less with mmap, but the
:elapsed time is (much) greater. Here are results from FreeBSD-6.1/amd64 --
:notice the large number of page faults, because the system does not try to
:preload the file in the mmap case as it does in the read case:
: time fgrep meowmeowmeow /home/oh.0.dump
: 2.167u 7.739s 1:25.21 11.6% 70+3701k 23663+0io 6pf+0w
: time fgrep --mmap meowmeowmeow /home/oh.0.dump
: 1.552u 7.109s 2:46.03 5.2% 18+1031k 156+0io 106327pf+0w
:Use a big enough file to bust the memory caching (oh.0.dump above is 2.9GB),
:I'm sure you will have no problems reproducing this result.
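
The two `time` lines quoted above are easier to compare field by field once parsed. This is a minimal Python sketch, assuming the csh/tcsh default `time` layout `%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww` (user CPU, system CPU, elapsed, CPU percentage, memory, block input+output operations, page faults+swaps). The names `parse_csh_time`, `elapsed_seconds` and `TimeStats` are made up for this example.

```python
import re
from dataclasses import dataclass

# Assumed layout (csh/tcsh default):
#   %Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+"
    r"(?P<cpu_pct>[\d.]+)%\s+(?P<shared_k>\d+)\+(?P<unshared_k>\d+)k\s+"
    r"(?P<in_ops>\d+)\+(?P<out_ops>\d+)io\s+"
    r"(?P<faults>\d+)pf\+(?P<swaps>\d+)w"
)


@dataclass
class TimeStats:
    user_s: float
    sys_s: float
    elapsed_s: float
    in_ops: int   # block input operations (the "io" field, first number)
    faults: int   # page faults (the "pf" field)


def elapsed_seconds(text: str) -> float:
    """Convert "m:ss.ss" or "h:mm:ss.ss" to seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


def parse_csh_time(line: str) -> TimeStats:
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized time output: {line!r}")
    return TimeStats(
        user_s=float(m["user"]),
        sys_s=float(m["sys"]),
        elapsed_s=elapsed_seconds(m["elapsed"]),
        in_ops=int(m["in_ops"]),
        faults=int(m["faults"]),
    )


read_run = parse_csh_time("2.167u 7.739s 1:25.21 11.6% 70+3701k 23663+0io 6pf+0w")
mmap_run = parse_csh_time("1.552u 7.109s 2:46.03 5.2% 18+1031k 156+0io 106327pf+0w")

for name, run in (("read", read_run), ("mmap", mmap_run)):
    cpu = run.user_s + run.sys_s
    print(f"{name}: cpu {cpu:.2f}s, elapsed {run.elapsed_s:.2f}s, "
          f"input ops {run.in_ops}, page faults {run.faults}")
```

For the two quoted runs this gives roughly 9.9 s of CPU and 85.2 s elapsed for the read() case versus 8.7 s of CPU and 166.0 s elapsed for the mmap case, with the mmap run's reads apparently showing up as 106,327 page faults rather than as block input operations (156 versus 23,663).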
106,000 page faults. How many pages is a 2.9GB file? If this is running
in 64-bit mode those would be 8K pages, right? So that would come to
around 380,000 pages. About 1:4. So, clearly the operating system
*IS* pre-faulting multiple pages.
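
The page arithmetic above can be checked with a few lines of Python. This sketch assumes "2.9GB" means 2.9 GiB (which is what makes the ~380,000 figure come out) and that every page of the file is touched once. Since the 8K page size is posed as a question, it is a parameter here; 4 KiB, the x86-64 base page size, is included for comparison. The helper names are illustrative.

```python
# Check of the page-count estimate above.
# Assumptions: "2.9GB" is 2.9 GiB, every page of the file is touched once,
# and the fault count is the 106327pf figure from the mmap run.
GIB = 1 << 30
FILE_BYTES = int(2.9 * GIB)
MMAP_FAULTS = 106_327


def page_count(file_bytes: int, page_size: int) -> int:
    """Pages needed to cover file_bytes; a trailing partial page counts as one."""
    return -(-file_bytes // page_size)  # integer ceiling division


def pages_per_fault(file_bytes: int, page_size: int, faults: int) -> float:
    """Average number of pages brought in per fault."""
    return page_count(file_bytes, page_size) / faults


for size in (8192, 4096):
    pages = page_count(FILE_BYTES, size)
    ratio = pages_per_fault(FILE_BYTES, size, MMAP_FAULTS)
    print(f"{size // 1024}K pages: {pages} pages, about {ratio:.1f} pages per fault")
```

With 8 KiB pages this gives 380,109 pages and about 3.6 pages per fault, the "about 1:4" above; with 4 KiB pages it would be about 7 pages per fault. Either way, each fault is bringing in more than one page.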
Since I don't believe that a memory fault would be so inefficient as
to account for 80 seconds of run time, it seems more likely to me that
the problem is that the VM system is not issuing read-aheads. Not
issuing read-aheads would easily account for the 80 seconds.
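
Spreading the elapsed-time gap over the faults gives a rough per-fault cost. A minimal sketch, assuming the whole difference between the two quoted runs (2:46.03 versus 1:25.21) is due to the mmap run's 106,327 faults; that is the hypothesis being argued, not a measurement. The function name is hypothetical.

```python
# Rough per-fault cost implied by the two quoted runs.
# Assumption: the entire elapsed-time gap is charged to the mmap run's
# page faults; this is the hypothesis in the text, not a measured fact.
READ_ELAPSED_S = 1 * 60 + 25.21   # "1:25.21", plain fgrep (read)
MMAP_ELAPSED_S = 2 * 60 + 46.03   # "2:46.03", fgrep --mmap
MMAP_FAULTS = 106_327             # "106327pf" from the mmap run


def extra_ms_per_fault(read_s: float, mmap_s: float, faults: int) -> float:
    """Elapsed-time difference divided evenly over the faults, in milliseconds."""
    return (mmap_s - read_s) / faults * 1000.0


gap_s = MMAP_ELAPSED_S - READ_ELAPSED_S
per_fault = extra_ms_per_fault(READ_ELAPSED_S, MMAP_ELAPSED_S, MMAP_FAULTS)
print(f"gap {gap_s:.2f} s over {MMAP_FAULTS} faults = {per_fault:.2f} ms per fault")
```

This comes to a gap of 80.82 s, or about 0.76 ms per fault. That is far more than servicing a fault from pages already in memory should cost, and more in line with waiting on a disk read each time, which is consistent with (though not proof of) missing read-ahead.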
It is possible that the kernel believes the VM system to be too loaded
to issue read-aheads, as a consequence of your blowing out the system
caches. It is also possible that the read-ahead code is broken in
FreeBSD. To determine which of the two is more likely, you have to
run with a smaller data set (like 600MB of data on a system with 1GB of RAM),
and use the unmount/mount trick to clear the cache before each grep test.
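
Rerunning the quoted `time fgrep` and `time fgrep --mmap` commands on the smaller data set is the most direct form of this test. For a scriptable variant, here is a minimal Python sketch that scans a file for a fixed string either with sequential read() calls or through a whole-file mmap, and times the scan. It does not clear the cache itself: the assumption is that the filesystem is unmounted and remounted before each run and that each method runs in a fresh process. The function names and the 1 MiB read size are choices made for this example.

```python
import mmap
import os
import time

CHUNK = 1 << 20  # bytes per read(); 1 MiB is an arbitrary choice here


def contains_via_read(path: str, needle: bytes) -> bool:
    """Sequential read() scan, roughly what plain fgrep does."""
    keep = len(needle) - 1  # carry-over so matches spanning two reads are found
    tail = b""
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(CHUNK):
            buf = tail + chunk
            if needle in buf:
                return True
            tail = buf[-keep:] if keep > 0 else b""
    return False


def contains_via_mmap(path: str, needle: bytes) -> bool:
    """Map the whole file and let page faults pull the data in, like fgrep --mmap."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return False  # a zero-length file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle) != -1


def time_it(scan, path: str, needle: bytes):
    """Run one scan and return (found, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    found = scan(path, needle)
    return found, time.perf_counter() - start
```

For example, after a fresh remount, call `time_it(contains_via_read, path, b"meowmeowmeow")` in one process, then remount again and call `time_it(contains_via_mmap, path, b"meowmeowmeow")` in a new process, with `path` pointing at a data set that fits in RAM. Python adds its own overhead, so the gap between the two methods is the interesting number, not the absolute times.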
If the time differential is still huge using the unmount/mount data set
test as described above, then the VM system's read-ahead code is broken.
If the time differential is tiny, however, then it's probably nothing
more than the kernel interpreting your massive 2.9GB mmap as being
too stressful on the VM system and disabling read-aheads for that

In any case, this sort of test is not really a good poster child for how
to use mmap(). Nobody in their right mind uses mmap() on datasets that
they expect to be uncacheable and which are accessed sequentially. It's
just plain silly to use mmap() in that sort of circumstance. This is
a truism on ANY operating system, not just FreeBSD. The uncached
data set test (using unmount/mount and a dataset which fits into memory)
is a far more realistic test because it simulates the most common case
encountered by a system under load... the accessing of a reasonably sized
data set which happens to not be in the cache.
More information about the freebsd-stable