Reading via mmap stinks (Re: weird bugs with mmap-ing via NFS)

Thu Mar 23 23:16:34 UTC 2006

:Yes, they both do work fine, but time gives very different stats for each. In 
:my experiments, the total CPU time is noticeably less with mmap, but the 
:elapsed time is (much) greater. Here are results from FreeBSD-6.1/amd64 -- 
:notice the large number of page faults, because the system does not try to 
:preload the file in the mmap case as it does in the read case:
:
:	time fgrep meowmeowmeow /home/oh.0.dump
:	2.167u 7.739s 1:25.21 11.6%     70+3701k 23663+0io 6pf+0w
:	time fgrep --mmap  meowmeowmeow /home/oh.0.dump
:	1.552u 7.109s 2:46.03 5.2%      18+1031k 156+0io 106327pf+0w
:
:Use a file big enough to bust the memory cache (oh.0.dump above is 2.9GB); 
:I'm sure you will have no problem reproducing this result.

    106,000 page faults.  How many pages is a 2.9GB file?  On amd64 the
    base page size is 4K, so that comes to around 760,000 pages.  About
    1:7.  So, clearly the operating system *IS* pre-faulting multiple
    pages.

    I don't believe the page faults themselves are inefficient enough to
    account for the extra 80 seconds of elapsed time (2:46 vs 1:25),
    especially since the mmap run actually used less CPU than the read
    run.  It seems more likely to me that the problem is that the VM
    system is not issuing read-aheads, which would easily account for
    the 80 seconds.

    It is possible that the kernel believes the VM system is too loaded
    to issue read-aheads, as a consequence of your blowing out the system
    caches.  It is also possible that the read-ahead code is broken in
    FreeBSD.  To determine which of the two is more likely, you have to
    run with a smaller data set (say, 600MB of data on a system with 1GB
    of RAM), and use the unmount/mount trick to clear the cache before
    each grep test.

    If the time differential is still huge in that cold-cache test, then
    the VM system's read-ahead code is broken.  If the differential is
    tiny, however, then it's probably nothing more than the kernel
    interpreting your massive 2.9GB mmap as too stressful on the VM
    system and disabling read-aheads for that reason.

    In any case, this sort of test is not really a good poster child for
    how to use mmap().  Nobody in their right mind uses mmap() on datasets
    that they expect to be uncacheable and which are accessed
    sequentially.  It's just plain silly to use mmap() in that sort of
    circumstance.  This is a truism on ANY operating system, not just
    FreeBSD.  The uncached data set test (using unmount/mount and a
    dataset which fits into memory) is a far more realistic test because
    it simulates the most common case encountered by a system under
    load: access to a reasonably sized data set which happens not to be
    in the cache.

						-Matt