An old gripe: Reading via mmap stinks
Mikhail T.
mi+thun at aldan.algebra.com
Thu Jan 14 21:06:58 UTC 2010
On 03/25/06 14:03, John-Mark Gurney wrote:
> The other useful/interesting number would be to compare system time
> between the mmap case and the read case to see how much work the
> kernel is doing in each case...
After adding begin- and end-offset options to md5(1) -- implemented
using mmap (see bin/142814) -- I am, once again, upset over the slowness
of pagefaulting-in compared to reading-in.
(To reproduce my results, patch your /usr/src/sbin/md5/ with
http://aldan.algebra.com/~mi/tmp/md5-offsets.patch
Then use plain ``md5 LARGE_FILE'' to use read and ``md5 -b 0
LARGE_FILE'' to use the mmap-method.)
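For context, here is a minimal sketch -- not the actual md5(1) source nor
the patch above -- of the two I/O paths being compared. It assumes
FreeBSD's libmd MD5 interface (link with -lmd) and an arbitrary 1MB
window size:

	#include <sys/types.h>
	#include <sys/mman.h>
	#include <err.h>
	#include <md5.h>
	#include <unistd.h>

	#define WINDOW	(1024 * 1024)	/* arbitrary window size for this sketch */

	/* read(2) path: a plain loop; the kernel's read-ahead heuristics apply. */
	static void
	digest_read(int fd, MD5_CTX *ctx)
	{
		static unsigned char buf[WINDOW];
		ssize_t n;

		while ((n = read(fd, buf, sizeof(buf))) > 0)
			MD5Update(ctx, buf, (unsigned int)n);
		if (n == -1)
			err(1, "read");
	}

	/*
	 * mmap path: map a window, touch it via MD5Update, move on; every
	 * first touch of a page is a fault the VM must service.
	 */
	static void
	digest_mmap(int fd, off_t size, MD5_CTX *ctx)
	{
		off_t off;
		size_t len;
		void *p;

		for (off = 0; off < size; off += len) {
			len = (size - off > WINDOW) ? WINDOW : (size_t)(size - off);
			p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
			if (p == MAP_FAILED)
				err(1, "mmap");
			MD5Update(ctx, p, (unsigned int)len);
			munmap(p, len);
		}
	}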
The times for processing an 8Gb file residing on a reasonable IDE drive
(on a recent FreeBSD-7.2-STABLE/i386) are thus:
mmap: 43.400u 9.439s 2:35.19 34.0% 16+184k 0+0io 106994pf+0w
read: 41.358u 23.799s 2:12.04 49.3% 16+177k 67677+0io 0pf+0w
Observe that, even though reading is quite taxing on the kernel (high
sys time), mmap-ing loses overall -- at least on an otherwise idle
system -- because read gets the full throughput of the drive (systat -vm
shows 100% disk utilization), while pagefaulting gets only about 69%.
When I last brought this up in 2006, it was "revealed" that read(2)
uses heuristics to perform read-ahead. Why the pagefaulting-in
implementation can't use the same or similar "trickery" was never
explained...
Now, without a clue as to how these things are implemented, I'll concede
that it may /sometimes/ be difficult for the VM to predict where
the next pagefault will strike. But in the cases when the process:
	a) mmaps up to 1Gb at a time;
	b) issues an madvise MADV_SEQUENTIAL over the entire mmap-ed
	   region
mmap-ing ought to offer the same -- or better -- performance than read.
For example, a hit on a page inside a region marked as SEQUENTIAL ought
to bring in the next page or two. The VM has all the information and the
hints -- it just does not use them... A shame, is it not?
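For what it's worth, expressing hints (a) and (b) from userland is
trivial. A hedged sketch, assuming the same kind of window-mapping loop
as above, with the window capped at 1Gb:

	#include <sys/types.h>
	#include <sys/mman.h>
	#include <err.h>

	#define MAXWIN	(1024UL * 1024 * 1024)	/* mmap at most 1Gb at a time (hint a) */

	/*
	 * Map one window of the file and declare the access pattern (hint b).
	 * MADV_SEQUENTIAL tells the VM that faults will march forward through
	 * the region, so it could bring in several pages ahead on each fault --
	 * the same trick read(2) already plays.
	 */
	static void *
	map_window(int fd, off_t off, size_t len)
	{
		void *p;

		p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
		if (p == MAP_FAILED)
			err(1, "mmap");
		if (madvise(p, len, MADV_SEQUENTIAL) == -1)
			warn("madvise");	/* only advice; not fatal */
		return (p);
	}

Whether the fault path actually honours that advice is, of course, what
the timings above call into question.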
-mi
P.S. If it is any consolation, on Linux things seem to be even worse.
Processing a 9Gb file on kernel 2.6.18/i386:
mmap: 26.222u 6.336s 6:01.75 8.9% 0+0k 0+0io 61032pf+0w
read: 25.991u 7.686s 3:43.70 15.0% 0+0k 0+0io 23pf+0w
Although the absolute times can't be compared with those above due to
hardware differences, mmap being nearly twice as slow is a shame...