cmp(1) has a bottleneck, but where?

Bruce Evans brde at optusnet.com.au
Mon Jan 16 01:50:59 UTC 2012


On Sun, 15 Jan 2012, Dieter BSD wrote:

>> posix_fadvise() should probably be used for large files to tell the
>> system not to cache the data. Its man page reminded me of the O_DIRECT
>> flag. Certainly if the combined size exceeds the size of main memory,
>> O_DIRECT would be good (even for benchmarks that cmp the same files :-).
>> But cmp and cp are too old to use it.
>
> 8.2 says:
> man -k posix_fadvise
> posix_fadvise: nothing appropriate
>
> The FreeBSD man pages web page says it is not in 9.0 either.
>
> google found:
> http://lists.freebsd.org/pipermail/freebsd-hackers/2011-May/035333.html
>
> So what is this posix_fadvise() man page you mention?

Standard in 10.0-current.  Not that I normally run that.  I thought I
remembered an older feature that gave this, and didn't notice that
the man page was so new.  Now I remember that the older feature is
madvise(), which is spelled posix_madvise() in POSIX-speak.  So
mmap() may be good for large files after all, but only with use of
madvise() for large files and complications to determine what is a
large file.

Recent mail about this was about whether the primary syscall for the
new API should be spelled correctly (as fadvise(), corresponding to
madvise()).  Currently, there is only the verbose posix_fadvise().
The options for posix_fadvise() are a large subset of the ones for
madvise(), but spelled with F instead of M and a verbose POSIX prefix
(e.g., MADV_NORMAL for madvise() and even for posix_madvise() becomes
POSIX_FADV_NORMAL for posix_fadvise()).
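
A minimal sketch of how a cmp-like reader might pass such hints,
assuming a system that provides posix_fadvise().  The helper name
read_without_caching() and the 64K buffer size are made up for
illustration, and the choice of POSIX_FADV_SEQUENTIAL before the read
loop and POSIX_FADV_DONTNEED after it is only one plausible
combination.  The calls are advisory, so their errors are ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a file sequentially while hinting that the
 * data will not be needed again.  Returns the number of bytes read, or
 * -1 if the file cannot be opened or a read fails.
 */
long
read_without_caching(const char *path)
{
	static char buf[65536];
	long total = 0;
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);

	/* offset 0 with len 0 means "the whole file". */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;	/* a real cmp would compare buf here */

	/* Tell the system that the pages just read can be dropped. */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	(void)close(fd);
	return (n < 0 ? -1 : total);
}
```

Whether the hints change anything depends on the kernel, so this
needs to be measured rather than assumed.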

> O_DIRECT looked interesting, but I haven't found an explanation of
> exactly what it does, and
> find /usr/src/sys | xargs grep O_DIRECT | wc -l
> 188
> was a bit much to wade through, so I didn't try O_DIRECT.

I have no experience using it, but think it is safe to try to see if
it helps.

>> I think memcmp() instead of byte comparison for cmp -lx is not very
>> complex. More interesting is memcmp() for the general case. For
>> small files (<= mmap()ed size), mmap() followed by memcmp(), then
>> go back to a byte comp to count the line number when memcmp() fails
>> seems good. Going back is messier and slower for large files. In
>> the worst case of files larger than memory with a difference at the
>> end, it involves reading everything twice, so it is twice as slow
>> if it is i/o bound.
>
> Studying the cmp man page, it is... unfortunate. The default
> prints the byte and line number if the files differ, so it needs
> that info. The -l and -x options just keep going after the first
> difference. If you want the first byte to be indexed 0 or 1 you can't
> choose the radix independently.
>
> If we only needed the byte count it wouldn't be so bad, but needing
> the line count really throws a wrench in the works if we want to use
> memcmp(). The only way to avoid needing the line count is -s.

-l or -x also.  The FreeBSD man page isn't clear about when the line
number is printed.  It doesn't say that -l and -x cancel the general
requirement of printing the line number, but they do in practice.
POSIX doesn't have -x, at least in 2001, but it gives the precise
format for -l and there is no line number in it.
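
The memcmp()-then-go-back idea quoted above can be sketched for the
default (no -l, -x or -s) case as follows.  This is only an
illustration on in-memory buffers of equal length, such as two
mmap()ed regions.  The names struct cmp_result and compare_buffers()
are invented here and are not taken from cmp's source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: byte and line are 1-based, as cmp prints. */
struct cmp_result {
	int	differ;		/* 0 if the buffers are identical */
	size_t	byte;		/* offset of the first difference */
	size_t	line;		/* line containing that difference */
};

struct cmp_result
compare_buffers(const unsigned char *a, const unsigned char *b, size_t len)
{
	struct cmp_result r = { 0, 0, 0 };
	size_t i, line;

	/* Fast path: identical data never needs a line count. */
	if (memcmp(a, b, len) == 0)
		return (r);

	/*
	 * Slow path: memcmp() says only that the buffers differ, so go
	 * back to the start and compare bytewise, counting newlines on
	 * the way to the first difference.
	 */
	line = 1;
	for (i = 0; i < len; i++) {
		if (a[i] != b[i]) {
			r.differ = 1;
			r.byte = i + 1;
			r.line = line;
			break;
		}
		if (a[i] == '\n')
			line++;
	}
	return (r);
}
```

For data that is already in memory the second pass is cheap.  The
doubled i/o for files larger than memory is the cost described in the
quoted text.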

Maybe line counting is supposed to be pessimized further by supporting
wide characters.  wc is already fully pessimized for this, but it has
a not-quite-so-slow mode in which it doesn't call mbrtowc() and checks
for '\n' instead of L'\n'.  It also has an extremely fast mode for
wc -c and wc -m, in which for regular files, it just stats the file.
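
A rough sketch of those two faster modes, not taken from wc's source:
count_newlines() counts '\n' bytes without any multibyte decoding,
and size_by_stat() gets a regular file's size from stat() without
reading any data.  Both function names are invented for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <stddef.h>
#include <string.h>

/* Count '\n' bytes in a buffer; no mbrtowc(), no locale dependence. */
size_t
count_newlines(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	const unsigned char *end = p + len;
	size_t n = 0;

	while (p < end &&
	    (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
		n++;
		p++;		/* step past the newline just found */
	}
	return (n);
}

/*
 * Size of a regular file in bytes without reading it.  Returns -1 if
 * stat() fails or the file is not a regular file, in which case the
 * caller has to fall back to reading and counting.
 */
long long
size_by_stat(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) != 0 || !S_ISREG(sb.st_mode))
		return (-1);
	return ((long long)sb.st_size);
}
```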

This is another indication that cmp is completely unsuitable for
comparing files for equality.  I couldn't find where POSIX says that
either wc or cmp must support wide characters or multi-byte characters,
but for cmp it says that if the file is not a text file then the line
count is simply the number of <newline> characters.  Clearly non-text
files consist of just bytes, so the <newline>s in them must be simply
'\n' characters which we don't want to count anyway.

Bruce

