svn commit: r204803 - head/usr.bin/uniq

Andrey Chernov ache at nagual.pp.ru
Tue Mar 9 19:33:43 UTC 2010


On Tue, Mar 09, 2010 at 12:55:44PM -0500, David Schultz wrote:
> Actually, a question...why doesn't it suffice to simply call
> strcoll() instead of mbstowcs() followed by wcscoll()?
> I would expect that in the absence of the -i flag, none of
> this would be necessary.  

strcoll() works only for single-byte character locales, which rules out
UTF-8, for example. To do what you suggest (without converting to wide
chars), we would need a fast mbscoll() function (see our join.c for its
slow emulation using wide chars).
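
As a rough illustration of the wide-character emulation mentioned above,
here is a minimal sketch. It is not the code from join.c or uniq.c: the
name mbscoll_emul() is hypothetical, and falling back to strcmp() on an
invalid sequence or allocation failure is an assumption of this sketch.
It also assumes the POSIX behaviour of mbstowcs() returning the required
length when the destination is NULL.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: compare two multibyte strings in the current
 * locale's collation order by converting both to wide characters first.
 */
static int
mbscoll_emul(const char *s1, const char *s2)
{
	wchar_t *w1, *w2;
	size_t n1, n2;
	int ret;

	/* With a NULL destination, mbstowcs() only reports the length. */
	n1 = mbstowcs(NULL, s1, 0);
	n2 = mbstowcs(NULL, s2, 0);
	if (n1 == (size_t)-1 || n2 == (size_t)-1)
		return (strcmp(s1, s2));	/* invalid sequence: byte order */

	w1 = malloc((n1 + 1) * sizeof(*w1));
	w2 = malloc((n2 + 1) * sizeof(*w2));
	if (w1 == NULL || w2 == NULL) {
		free(w1);
		free(w2);
		return (strcmp(s1, s2));	/* out of memory: byte order */
	}

	/* Two conversions and two allocations per comparison: the slow part. */
	mbstowcs(w1, s1, n1 + 1);
	mbstowcs(w2, s2, n2 + 1);
	ret = wcscoll(w1, w2);

	free(w1);
	free(w2);
	return (ret);
}
```

The conversions and allocations on every call are what make this kind of
emulation slow compared with a native multibyte collation routine.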

> At the very least, it would make
> sense to start with a strcmp(), and only fall back on the
> expensive conversion and collation if the strings don't
> compare equal.

From what I have seen, files fed to uniq commonly have only a few equal
lines and many more unequal ones, so strcmp() would be additional overhead
most of the time.
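
To make the trade-off concrete, here is a minimal sketch of the proposed
strcmp() shortcut. It is not the uniq.c code: lines_equal(), slow_equal(),
the slow_calls counter and the fixed MAXW buffer size are all hypothetical
names and assumptions of this sketch. The counter shows that only
byte-identical pairs avoid the expensive path, while every unequal pair
pays for strcmp() and then the conversion as well.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define MAXW 1024	/* illustrative fixed limit, an assumption here */

static unsigned long slow_calls;	/* how often the slow path runs */

/* Expensive path: convert to wide characters, then collate. */
static int
slow_equal(const char *a, const char *b)
{
	static wchar_t wa[MAXW], wb[MAXW];

	slow_calls++;
	if (mbstowcs(wa, a, MAXW) == (size_t)-1 ||
	    mbstowcs(wb, b, MAXW) == (size_t)-1)
		return (0);	/* bytes already differ, treat as unequal */
	/* Make sure over-long input is still terminated. */
	wa[MAXW - 1] = wb[MAXW - 1] = L'\0';
	return (wcscoll(wa, wb) == 0);
}

/* Proposed shortcut: byte-identical lines never reach the slow path. */
static int
lines_equal(const char *a, const char *b)
{
	if (strcmp(a, b) == 0)
		return (1);
	return (slow_equal(a, b));
}
```

Whether the shortcut wins therefore depends on the share of adjacent
duplicate lines in the input.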

-- 
http://ache.pp.ru/

