svn commit: r202572 - head/lib/libc/gen

Bruce Evans brde at optusnet.com.au
Thu Jan 21 05:26:01 UTC 2010


On Wed, 20 Jan 2010, Andrey Chernov wrote:

> On Wed, Jan 20, 2010 at 09:33:08PM +1100, Bruce Evans wrote:
>>> But there is
>>> nothing said about opendir() & strcoll() relation in the mentioned
>>> standards. The only word I found is that opendir() returns "ordered"
>>> sequence, but nowhere mentioned ordered by what criteria, so perhaps they
>>> mean "stable":
>>
>> As I said before, sorting in opendir() has nothing to do with POSIX!  It
>> is an implementation detail for union file systems/mounts.
>
> Moreover, even sorting itself is not required here. We sort just to remove
> dups.

Interesting.  Why does it require a stable sort then?  It only removes
duplicates by name.  At least with strcmp() in the compare function, such
dups will remain together although they may be moved.  A stable sort
would be needed only if it must keep the original first of each set of
duplicates by name, but it doesn't say that.
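
To illustrate the point, here is a minimal sketch of sort-then-dedup by
name.  It is not the actual opendir() code: struct entry, its layer field
and dedup_by_name() are hypothetical names made up for this example.  With
strcmp() as the comparison, equal names end up adjacent whether or not the
sort is stable; stability only affects which of the duplicates survives.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry; not the libc struct dirent. */
struct entry {
	const char *name;
	int layer;		/* assumed: 0 = upper layer, 1 = lower layer */
};

/* qsort() comparison function: order entries by name only. */
static int
entry_cmp(const void *a, const void *b)
{
	const struct entry *ea = a, *eb = b;

	return (strcmp(ea->name, eb->name));
}

/*
 * Sort by name, then drop adjacent entries with equal names.
 * Returns the new count.  qsort() is not guaranteed to be stable, so
 * which layer's copy of a duplicate is kept is unspecified here.
 */
static size_t
dedup_by_name(struct entry *v, size_t n)
{
	size_t i, out;

	if (n == 0)
		return (0);
	qsort(v, n, sizeof(v[0]), entry_cmp);
	out = 1;
	for (i = 1; i < n; i++) {
		if (strcmp(v[i].name, v[out - 1].name) != 0)
			v[out++] = v[i];
	}
	return (out);
}
```

If the upper layer's copy must always win, either a stable sort or a
comparison that also ranks by layer would be needed.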

BTW, the statfs() to determine if this sort is necessary is a large
pessimization for nfs file systems.  Nfs caches most things but not
statfs().  Thus a readdir() over nfs does an expensive statfs() every
time although the directory contents will normally be cached after the
first time.  I think the sorting belongs in file systems, not in
readdir() where it affects file systems that don't need it.

>> It should also give the FreeBSD
>> extension of POSIX.  POSIX says: "If the strcoll() function fails,
>> then the return value of alphasort() is unspecified.", but this makes
>> alphasort() unusable since a qsort() comparison function must return
>> a specified value.
>
> To be used in practice, strcoll() should never fail, falling back to
> strcmp() instead, not only in that case but in lots of others too (it may
> set errno, e.g. to EILSEQ, but must not fail). The next important thing is
> to return 0 only for true binary equality, additionally ranking (e.g. by
> strcmp()) anything inside classes of equality to stabilize the result.
>
> I hope our strcoll() will be kept in that state after implementing
> UCA too.

What is UCA?

Failing is a POSIX bug -- C99 doesn't allow it to fail.  I think it
should at least be specified to return nonzero (unequal) on failure.
This is like comparisons of NaNs returning unequal even for comparisons
of identical NaNs.
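
The fallback behaviour described in the quoted text can be sketched as a
wrapper around strcoll().  This is only an illustration under stated
assumptions, not the libc implementation: coll_cmp_total() is a
hypothetical name, and treating a nonzero errno after strcoll() as failure
follows the POSIX convention for detecting errors from that function.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical wrapper: compare with strcoll(), but fall back to strcmp()
 * if strcoll() reports an error via errno, and also use strcmp() to rank
 * strings that collate as equal.  The result is 0 only for byte-identical
 * strings, so it is always a specified value usable by qsort().
 */
static int
coll_cmp_total(const char *a, const char *b)
{
	int r, saved_errno;

	saved_errno = errno;
	errno = 0;
	r = strcoll(a, b);
	if (errno != 0 || r == 0)
		r = strcmp(a, b);	/* fallback and tie-break */
	errno = saved_errno;		/* do not leak the probe to callers */
	return (r);
}
```

Note that this tie-break deliberately treats strings that collate as equal
but differ in their bytes as different, which is a design choice and not
something the standards require.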

Can it return equal for non-binary-equal strings?  I think it can -- the
locale might have different encodings for strings that are considered
identical.  Then duplicates should be determined by strcoll(), and file
systems would have a hard time managing such duplicates when they are
created in a locale where they are non-duplicates.

Bruce

