/usr/bin/sort may be incorrect

Marius Strobl marius at freebsd.org
Mon Apr 11 21:57:39 UTC 2016


On Thu, Mar 31, 2016 at 08:22:25PM +0900, Shigeharu TAKENO wrote:
> shige 03/31 2016
> ----------------
> 
> Thank you for your reply.
> 
> Joerg Wunsch wrote:
> 
> | struct key_value
> | {
> |    struct bwstring *k;
> |    struct key_hint hint[];
> | };
> | 
> | If that works for you, too, I think it would be the preferrable way to
> | write it.
> 
> Unfortunately this does not fix the problem.
> 
> 
> | > The k field of key_value may be overwritten by the hint field
> | > in numcoll_impl(), gnumcoll() and monthcoll() (coll.c), and the
> | > pointer value of k may change to incorrect value.
> | 
> | Are you saying that something like
> | 
> | struct key_value *kw;
> | 
> | ...
> | 
> |    kw->hint[-1] = something;
> | 
> | happens?  That would certainly be a bug in the code then that ought to
> | be fixed, rather than worked around.
> 
> I tested under your suggestion "struct key_hint hint[]", which 
> behaves as the same of default sort command.
> 
> % ( echo 2 5 8 ; echo 2 6 5 ) | sort -n +0 -1 +1 -2 +2 -3
> 
> 
> In key_coll(struct keys_array *ps1, struct keys_array *ps2, 
>   size_t offset) (in coll.c), initial pointer values are the
> followings:
> 
>  &(ps1->key[0]) = 0x40c140f8
>  &(ps1->key[1]) = 0x40c14100
>  &(ps1->key[2]) = 0x40c14108
>  &(ps2->key[0]) = 0x40c14088
>  &(ps2->key[1]) = 0x40c14090
>  &(ps2->key[2]) = 0x40c14198
>  (the pointer repeat is only 8 byte.)
> 
>  ps1->key[0].k = 0x40c060e0
>  ps1->key[1].k = 0x40c060f0
>  ps1->key[2].k = 0x40c06100
>  ps2->key[0].k = 0x40c060a0
>  ps2->key[1].k = 0x40c060b0
>  ps2->key[2].k = 0x40c060c0
> 
> key_coll() calls sm->func() = numcoll(), and it uses
> numcoll_impl(struct key_value *kv1, struct key_value *kv2) with
> ps1->key[i] and ps2->key[i]. The function numcoll_impl() uses k
> field and hint field of struct key_value.
> 
> 
> For i = 0, the k field pointers of arguments kv1 and kv2 of 
> numcoll_impl() are correct:
> 
>  kv1->k = 0x40c060e0, kv2->k = 0x40c060a0
> 
> but the hint field pointers of kv1, kv2 are doughtful:
> 
>  &(kv1->hint) = 0x40c14100, &(kv2->hint) = 0x40c14090
> 
> which are the same value of &(ps1->key[1]) and &(ps2->key[1]).
> 
> 
> And for i = 1, the k field pointers of arguments kv1 and kv2 
> become incorrect:
> 
>  kv1->k = 0x140c060f0, kv2->k = 0x140c060b0
> 
> which are added 0x100000000 to the original pointer value. 
> The sort command stops where it uses the value.
> 
> 
> If we use the definition "struct key_hint hint[1]", the repeat
> of pointers of ps1->key[i] becomes 32 byte, and incorrect changes 
> of pointers do not occur.
> 
>   &(ps1->key[0]) = 0x40c08208
>   &(ps1->key[1]) = 0x40c08228
>   &(ps1->key[2]) = 0x40c08248
> 

AFAICT is sort(1) relying on undefined behavior. It builds up an
array of struct keys_array, which in turn has a zero length array
(apparently unnecessary) of struct key_value, which in turn has
a zero length array of struct key_hint. While sort(1) takes care
to allocate space for the latter based on need_hint, it relies
on the compiler to determine the correct offsets of the struct
key_value elements within the array of struct keys_array. Given
that the compiler isn't aware of the actual size - for essentially
the same reason zero length arrays also don't contribute to the
sizeof() of such a struct - of struct key_value, that doesn't
work, i. e. struct key_hint isn't accounted for in the offset
calculation leading to the corruption described above.
I'm not exactly sure why this bug apparently doesn't affect x86
but I think that's due to its little-endian addressing avoiding
corruption in practice; that's somewhat hard to think through
given that struct bwstring and the use thereof are a similarly
... interesting constructs.
Anyway, using an accessor function for determining the correct
offsets of struct key_value elements within the array of struct
keys_array as in the attached patch fixes the problem for me.
However, I suspect that the space savings by only allocating
memory for struct key_hint when needed don't really justify
all the hackery involved in that approach. At least, changing
struct key_hint to be a fixed member of struct key_value and
removing all parts hanging off from need_hint would make the
code a lot cleaner and readable.

Gabor, what do you think?

Marius

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sort.diff
Type: text/x-diff
Size: 2985 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-sparc64/attachments/20160411/6e7ed4ba/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 949 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-sparc64/attachments/20160411/6e7ed4ba/attachment.sig>


More information about the freebsd-sparc64 mailing list