bin/106734: [patch] SSE2 optimization for bzip2/libbz2

Sat Jan 6 21:00:39 PST 2007

The following reply was made to PR bin/106734; it has been noted by GNATS.

From: Julian Seward <jseward at acm.org>
To: Mikhail Teterin <mi at corbulon.video-collage.com>
Cc: bug-followup at freebsd.org
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Date: Sun, 7 Jan 2007 05:08:43 +0000

 I believe this analysis is correct:

 >         /* Load the bytes: */
 >         n1 = (__m128i)_mm_loadu_pd((double *)(block + i1));
 >         n2 = (__m128i)_mm_loadu_pd((double *)(block + i2));
 > 
 > read beyond the end of the defined area of block.  block is
 > defined for [0 .. nblock + BZ_N_OVERSHOOT - 1], but I think
 > you are doing an SSE load at &block[nblock + BZ_N_OVERSHOOT - 2],
 > hence loading 15 bytes of garbage.
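
To make the bound concrete: a 16-byte load starting at index k reads k .. k + 15, so it stays inside a buffer whose defined range is [0 .. len - 1] only when k + 16 <= len. The C sketch below counts how many bytes of such a load fall outside the buffer. The helper names sse_load_overrun and sse_load_in_bounds are hypothetical and are not part of bzip2 or the patch. In the bzip2 case, len would correspond to nblock + BZ_N_OVERSHOOT.

```c
#include <stddef.h>

/* Width of one unaligned SSE2 load (_mm_loadu_pd / _mm_loadu_si128). */
#define SSE_LOAD_BYTES 16

/*
 * Hypothetical helper: number of bytes that a 16-byte load starting at
 * index k reads at or beyond index len, i.e. outside a buffer whose
 * defined range is [0 .. len - 1]. Returns 0 when the load is in bounds.
 */
static size_t sse_load_overrun(size_t k, size_t len)
{
    size_t end = k + SSE_LOAD_BYTES;   /* one past the last byte read */

    if (end <= len)
        return 0;                      /* whole load is defined */
    if (k >= len)
        return SSE_LOAD_BYTES;         /* whole load is out of range */
    return end - len;                  /* partial overrun */
}

/* Hypothetical helper: a load at k is safe exactly when k + 16 <= len. */
static int sse_load_in_bounds(size_t k, size_t len)
{
    return sse_load_overrun(k, len) == 0;
}
```

A loop that issues such loads therefore has to stop while its index is still at most len - 16, rather than at len - 1 as a byte-at-a-time loop would.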

 Valgrind doesn't complain about the out-of-range access, because you
 are still accessing inside a valid malloc-allocated block.  But it
 does know that the read data is uninitialised, hence it complains
 when you do a comparison with that data followed by a conditional
 branch (or move) based on the result of the comparison.

 > This is possible... Do you think the loop should exit earlier and test
 > the last (up to) 15 bytes one by one?

 Certainly the loop-end stuff needs to be fixed up somehow to reflect
 the 16-byte loads, but without further investigation I'm not sure how.
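
One way to implement the approach raised above (exit the vector loop early and test the last up-to-15 bytes one at a time) is sketched below. It assumes an x86 target with SSE2 (e.g. x86-64) and uses a hypothetical first_mismatch() that only finds the first differing byte of two buffers. It is not the patch's code and does not handle bzip2's wrap-around or quadrant comparisons. The loads use _mm_loadu_si128 instead of casting the result of _mm_loadu_pd, but both read the same 16 bytes.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the first byte at which
 * a[0..n-1] and b[0..n-1] differ, or n if they are equal. The vector
 * loop only issues a 16-byte load while a full 16 bytes remain, so it
 * never reads past a[n-1] or b[n-1].
 */
static size_t first_mismatch(const unsigned char *a,
                             const unsigned char *b, size_t n)
{
    size_t i = 0;

    /* Vector loop: stop as soon as fewer than 16 bytes are left. */
    while (n - i >= 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* 0xFF in each byte lane where a and b are equal. */
        __m128i eq = _mm_cmpeq_epi8(va, vb);
        /* One bit per lane; 0xFFFF means all 16 bytes match. */
        int mask = _mm_movemask_epi8(eq);

        if (mask != 0xFFFF) {
            /* The lowest clear bit marks the first differing byte. */
            unsigned int diff = ~(unsigned int)mask & 0xFFFFu;
            size_t lane = 0;

            while (!(diff & 1u)) {
                diff >>= 1;
                lane++;
            }
            return i + lane;
        }
        i += 16;
    }

    /* Scalar tail: the last (up to) 15 bytes, one by one. */
    for (; i < n; i++)
        if (a[i] != b[i])
            return i;
    return n;
}
```

In the bzip2 setting, n would have to be the number of bytes that remain defined from both block + i1 and block + i2 (bounded by nblock + BZ_N_OVERSHOOT); how to derive that inside the sort's comparison loop is left open here, as in the message above.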