bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Julian Seward
jseward at acm.org
Sat Jan 6 21:00:39 PST 2007
The following reply was made to PR bin/106734; it has been noted by GNATS.
From: Julian Seward <jseward at acm.org>
To: Mikhail Teterin <mi at corbulon.video-collage.com>
Cc: bug-followup at freebsd.org
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Date: Sun, 7 Jan 2007 05:08:43 +0000
I believe this analysis is correct:
> /* Load the bytes: */
> n1 = (__m128i)_mm_loadu_pd((double *)(block + i1));
> n2 = (__m128i)_mm_loadu_pd((double *)(block + i2));
>
> read beyond the end of the defined area of block. block is
> defined for [0 .. nblock + BZ_N_OVERSHOOT - 1], but I think
> you are doing a SSE load at &block[nblock + BZ_N_OVERSHOOT - 2],
> hence loading 15 bytes of garbage.
Valgrind doesn't complain about the out-of-range access, because you
are still reading inside a valid malloc-allocated block. But it
does know that the data read is uninitialised, hence it complains
when you do a comparison with that data followed by a conditional
branch (or move) based on the result of the comparison.
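To make the distinction concrete, here is a minimal sketch; it is not taken from bzip2 or from the patch. The helper name tail_byte_is_x and the 32-byte block size are made up for illustration. The read is inside the malloc'd block, so there is no addressability error; the report, if any, comes at the branch on a byte that was never written.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo, not bzip2 code: allocate 32 bytes, write only the
   first `init` of them, then test whether the byte at index `at` is 'x'.
   Returns 1 or 0, or -1 if the allocation fails. Requires at < 32. */
static int tail_byte_is_x(size_t init, size_t at)
{
    unsigned char *buf = malloc(32);
    unsigned char c;
    int hit = 0;

    if (buf == NULL)
        return -1;
    memset(buf, 'x', init);   /* bytes init..31 stay uninitialised */

    c = buf[at];              /* in-bounds read: nothing to report here */
    if (c == 'x')             /* branch depends on c: if buf[at] was never
                                 written, this is where Memcheck complains */
        hit = 1;

    free(buf);
    return hit;
}
```

Calling tail_byte_is_x(16, 20) under Valgrind, built without optimisation, should produce the "Conditional jump or move depends on uninitialised value(s)" report at the if, while tail_byte_is_x(32, 20) should run clean.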
> This is possible... You think, the loop should exit earlier and test
> the last (up to) 15 bytes one-by-one?
Certainly the loop-end stuff needs to be fixed up somehow to reflect
the 16 byte loads, but without further investigation I'm not sure how.
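For reference, the approach suggested in the quoted question (leave the vector loop early and test the remaining bytes one by one) might look roughly like the sketch below. This is not the patch's comparison routine: first_diff is a hypothetical helper that only looks for the first differing byte, and it assumes both buffers are defined for n bytes. How the real loop in libbz2 should be adjusted is exactly what still needs investigating.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not from the patch: index of the first byte at
   which a and b differ, or n if the first n bytes are equal.
   Never reads beyond a[n-1] or b[n-1]. */
static size_t first_diff(const unsigned char *a, const unsigned char *b,
                         size_t n)
{
    size_t i = 0;

#if defined(__SSE2__)
    /* Vector loop: entered only while a full 16-byte load still fits. */
    while (n - i >= 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* One mask bit per byte, set where the two bytes are equal. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF)
            break;            /* mismatch in this chunk; located below */
        i += 16;
    }
#endif

    /* Scalar loop: pins down a mismatch found above, and handles the
       last (up to) 15 bytes that the vector loop must not touch. */
    while (i < n && a[i] == b[i])
        i++;
    return i;
}
```

The integer load _mm_loadu_si128 is used here instead of _mm_loadu_pd plus a cast; that is a choice made for the sketch, not something the patch does.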