SSE vs. stack alignment vs. pthread
craig at tobuj.gank.org
Tue Nov 23 23:02:48 PST 2004
First of all, I'd like to apologize for cross-posting to -hackers and
-threads. I'm not sure yet if this is an application bug, a gcc
bug, or a pthreads bug, so here goes...
I'm currently working on the audacity port. It's up to 1.2.3, but I
want to get a problem I've observed with 1.2.2 resolved to make sure
that it doesn't crop up later or affect other software...
Long story short, audacity is a threaded program. A straight compile of
1.2.2 results in a 100% reproducible bus error that happens on multiple
Pentium-4 machines (5.3-STABLE). It always happens at this instruction:
0x081807c4: movaps %xmm0,0xffffff68(%ebp)
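Now, at that time ebp is 0xbfadc6c0, and 0xffffff68 is -0x98 (-152 decimal) as
a signed 32-bit displacement, so the target address is 0xbfadc628. Oops, that's
only 8-byte aligned, not 16-byte aligned like SSE wants. The offsets vary
slightly depending on the compile flags, etc., but the result is always the
same: a bus error on a misaligned movaps.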
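The address arithmetic for that movaps operand can be double-checked with a few lines of C. This is only a minimal sketch: the helper names are made up for illustration and are not from Audacity, gcc, or libpthread. It treats the 32-bit displacement as a two's-complement value, which for the register contents quoted above gives a displacement of -0x98 and a target address of 0xbfadc628, i.e. 8 bytes past a 16-byte boundary.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a 32-bit displacement as a signed value. */
int64_t signed_displacement(uint32_t disp32)
{
    /* Two's complement: values with the top bit set are negative. */
    return (disp32 & 0x80000000u) ? (int64_t)disp32 - 0x100000000LL
                                  : (int64_t)disp32;
}

/* Hypothetical helper: effective address of disp32(%ebp) on a 32-bit CPU. */
uint32_t effective_address(uint32_t ebp, uint32_t disp32)
{
    /* Unsigned addition wraps modulo 2^32, matching 32-bit address math. */
    return ebp + disp32;
}

/* Hypothetical helper: bytes past the previous 16-byte boundary (0 = aligned). */
unsigned misalignment16(uint32_t addr)
{
    return (unsigned)(addr & 0xFu);
}
```

Calling effective_address(0xbfadc6c0, 0xffffff68) and passing the result to misalignment16() yields 8, so the operand is misaligned regardless of how the displacement is written down.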
My first suspicion was compiler bug. Audacity doesn't inline any SSE code
itself -- the movaps is being generated by gcc as part of the pentium4
optimizations. There are two factors that are a little suspicious, though.
1) When I switch out libpthread for libc_r, the crash goes away.
Unfortunately, the gdb in 5.3 seems to have forgotten how to debug
libc_r-based programs, so I can't really tell what is different in that case. I just
get "Cannot find thread 2: Thread ID=1, generic error".
2) Some searching turned up several similar problems on Linux and NetBSD. The
NetBSD post here
[http://mail-index.netbsd.org/port-amd64/2004/02/27/0001.html] indicates that
it may be related to stack alignment in the thread library. I'm not sure if
the ABI requirement discussed there is NetBSD and/or amd64 specific though.
HOWEVER -- I inserted some debugging printfs into libpthread to test this
theory. The stack it allocates for that thread is located at 0xbfaad000,
which is not only 16-byte aligned but page aligned... So I'm reluctant to
blame libpthread as it seems to be doing everything right and even going the
extra mile. I honestly don't know whether gcc is expecting the alignment to
compensate for the return address push or the function prolog, or if it's
just losing track of where the stack should be somewhere. I may be
over-analyzing the problem at this point :)
Another factor to consider is that nobody has reported similar problems in
other software... I've been trying to create a simple test case, but
it's proving quite difficult to coax gcc into generating SSE code on its own
where I want it. It's of course possible that Audacity itself is doing
something weird to cause it, but I haven't been able to find anything
suspicious or low-level enough to affect the stack alignment.
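One possible shape for such a test case is sketched below. It is an assumption-laden starting point, not Audacity's code: the function names are hypothetical, and `__attribute__((aligned(16)))` is a GCC extension. The thread body puts a 16-byte-aligned buffer on its own stack and reports how far the buffer's address actually is from a 16-byte boundary. It does not force gcc to emit movaps; it only shows whether the alignment the compiler assumes holds on a thread stack.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Thread body: request a 16-byte-aligned local and record address mod 16. */
static void *probe_thread(void *arg)
{
    float buf[4] __attribute__((aligned(16))) = { 0 };
    volatile float *p = buf;          /* keep buf in memory, not registers */

    p[0] = 1.0f;
    *(unsigned *)arg = (unsigned)((uintptr_t)buf & 0xFu);
    return NULL;
}

/*
 * Hypothetical helper: run the probe in a new thread.
 * Returns 0..15 (bytes past a 16-byte boundary), or -1 if the
 * thread could not be created or joined.
 */
int probe_thread_stack_alignment(void)
{
    pthread_t tid;
    unsigned rem = 0;

    if (pthread_create(&tid, NULL, probe_thread, &rem) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return (int)rem;
}
```

A nonzero return value would mean the compiler's alignment assumption does not hold inside the thread. Building it once against libpthread and once against libc_r, with the same -march=pentium4 flags as the port, would be one way to compare the two libraries.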
It could just be a heisenbug, and libc_r is different enough to mask the
problem. Any and all suggestions from threads/compiler gurus would be very
much appreciated. I'm about ready to throw in the towel and just force
"-mno-sse -mno-sse2" compiler flags in the makefile...