SSE vs. stack alignment vs. pthread

Craig Boston craig at
Tue Nov 23 23:02:48 PST 2004

First of all, I'd like to apologize for cross-posting to -hackers and
-threads.  I'm not sure yet if this is an application bug, a gcc
bug, or a pthreads bug, so here goes...

I'm currently working on the audacity port.  It's up to 1.2.3, but I
want to get a problem I've observed with 1.2.2 resolved to make sure
that it doesn't crop up later or affect other software...

Long story short, audacity is a threaded program.  A straight compile of
1.2.2 results in a 100% reproducible bus error that happens on multiple 
Pentium-4 machines (5.3-STABLE).  It always happens at this instruction:

0x081807c4: movaps %xmm0,0xffffff68(%ebp)

Now, at that time ebp is 0xbfadc6c0, so ebp+0xffffff68 (-0x98, i.e. -152 decimal) is 0xbfadc628.  
Oops, that's only 8-byte aligned, not 16-byte aligned like SSE wants.  The offsets vary slightly 
depending on the compile flags, etc., but the result is always the same: the 
movaps operand is not 16-byte aligned.
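
For anyone who wants to re-check the address arithmetic, here is a small sketch in C.  The displacement gdb prints, 0xffffff68, is a 32-bit two's-complement value, so adding it to ebp is the same as subtracting 0x98 (152 decimal).  The function names are made up for illustration; nothing here comes from audacity or gcc.

```c
#include <stdint.h>

/* Effective address of disp(%ebp): a 32-bit add that wraps around, which
 * is equivalent to adding the displacement as a signed value. */
uint32_t effective_address(uint32_t ebp, uint32_t disp)
{
    return ebp + disp;
}

/* The displacement as gdb prints it (unsigned), read as a signed offset.
 * Written without relying on implementation-defined conversions. */
int32_t signed_displacement(uint32_t disp)
{
    if (disp >= 0x80000000u)
        return -(int32_t)(~disp) - 1;
    return (int32_t)disp;
}

/* movaps faults unless its memory operand sits on a 16-byte boundary. */
int is_aligned16(uint32_t addr)
{
    return (addr & 0xFu) == 0;
}
```

With the values from the crash, effective_address(0xbfadc6c0, 0xffffff68) gives 0xbfadc628, which is 8 bytes past a 16-byte boundary, so is_aligned16() reports false.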

My first suspicion was a compiler bug.  Audacity doesn't inline any SSE code 
itself -- the movaps is being generated by gcc as part of the pentium4 
optimizations.  There are two factors that are a little suspicious, though.

1) When I switch out libpthread for libc_r, the crash goes away.  
Unfortunately, the gdb in 5.3 seems to have forgotten how to debug libc_r 
based programs so I can't really tell what is different in that case.  I just 
get "Cannot find thread 2: Thread ID=1, generic error".

2) Some searching turned up several similar problems on Linux and NetBSD.  One 
NetBSD post indicates that 
it may be related to stack alignment in the thread library.  I'm not sure if 
the ABI requirement discussed there is NetBSD and/or amd64 specific though.

HOWEVER -- I inserted some debugging printfs into libpthread to test this 
theory.  The stack it allocates for that thread is located at 0xbfaad000, 
which is not only 16-byte aligned but page aligned...  So I'm reluctant to 
blame libpthread as it seems to be doing everything right and even going the 
extra mile.  I honestly don't know whether gcc is expecting the alignment to 
compensate for the return address push or the function prolog, or if it's 
just losing track of where the stack should be somewhere.  I may be 
over-analyzing the problem at that point :)
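
A rough way to reason about the "return address push or function prolog" question is to model the pushes by hand.  The sketch below makes two assumptions, neither confirmed for the crashing function: that gcc expects %esp to be 16-byte aligned at the point of each call (its default preferred stack boundary on x86, as far as I know), and that the callee uses the usual "push %ebp; mov %esp,%ebp" prologue.  The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed sequence: the caller's `call` pushes a 4-byte return address,
 * then the callee runs `push %ebp; mov %esp,%ebp`. */
uint32_t ebp_after_prologue(uint32_t esp_at_call)
{
    uint32_t esp = esp_at_call;
    esp -= 4;       /* return address pushed by call */
    esp -= 4;       /* saved %ebp pushed by the prologue */
    return esp;     /* mov %esp,%ebp copies this into %ebp */
}

/* Inverse: the %esp the caller must have had to produce a given %ebp. */
uint32_t caller_esp_for(uint32_t ebp)
{
    return ebp + 8;
}

/* Distance past the previous 16-byte boundary (0 means aligned). */
unsigned mod16(uint32_t addr)
{
    return addr & 0xFu;
}
```

Under those assumptions, a caller with a 16-byte aligned %esp leaves %ebp at 8 mod 16, and a slot at ebp-0x98 then lands exactly on a 16-byte boundary.  The observed ebp of 0xbfadc6c0 is 0 mod 16, which would correspond to a caller %esp of 0xbfadc6c8, i.e. 8 bytes off.  That would be consistent with the compiler counting on an alignment that the thread's entry point did not provide, but it is only a model.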

Another factor to consider is that nobody has reported similar problems in 
other software...  I've been trying to create a simple test case, however 
it's proving quite difficult to coax gcc into generating SSE code on its own 
where I want it.  It's of course possible that Audacity itself is doing 
something weird to cause it, but I haven't been able to find anything 
suspicious or low-level enough to affect the stack alignment.
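
One possible shape for such a test case, which avoids needing gcc to emit SSE at all, is to ask for a 16-byte aligned local in a thread's start routine and report where it actually lands.  This is only a sketch: probe_thread and probe_thread_stack_alignment are hypothetical names, the aligned attribute is a gcc extension, and a compiler that realigns the frame on its own would return 0 here no matter what the thread library does.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Thread body: put a 16-byte aligned buffer on this thread's stack and
 * record where it really ended up. */
static void *probe_thread(void *arg)
{
    float buf[4] __attribute__((aligned(16)));  /* a plausible movaps target */
    volatile uintptr_t addr;  /* volatile so the check is not folded to 0 */

    memset(buf, 0, sizeof buf);
    addr = (uintptr_t)buf;
    *(int *)arg = (int)(addr & 15u);            /* 0 means 16-byte aligned */
    return NULL;
}

/* Returns the buffer address modulo 16, or -1 if the thread could not
 * be created or joined. */
int probe_thread_stack_alignment(void)
{
    pthread_t tid;
    int misalignment = -1;

    if (pthread_create(&tid, NULL, probe_thread, &misalignment) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return misalignment;
}
```

Building the same file against libpthread and then against libc_r and comparing the return values would show whether the two libraries start threads with different stack alignment.  A nonzero result would point at the thread entry path rather than at Audacity.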

It could just be a heisenbug, and libc_r is different enough to mask the 
problem.  Any and all suggestions from threads/compiler gurus would be very 
much appreciated.  I'm about ready to throw in the towel and just force 
"-mno-sse -mno-sse2" compiler flags in the makefile...

