'make -j16 universe' gives SIReset

Marius Strobl marius at alchemy.franken.de
Fri Sep 2 15:32:12 UTC 2011


On Thu, Sep 01, 2011 at 07:24:58AM +1000, Peter Jeremy wrote:
> On 2011-Aug-30 17:27:25 +0200, Marius Strobl <marius at alchemy.franken.de> wrote:
> >Regarding the problem with the userland mutex code could you please try
> >whether the following patch makes a difference? Given that the previous
> >version of the above one as a side-effect made that problem harder to
> >trigger it's probably a good idea to test the second patch separately.
> >http://people.freebsd.org/~marius/sparc64_casuword_membar.diff
> 
> As far as I can tell, that applies to /usr/src/sys/sparc64/sparc64/support.S
> and only affects the kernel.  If so, then with it applied, the stress
> tests ran for about 5 hrs and then a thr1 process got wedged in urdlck.
> 

Yes, the patch is a workaround for the fact that there are no acquire
and release versions of casuword*(9). That lack seems to be part of the
problem, given that adding a memory barrier makes the problem
considerably harder to trigger and that the code otherwise appears to
work fine on at least x86 (where, unlike on sparc64, memory barriers are
seldom necessary). The userland part of the userland mutex code already
uses the acquire and release variants of atomic operations.
Unfortunately, that code is rather complex, at least for me, so it's
hard to judge whether parts that look suspicious are actually okay or
are subject to races.
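To illustrate the ordering issue in portable terms, here is a minimal C11 sketch. It is not the FreeBSD umtx code and does not use casuword(9); the type and function names (sketch_mutex, sketch_trylock_acq, sketch_trylock_fence, sketch_unlock) are hypothetical. It contrasts a compare-and-swap that carries acquire semantics itself with a CAS that has no ordering of its own followed by an explicit acquire fence, which is roughly the shape of appending a memory barrier after a plain casuword*(9).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock word: 0 = unowned, otherwise the owner's id. */
typedef struct {
    _Atomic uint32_t owner;
} sketch_mutex;

/* Acquire-CAS: on success, later loads/stores cannot be reordered
 * before the CAS, so the critical section sees the previous
 * owner's writes. */
static int sketch_trylock_acq(sketch_mutex *m, uint32_t id)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(&m->owner, &expected, id,
        memory_order_acquire, memory_order_relaxed);
}

/* Workaround shape: a CAS with no ordering guarantee of its own,
 * followed by an explicit acquire fence on success. This is an
 * analogy to adding a membar after a plain CAS, not the actual patch. */
static int sketch_trylock_fence(sketch_mutex *m, uint32_t id)
{
    uint32_t expected = 0;
    int ok = atomic_compare_exchange_strong_explicit(&m->owner, &expected, id,
        memory_order_relaxed, memory_order_relaxed);
    if (ok)
        atomic_thread_fence(memory_order_acquire);
    return ok;
}

/* Release store: writes made inside the critical section become
 * visible before the lock word is seen as 0 by the next acquirer. */
static void sketch_unlock(sketch_mutex *m)
{
    atomic_store_explicit(&m->owner, 0, memory_order_release);
}
```

On x86 both variants typically compile to the same locked instruction, which is consistent with such bugs staying hidden there. On a weaker memory model, dropping the acquire/release pairing on either side is what would allow the kind of race suspected above.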

Marius
