svn commit: r251782 - head/sys/sparc64/sparc64

Sun Jun 16 15:06:28 UTC 2013

On Sun, Jun 16, 2013 at 01:49:09PM +0200, Ed Schouten wrote:
> Hi Marius,
> 
> 2013/6/15 Marius Strobl <marius at alchemy.franken.de>:
> > Semantically, this change is wrong; what we really need here is an
> > acquire variant. Using the release variant instead happens to also
> > work - albeit additionally wastes CPU cycles for the write memory
> > barrier - because in total store order, atomic operations implicitly
> > include the read memory barrier necessary for acquire semantics. In
> > other words, atomic(9) is in dire need of atomic_store_acq_<type>().
> 
> I personally dislike the idea of extending the existing atomic(9) API.
> My long-term goal would be that we could just use C11 atomics instead
> of using our own home-grown API. If we can't express this using the
> atomic(9) API, I'd just like us to use <stdatomic.h> instead.
> 
> Reading up on the C11 standard (section 5.1.2.4), it seems that the
> abstract model of threads described does not allow stores to be
> acquire operations. This does make sense, though. A load can of course
> not be a release operation. Because releases synchronize with
> acquires, a store being an acquire operation would have nothing to
> synchronize with.
> 
> So I guess in this case we should solve it by using a relaxed store,
> followed by an acquire fence:
> 
> http://80386.nl/pub/sparc64-atomic.txt
> 
> Would that work for you?
> 

Generally, I dislike the concept of passing control of how atomics
are implemented over to the compiler, at least as far as the kernel
is concerned and especially since the actual code generated for them
depends on the flavour and version of the compiler used. It's also
unclear to me how the C11 memory orders relate to SPARC v9 memory
models. We run the kernel and all of userland in total store order
regardless of the memory model denoted in the ELF header (for which
at least GCC 4.2.1 uses relaxed memory order). Partially that is
because, contrary to what one might expect, it turned out that
employing total store order and its implicit memory barriers
performs considerably better than running things with relaxed memory
order and inserting explicit memory barriers as needed.

With GCC 4.2.1 and your change to kern_event.c to use C11 atomics,
what happens, though, is that the compiler inserts memory barriers
in sledgehammer fashion.

In the particular case of the pmap stuff, parts of the atomic
accesses to it are implemented in swtch.S. Currently, these are in
sync with the code generated when using atomic(9) in C. Also,
modulo r251782, the model employed as a whole, i.e. within C and
assembler code, for accessing these pmap bits matches as far as
acquire and release semantics are concerned. I really want to
preserve this determinism and not make the actual code and
semantics generated for the C half of it subject to the compiler
of the day. So, in order to go forward and if atomic_{load,store}()
conflict with <stdatomic.h>, what I'd like to do is to introduce
an MD atomic_store_acq_ptr() and just live with that.
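
For illustration only, here is a rough sketch in C11 terms of the two
shapes under discussion. The helper names are made up for this mail and
are not part of atomic(9) or <stdatomic.h>. The first follows the
relaxed-store-plus-acquire-fence proposal quoted above, the second is a
plain release store for comparison. Which instructions either one turns
into depends on the compiler and target, which is exactly the concern
raised above.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: relaxed store followed by an acquire fence,
 * as proposed in the quoted message. Not an existing atomic(9) or
 * <stdatomic.h> function.
 */
static inline void
sketch_store_relaxed_acq_fence(_Atomic(uintptr_t) *p, uintptr_t v)
{
	/* The store itself carries no ordering constraint. */
	atomic_store_explicit(p, v, memory_order_relaxed);
	/* Stand-alone acquire fence; its lowering is compiler-dependent. */
	atomic_thread_fence(memory_order_acquire);
}

/*
 * Hypothetical helper: a release store, shown only for comparison
 * with the variant above.
 */
static inline void
sketch_store_release(_Atomic(uintptr_t) *p, uintptr_t v)
{
	atomic_store_explicit(p, v, memory_order_release);
}
```

Comparing the assembler output of both helpers across compiler versions
would show how much the generated barriers vary.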

Marius