svn commit: r341682 - head/sys/sys

Tue Dec 11 04:57:14 UTC 2018

> On Dec 10, 2018, at 4:47 PM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> 
> On Mon, Dec 10, 2018 at 02:15:20PM -0800, John Baldwin wrote:
>> On 12/8/18 7:43 PM, Warner Losh wrote:
>>> 
>>> 
>>> On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling <kevin.bowling at kev009.com <mailto:kevin.bowling at kev009.com>> wrote:
>>> 
>>>    On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik <mjguzik at gmail.com <mailto:mjguzik at gmail.com>> wrote:
>>> 
>>>> 
>>>> Fully satisfying solution would be that all architectures get 64-bit
>>>> ops, even if in the worst case they end up taking a lock. Then
>>>> subsystems would not have to ifdef on anything. However, there
>>>> was some opposition to this proposal and I don't think this is
>>>> important enough to push.
>>> 
>>>    Mateusz,
>>> 
>>>    Who is opposing this particular polyfill solution?  Scott Long brought
>>>    up a situation in driver development where this would be useful as
>>>    well.  The polyfills lower the cognitive load and #ifdef soup which
>>>    are the right call here regardless of performance on toy ports.
>>> 
>>> 
>>> I don't recall seeing the opposition either. It would have to be a global lock for all 64bit atomics.... but I think it would only be 2 atomics on those architectures. 
>> 
>> It would have to be a spin lock, so in the case of unrl you would be trading
>> an operation on one of N regular mutexes for a single spin lock that was
>> also contested by other things.  This would be pretty crappy.  For drivers
>> that aren't actually used on platforms without 64-bit atomics we can simply
>> not build them in sys/modules/Makefile or not put them in GENERIC.  For
>> something in the core kernel like unrl I think we will have to do what
>> Mateusz has done here.
> 
> It is worse. All atomics that access the same location must use the same
> lock. Otherwise, you could observe torn writes and out-of-thin-air
> values. Since you cannot know in advance which locations are accessed
> by the locked variant, all FreeBSD atomic ops have to be switched to
> the locked variant on the architecture.

64-bit atomics on i486 already suffer the risk of torn reads; the implementation
merely does a CLI to protect against local preemption (though you could still
get unlucky with an NMI).  I suppose you could argue that SMP isn’t really
viable on i486 and therefore this fact is irrelevant, but it does illustrate
precedent for having API completeness in a platform.

Really, this isn’t that hard.  Part of the existing contract of using atomics is
that you carefully evaluate all uses of the variable and decide when to use
an atomic instruction.  Arguing that we can’t make this process automatic
and foolproof for 64-bit quantities, especially for a subset of a subset of
platforms/architectures, and that we should therefore make it even more of a
landmine, is not… I don’t know what to say… sensical?

64-bit operations are a reality for MI code in a modern OS, and I’m tired of
having to tip-toe around them due to incomplete MD implementations.  The
instructions have been available on Intel CPUs for 25 years!  My
very strong preference is to have a complete and functional implementation
of atomic.h for any architecture that is hooked up to the build.  We can then
tackle the details of optimization and edge case refinement, just like we do
with every other API and service that we work on.  It doesn’t have to be
perfect to be useful, and at this point we’re providing neither perfection nor
utility, just “buts” and “what ifs”.
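
For illustration only, the kind of lock-backed fallback being debated in this
thread might look roughly like the sketch below.  It is portable C11 written
for userland, not code from FreeBSD's atomic.h; every fallback_* name is
hypothetical, and a single global spin lock is assumed, as suggested earlier
in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical fallback for platforms with no native 64-bit atomics.
 * One global spin lock serializes every emulated 64-bit operation.
 */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

static void
fallback_acquire(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&fallback_lock,
	    memory_order_acquire))
		;
}

static void
fallback_release(void)
{
	atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
}

/* Add v to *p and return the previous value. */
static uint64_t
fallback_fetchadd_64(volatile uint64_t *p, uint64_t v)
{
	uint64_t old;

	assert(p != NULL);
	fallback_acquire();
	old = *p;
	*p = old + v;
	fallback_release();
	return (old);
}

/* Read *p under the lock so both 32-bit halves come from one store. */
static uint64_t
fallback_load_64(volatile uint64_t *p)
{
	uint64_t val;

	assert(p != NULL);
	fallback_acquire();
	val = *p;
	fallback_release();
	return (val);
}

/* Store newval if *p equals expect; return 1 on success, 0 otherwise. */
static int
fallback_cmpset_64(volatile uint64_t *p, uint64_t expect, uint64_t newval)
{
	int ok;

	assert(p != NULL);
	fallback_acquire();
	ok = (*p == expect);
	if (ok)
		*p = newval;
	fallback_release();
	return (ok);
}
```

The sketch only holds together if every access to a given 64-bit location
goes through these helpers, which is the caveat raised above about mixing
locked and unlocked operations.  A kernel version would also need to deal
with interrupts and preemption while the lock is held, which this userland
sketch ignores.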

Going forward, I’m going to start using 64-bit atomics where they’re prudent,
instead of avoiding them due to this niche 32-bit argument.  If that means
more and more of what I do no longer compiles on a mips or a ppc32, then
that’s a sacrifice that is fine with me.  It does still create extra development
work, and having a uniformly available implementation would be much nicer.

Scott