svn commit: r341682 - head/sys/sys

Tue Dec 11 20:47:16 UTC 2018

On Mon, Dec 10, 2018 at 09:57:08PM -0700, Scott Long wrote:
> 
> 
> > On Dec 10, 2018, at 4:47 PM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> > 
> > On Mon, Dec 10, 2018 at 02:15:20PM -0800, John Baldwin wrote:
> >> On 12/8/18 7:43 PM, Warner Losh wrote:
> >>> 
> >>> 
> >>> On Sat, Dec 8, 2018, 8:36 PM Kevin Bowling <kevin.bowling at kev009.com> wrote:
> >>> 
> >>>    On Sat, Dec 8, 2018 at 12:09 AM Mateusz Guzik <mjguzik at gmail.com> wrote:
> >>> 
> >>>> 
> >>>> A fully satisfying solution would be that all architectures get 64-bit
> >>>> ops, even if in the worst case they end up taking a lock. Then
> >>>> subsystems would not have to ifdef on anything. However, there
> >>>> was some opposition to this proposal and I don't think this is
> >>>> important enough to push.
> >>> 
> >>>    Mateusz,
> >>> 
> >>>    Who is opposing this particular polyfill solution?  Scott Long brought
> >>>    up a situation in driver development where this would be useful as
> >>>    well.  The polyfills reduce the cognitive load and #ifdef soup, which
> >>>    is the right call here regardless of performance on toy ports.
> >>> 
> >>> 
> >>> I don't recall seeing the opposition either. It would have to be a global lock for all 64bit atomics.... but I think it would only be 2 atomics on those architectures. 
> >> 
> >> It would have to be a spin lock, so in the case of unrl you would be trading
> >> an operation on one of N regular mutexes for a single spin lock that was
> >> also contested by other things.  This would be pretty crappy.  For drivers
> >> that aren't actually used on platforms without 64-bit atomics we can simply
> >> not build them in sys/modules/Makefile or not put them in GENERIC.  For
> >> something in the core kernel like unrl I think we will have to do what
> >> Mateusz has done here.
> > 
> > It is worse. All atomics that access the same location must use the same
> > lock. Otherwise, you could observe torn writes and out-of-thin-air
> > values. Since you cannot know in advance which locations are accessed
> > by the locked variant, all FreeBSD atomic ops have to be switched to
> > the locked variant on the architecture.
> 
> 64bit atomics on I486 already suffer the risk of torn reads; the implementation
> merely does a CLI to protect against local preemption (though you could still
> get unlucky with an NMI).  I suppose you could argue that SMP isn’t really
> viable on I486 and therefore this fact is irrelevant, but it does illustrate
> precedent for having API completeness in a platform.
64bit atomics on 486 are fine, because we only support SMP on machines
which have cmpxchg8b.  Even then, I am not sure that we really support
the kind of SMP configurations from the Pentium times; at least I am certain
that this has not been exercised for quite a long time.

> 
> Really, this isn’t that hard.  Part of the existing contract of using atomics is
> that you carefully evaluate all uses of the variable and decide when to use
> an atomic instruction.  Arguing that we can’t make this process automatic
> and foolproof for 64bit quantities, especially for a subset of a subset of
> platforms/architectures, and therefore we should make it even more of a
> landmine, is not…. I don’t know what to say… sensical?

> 
> 64bit operations are a reality for MI code in a modern OS, and I’m tired of
> having to tip-toe around them due to incomplete MD implementations.  The
> instructions have been available on Intel CPUs for 25 years!  My
> very strong preference is to have a complete and functional implementation
> of atomic.h for any architecture that is hooked up to the build.  We can then
> tackle the details of optimization and edge case refinement, just like we do
> with every other API and service that we work on.  It doesn’t have to be
> perfect to be useful, and at this point we’re providing neither perfection nor
> utility, just “buts” and “what ifs”.
I do not understand this rant. Provide a working implementation of 64bit
atomics on the arches which lack them, and everybody will be happy.

My point is that implementing e.g. only atomic_add_64() with a lock is
not a solution, exactly because it creates an inconsistent KPI which
does not satisfy the basic guarantees that are provided on other arches,
heavily relied upon in FreeBSD code, and documented in atomic(9) in
free prose, with further references to C11. I.e. instead of a
cross-arch KPI, such an implementation would require consumers to know arch
peculiarities.  Isn't this a complete failure of the goals?
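
To illustrate, here is a minimal userland sketch (not kernel code) of what
a consistent lock-based fallback would have to look like. The emu_* names
and the single global spin lock are hypothetical, made up for this example.
The point is that the load and the store take the same lock as the add; any
plain access to the same location that bypasses the lock can still observe
a torn value.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global spin lock guarding every emulated 64-bit access. */
static atomic_flag emu_lock = ATOMIC_FLAG_INIT;

static void
emu_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&emu_lock,
	    memory_order_acquire))
		;	/* spin until the previous holder clears the flag */
}

static void
emu_release(void)
{
	atomic_flag_clear_explicit(&emu_lock, memory_order_release);
}

/* Every operation, including plain load and store, takes the same lock. */
uint64_t
emu_load_64(volatile uint64_t *p)
{
	uint64_t v;

	emu_acquire();
	v = *p;
	emu_release();
	return (v);
}

void
emu_store_64(volatile uint64_t *p, uint64_t v)
{
	emu_acquire();
	*p = v;
	emu_release();
}

void
emu_add_64(volatile uint64_t *p, uint64_t v)
{
	emu_acquire();
	*p += v;
	emu_release();
}
```

If only emu_add_64() existed and readers used an ordinary 64-bit load, a
reader on a 32-bit machine could see the two halves from different updates.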

BTW, this is why the C11 standard provides the lock-free predicate for
atomic types and not for atomic ops.  If one atomic op is locked, all of
them must be.
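
For reference, a small sketch of how that predicate is queried in standard
C11. The function name u64_atomics_lock_free() is made up for this example;
atomic_is_lock_free() and ATOMIC_LLONG_LOCK_FREE come from <stdatomic.h>.
Note that the answer is given per object or per type, never per operation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: reports whether operations on a 64-bit atomic
 * object are lock-free on this platform.  The answer covers loads,
 * stores and read-modify-write ops alike.
 */
bool
u64_atomics_lock_free(void)
{
	_Atomic uint64_t probe = 0;

	return (atomic_is_lock_free(&probe));
}
```

The compile-time counterpart is ATOMIC_LLONG_LOCK_FREE, which is 0 (never),
1 (sometimes) or 2 (always lock-free) for the whole type.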

> 
> Going forward, I’m going to start using 64bit atomics where they’re prudent,
> instead of avoiding them due to this niche 32bit argument.  If that means
> more and more of what I do no longer compiles on a mips or a ppc32, then
> that’s a sacrifice that is fine with me.  It still creates extra development work,
> and having a uniformly available implementation would be much nicer.
> 
> Scott
>