svn commit: r352938 - head/sys/arm/include

Wed Oct 2 15:03:30 UTC 2019

On Tue, Oct 01, 2019 at 01:53:05PM -0600, Ian Lepore wrote:
> On Tue, 2019-10-01 at 22:49 +0300, Konstantin Belousov wrote:
> > On Tue, Oct 01, 2019 at 07:39:00PM +0000, Ian Lepore wrote:
> > > Author: ian
> > > Date: Tue Oct  1 19:39:00 2019
> > > New Revision: 352938
> > > URL: https://svnweb.freebsd.org/changeset/base/352938
> > > 
> > > Log:
> > >   Add 8 and 16 bit versions of atomic_cmpset and atomic_fcmpset for arm.
> > >   
> > >   This adds 8 and 16 bit versions of the cmpset and fcmpset functions. Macros
> > >   are used to generate all the flavors from the same set of instructions; the
> > >   macro expansion handles the few minor differences between each size
> > >   variation (generating ldrexb/ldrexh/ldrex for 8/16/32, etc).
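The "one macro body, several sizes" idea in the log can be sketched in portable C. This is not the arm code from r352938: C11 `atomic_compare_exchange_strong` stands in for the hand-written ldrexb/ldrexh/ldrex sequences, and the `sketch_*` names and the `SKETCH_DEFINE_CMPSET` macro are hypothetical, used only for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical sketch: one macro body expands into 8-, 16- and 32-bit
 * variants. C11 atomics stand in for the per-size ldrex/strex assembly.
 */
#define SKETCH_DEFINE_CMPSET(SUFFIX, TYPE)                                 \
    /* fcmpset: on failure, the value actually seen is stored in *old. */ \
    static inline int sketch_fcmpset_##SUFFIX(_Atomic TYPE *dst,          \
        TYPE *old, TYPE newval)                                           \
    {                                                                     \
        return atomic_compare_exchange_strong(dst, old, newval);          \
    }                                                                     \
    /* cmpset: same comparison, but the caller's value is left alone. */  \
    static inline int sketch_cmpset_##SUFFIX(_Atomic TYPE *dst,           \
        TYPE old, TYPE newval)                                            \
    {                                                                     \
        return atomic_compare_exchange_strong(dst, &old, newval);         \
    }

SKETCH_DEFINE_CMPSET(8, uint8_t)
SKETCH_DEFINE_CMPSET(16, uint16_t)
SKETCH_DEFINE_CMPSET(32, uint32_t)
```

Only the type and the name suffix vary between expansions, which is the same property that lets the real macros pick ldrexb, ldrexh or ldrex per size.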
> > >   
> > >   In addition to handling new sizes, the instruction sequences used for cmpset
> > >   and fcmpset are rewritten to be a bit shorter/faster, and the new sequence
> > >   will not return false when *dst==*old but the store-exclusive fails because
> > >   of concurrent writers. Instead, it just loops like ldrex/strex sequences
> > >   normally do until it gets a non-conflicted store. The manpage allows LL/SC
> > >   architectures to bogusly return false, but there's no reason to actually do
> > >   so, at least on arm.
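A rough model of the behavior described in the log, assuming C11 `atomic_compare_exchange_weak` as a stand-in for an ldrex/strex pair (it may fail spuriously, much like a strex that loses its reservation). The function name is hypothetical. The cmpset retries internally and returns false only when the value in `*dst` really differs from `old`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch, not the committed arm code: retry on spurious store failure,
 * report false only for a genuine mismatch.
 */
static bool
sketch_cmpset_32_retry(_Atomic uint32_t *dst, uint32_t old, uint32_t newval)
{
    uint32_t seen = old;

    for (;;) {
        if (atomic_compare_exchange_weak(dst, &seen, newval))
            return true;          /* store succeeded */
        if (seen != old)
            return false;         /* *dst really differs from old */
        /* Spurious failure while *dst == old: try the store again. */
    }
}
```

This is the inner loop Konstantin objects to below: when called from a retry loop of its own, the caller ends up with two nested loops.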
> > 
> > The reason is to avoid nested loops.  Leaving retry control to the outer
> > caller was the original design decision for fcmpset(), as opposed to
> > cmpset().  casueword() also started following this approach after the
> > fixes that moved ll/sc retry looping under external control.
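The "outer control" style can be sketched as follows, again with C11 atomics standing in for ll/sc and with hypothetical names. The primitive makes a single attempt that may fail spuriously and, like fcmpset, refreshes `*old` with the value it observed; the only retry loop lives in the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Single attempt; may fail spuriously; on failure *old is refreshed. */
static bool
sketch_fcmpset_32_once(_Atomic uint32_t *dst, uint32_t *old, uint32_t newval)
{
    return atomic_compare_exchange_weak(dst, old, newval);
}

/* Hypothetical caller: atomically add delta, return the previous value. */
static uint32_t
sketch_fetch_add_32(_Atomic uint32_t *dst, uint32_t delta)
{
    uint32_t old = atomic_load(dst);

    /* One loop only: on failure old already holds the fresh value. */
    while (!sketch_fcmpset_32_once(dst, &old, old + delta))
        ;
    return old;
}
```

Because fcmpset hands back the observed value, the caller never needs a separate reload, and a spurious failure simply costs one more turn of the caller's own loop.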
> 
> If the implementation is forbidden from looping, then the manpage
> should say so.  What I committed meets the requirements currently stated
> in the manpage.  Until somebody explains to me why it is somehow
> harmful to return the RIGHT information at a cost of either 0 or 1
> extra CPU cycle, it's staying the way it is.

Implementation is not forbidden from looping, but looping definitely
deteriorates the quality of the implementation.

The problem with the loop is that the outer caller has no control over
the inner loop, while it is the outer caller who knows more about the
terminating conditions.
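As an illustration of a caller that knows its own terminating conditions, here is a hypothetical sketch (C11 atomics, invented names, a simple retry budget as the policy) of a caller that bounds its retries and reports "busy" instead of spinning. It is only in the spirit of the casueword(9) approach mentioned below, not its actual code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_result { SKETCH_OK, SKETCH_MISMATCH, SKETCH_BUSY };

/*
 * Hypothetical caller-side policy: at most max_tries single-attempt CAS
 * operations. Since the primitive does not loop, the caller can stop
 * and let a higher layer decide what to do next.
 */
static enum sketch_result
sketch_set_if_equal(_Atomic uint32_t *dst, uint32_t expect, uint32_t newval,
    unsigned max_tries)
{
    for (unsigned i = 0; i < max_tries; i++) {
        uint32_t seen = expect;

        if (atomic_compare_exchange_weak(dst, &seen, newval))
            return SKETCH_OK;
        if (seen != expect)
            return SKETCH_MISMATCH;   /* real value differs: stop now */
        /* Spurious failure: spend one unit of the retry budget. */
    }
    return SKETCH_BUSY;               /* budget exhausted; caller decides */
}
```

With an inner loop hidden in the primitive, a policy like this retry budget could not be enforced, because the primitive might keep spinning without ever returning to the caller.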

I can only point to casueword(9), where inner looping caused CVE-level
issues.  I do not believe that any of the atomic(9) primitives on ll/sc
machines causes a CVE for us right now (but casueword did).

Note that even x86 does not use the comparison with dest, but relies on
the CPU flag to provide the result (this is the closest CAS analog of
not looping).