svn commit: r346593 - head/sys/sys

Thu Apr 25 08:22:33 UTC 2019

On Thu, Apr 25, 2019 at 07:38:21AM +0200, Wojciech Macek wrote:
> Intel does not reorder reads against the condition "if" here. I know for
> sure that ARM does, but therestill might be some other architectures that
> also suffers such behavior - I just don't have any means to verify.
> I remember the discussion for rS302292 where we agreed that this kind of
> patches should be the least impacting in perfomrance as possible. Adding
> unconditional memory barrier causes significant performance drop on Intel,
> where in fact, the issue was never seen.
> 
Atomic_thread_fence_acq() is nop on x86, or rather, it is compiler memory
barrier.  If you need read/read fence on some architectures, I am sure
that you need compiler barrier on all.

> Wojtek
> 
> czw., 25 kwi 2019 o 06:08 Mark Johnston <markj at freebsd.org> napisał(a):
> 
> > On Tue, Apr 23, 2019 at 06:36:32AM +0000, Wojciech Macek wrote:
> > > Author: wma
> > > Date: Tue Apr 23 06:36:32 2019
> > > New Revision: 346593
> > > URL: https://svnweb.freebsd.org/changeset/base/346593
> > >
> > > Log:
> > >   This patch offers a workaround to buf_ring reordering
> > >   visible on armv7 and armv8. Similar issue to rS302292.
> > >
> > >   Obtained from:         Semihalf
> > >   Authored by:           Michal Krawczyk <mk at semihalf.com>
> > >   Approved by:           wma
> > >   Differential Revision: https://reviews.freebsd.org/D19932
> > >
> > > Modified:
> > >   head/sys/sys/buf_ring.h
> > >
> > > Modified: head/sys/sys/buf_ring.h
> > >
> > ==============================================================================
> > > --- head/sys/sys/buf_ring.h   Tue Apr 23 04:06:26 2019        (r346592)
> > > +++ head/sys/sys/buf_ring.h   Tue Apr 23 06:36:32 2019        (r346593)
> > > @@ -310,14 +310,23 @@ buf_ring_peek_clear_sc(struct buf_ring *br)
> > >       if (!mtx_owned(br->br_lock))
> > >               panic("lock not held on single consumer dequeue");
> > >  #endif
> > > -     /*
> > > -      * I believe it is safe to not have a memory barrier
> > > -      * here because we control cons and tail is worst case
> > > -      * a lagging indicator so we worst case we might
> > > -      * return NULL immediately after a buffer has been enqueued
> > > -      */
> > > +
> > >       if (br->br_cons_head == br->br_prod_tail)
> > >               return (NULL);
> > > +
> > > +#if defined(__arm__) || defined(__aarch64__)
> > > +     /*
> > > +      * The barrier is required there on ARM and ARM64 to ensure, that
> > > +      * br->br_ring[br->br_cons_head] will not be fetched before the
> > above
> > > +      * condition is checked.
> > > +      * Without the barrier, it is possible, that buffer will be fetched
> > > +      * before the enqueue will put mbuf into br, then, in the
> > meantime, the
> > > +      * enqueue will update the array and the br_prod_tail, and the
> > > +      * conditional check will be true, so we will return previously
> > fetched
> > > +      * (and invalid) buffer.
> > > +      */
> > > +     atomic_thread_fence_acq();
> > > +#endif
> >
> > Why is it specific to ARM?
> >