svn commit: r346593 - head/sys/sys
Konstantin Belousov
kostikbel at gmail.com
Thu Apr 25 08:22:33 UTC 2019
On Thu, Apr 25, 2019 at 07:38:21AM +0200, Wojciech Macek wrote:
> Intel does not reorder reads against the condition "if" here. I know for
> sure that ARM does, but therestill might be some other architectures that
> also suffers such behavior - I just don't have any means to verify.
> I remember the discussion for rS302292 where we agreed that this kind of
> patches should be the least impacting in perfomrance as possible. Adding
> unconditional memory barrier causes significant performance drop on Intel,
> where in fact, the issue was never seen.
>
Atomic_thread_fence_acq() is nop on x86, or rather, it is compiler memory
barrier. If you need read/read fence on some architectures, I am sure
that you need compiler barrier on all.
> Wojtek
>
> czw., 25 kwi 2019 o 06:08 Mark Johnston <markj at freebsd.org> napisał(a):
>
> > On Tue, Apr 23, 2019 at 06:36:32AM +0000, Wojciech Macek wrote:
> > > Author: wma
> > > Date: Tue Apr 23 06:36:32 2019
> > > New Revision: 346593
> > > URL: https://svnweb.freebsd.org/changeset/base/346593
> > >
> > > Log:
> > > This patch offers a workaround to buf_ring reordering
> > > visible on armv7 and armv8. Similar issue to rS302292.
> > >
> > > Obtained from: Semihalf
> > > Authored by: Michal Krawczyk <mk at semihalf.com>
> > > Approved by: wma
> > > Differential Revision: https://reviews.freebsd.org/D19932
> > >
> > > Modified:
> > > head/sys/sys/buf_ring.h
> > >
> > > Modified: head/sys/sys/buf_ring.h
> > >
> > ==============================================================================
> > > --- head/sys/sys/buf_ring.h Tue Apr 23 04:06:26 2019 (r346592)
> > > +++ head/sys/sys/buf_ring.h Tue Apr 23 06:36:32 2019 (r346593)
> > > @@ -310,14 +310,23 @@ buf_ring_peek_clear_sc(struct buf_ring *br)
> > > if (!mtx_owned(br->br_lock))
> > > panic("lock not held on single consumer dequeue");
> > > #endif
> > > - /*
> > > - * I believe it is safe to not have a memory barrier
> > > - * here because we control cons and tail is worst case
> > > - * a lagging indicator so we worst case we might
> > > - * return NULL immediately after a buffer has been enqueued
> > > - */
> > > +
> > > if (br->br_cons_head == br->br_prod_tail)
> > > return (NULL);
> > > +
> > > +#if defined(__arm__) || defined(__aarch64__)
> > > + /*
> > > + * The barrier is required there on ARM and ARM64 to ensure, that
> > > + * br->br_ring[br->br_cons_head] will not be fetched before the
> > above
> > > + * condition is checked.
> > > + * Without the barrier, it is possible, that buffer will be fetched
> > > + * before the enqueue will put mbuf into br, then, in the
> > meantime, the
> > > + * enqueue will update the array and the br_prod_tail, and the
> > > + * conditional check will be true, so we will return previously
> > fetched
> > > + * (and invalid) buffer.
> > > + */
> > > + atomic_thread_fence_acq();
> > > +#endif
> >
> > Why is it specific to ARM?
> >
More information about the svn-src-all
mailing list