svn commit: r184759 - user/kmacy/HEAD_fast_multi_xmit/sys/net

Wed Nov 12 05:12:07 PST 2008

On Monday 10 November 2008 05:40:41 pm Attilio Rao wrote:
> 2008/11/10, John Baldwin <jhb at freebsd.org>:
> > On Saturday 08 November 2008 11:53:53 am Attilio Rao wrote:
> >  > 2008/11/8, David Schultz <das at freebsd.org>:
> >  > > On Sat, Nov 08, 2008, Attilio Rao wrote:
> >  > >  > Ultimately, I'm not sure we need this.
> >  > >  > We already have memory barriers you could exploit which just
> >  > >  > require a "dummy" object.
> >  > >  >
> >  > >  > For example, you could do:
> >  > >  > flowtable_pcpu_unlock(struct flowtable *table, uint32_t hash)
> >  > >  >  {
> >  > >  >
> >  > >  >         (void)atomic_load_acq_ptr(&dummy);
> >  > >  >         ...
> >  > >
> >  > >
> >  > > Memory barriers are cheaper than atomic ops.
> >  >
> >  > But this is an atomic op too.
> >  >
> >  > >  Furthermore, there are different types of memory barriers
> >  > >  (store/store, load/store, etc.), not just a generic mb().  Some
> >  > >  architectures like sparc64 define all four, but only actually
> >  > >  implement the varieties that are useful in improving performance.
> >  > >  Take a look at what Solaris has here.
> >  > >
> >  > >  I'm skeptical of trying to play clever tricks with these things
> >  > >  outside of the code that implements synchronization
> >  > >  primitives. Memory ordering is very hard to reason about, and we
> >  > >  already have a lot of code, e.g., in libthr, that isn't correct
> >  > >  under weak memory ordering. Moreover, the compiler can reorder
> >  > >  loads and stores, and that just adds a whole new level of pain.
> >  >
> >  > The _acq variants are meant precisely to prevent that reordering.
> >  > atomic(9) explains how the acq and rel memory barriers work.
> >
> >
> > _acq is not a full barrier; it's more of an 'lfence'.  The mb() here is
> >  doing more of a _rel barrier ('sfence', etc.).
> 
> Sure, but the comment is still valid.
> I don't see the point of such things when you can implement barriers
> through our atomic_* stuff.
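
To make the acquire/release direction concrete, here is a minimal sketch using C11 `<stdatomic.h>` rather than FreeBSD's atomic(9) (an assumption: the thread concerns the kernel API and machine-dependent barriers, not C11). The lock and every name in it (`pcpu_lock`, `pcpu_lock_acquire`, `pcpu_lock_release`, `run_lock_demo`) are hypothetical and are not the flowtable code from r184759. In general, taking a lock needs acquire semantics so the critical section cannot float above it, while dropping it needs release semantics so stores made inside the critical section are visible before the lock appears free; an acquire-only operation on the unlock path does not give the second guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical lock for illustration; not the flowtable code from r184759. */
struct pcpu_lock {
    atomic_flag held;
};

static void pcpu_lock_acquire(struct pcpu_lock *l)
{
    /* Acquire: accesses in the critical section cannot be reordered
     * before the successful test-and-set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the holder releases */
}

static void pcpu_lock_release(struct pcpu_lock *l)
{
    /* Release: every store made in the critical section becomes visible
     * before another thread can observe the flag cleared.  An acquire-only
     * operation here would not stop earlier stores from sinking past it. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static struct pcpu_lock demo_lock = { ATOMIC_FLAG_INIT };
static long demo_counter; /* plain, non-atomic data guarded by demo_lock */

static void *demo_worker(void *arg)
{
    long iters = *(const long *)arg;

    for (long i = 0; i < iters; i++) {
        pcpu_lock_acquire(&demo_lock);
        demo_counter++;
        pcpu_lock_release(&demo_lock);
    }
    return NULL;
}

/* Two threads each increment demo_counter `iters` times under the lock. */
static long run_lock_demo(long iters)
{
    pthread_t a, b;
    int rc;

    demo_counter = 0;
    rc = pthread_create(&a, NULL, demo_worker, &iters);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, demo_worker, &iters);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return demo_counter;
}
```

With the lock behaving correctly, `run_lock_demo(100000)` returns 200000; no increment is lost because each critical section is bracketed by an acquire on entry and a release on exit.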

atomic_* stuff works when you are already doing a store.  Doing a "dummy" 
store is quite a hack just to get a standalone memory barrier.  There is a 
reason ia64 includes "acq" and "rel" variants of various instructions as well 
as a standalone "fence".  One problem with Kip's change, though, is that it
doesn't work on older x86 CPUs that lack "sfence" (pre-PIII, IIRC).  I'm also
not sure whether some of the lower-power x86 CPUs, such as VIA's, support
"sfence", though I think those are typically used in single-CPU setups.

-- 
John Baldwin
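
The distinction above, between attaching ordering to a store you need anyway and issuing a standalone barrier, can be sketched with C11 atomics (again an assumption; this is not the kernel API). `publish_with_rel_store()` puts release semantics on the flag store itself, much like an atomic(9) `_rel` store; `publish_with_fence()` issues a standalone `atomic_thread_fence(memory_order_release)` before a relaxed store, roughly the role the `mb()` plays in the patch. All names here are hypothetical, and how a compiler lowers these fences on a particular x86 CPU (for instance, whether any `sfence` is emitted) is implementation-specific and not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state; names are illustrative only. */
static int payload;      /* plain data published by the writer */
static atomic_int ready; /* flag the reader polls */

/* Variant 1: ordering rides on a store we need anyway (cf. an atomic(9)
 * _rel store). */
static void publish_with_rel_store(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Variant 2: a standalone release fence, then an ordinary (relaxed) store. */
static void publish_with_fence(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: spin on the flag; the acquire fence pairs with either variant. */
static int consume(void)
{
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ; /* spin until the writer publishes */
    atomic_thread_fence(memory_order_acquire);
    return payload;
}

struct writer_arg {
    int use_fence;
    int value;
};

static void *writer(void *arg)
{
    const struct writer_arg *w = arg;

    if (w->use_fence)
        publish_with_fence(w->value);
    else
        publish_with_rel_store(w->value);
    return NULL;
}

/* Runs one writer thread; the calling thread acts as the reader. */
static int run_publish_demo(int use_fence, int value)
{
    struct writer_arg w = { use_fence, value };
    pthread_t t;
    int rc, seen;

    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    rc = pthread_create(&t, NULL, writer, &w);
    assert(rc == 0);
    seen = consume();
    pthread_join(t, NULL);
    return seen;
}
```

For example, `run_publish_demo(1, 7)` returns 7 with either variant: once the reader observes `ready == 1`, the acquire fence guarantees it also observes the earlier store to `payload`.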