[PATCH] Maintaining turnstile aligned to 128 bytes in i386 CPUs

Attilio Rao attilio at freebsd.org
Tue Jan 16 22:52:43 UTC 2007


2007/1/16, John Baldwin <jhb at freebsd.org>:
> On Tuesday 16 January 2007 15:36, Attilio Rao wrote:
> > 2007/1/16, John Baldwin <jhb at freebsd.org>:
> > > On Tuesday 16 January 2007 11:51, Attilio Rao wrote:
> > > > 2006/7/28, Attilio Rao <attilio at freebsd.org>:
> > > > >
> > > > > After some thought, I think it's better to use the init/fini methods
> > > > > (since they hide the sizeof(struct turnstile) behind the size parameter).
> > > > >
> > > > > Feedback and comments are welcome:
> > > > > http://users.gufi.org/~rookie/works/patches/uma_sync_init.diff
> > > >
> > > > [CC'ed all the interested people]
> > > >
> > > > Even though a long time has passed, I ran some benchmarks with the
> > > > ebizzy tool.
> > > > This program claims to reproduce the behaviour of a real httpd server
> > > > and is used in the Linux world for benchmarks, AFAIK.
> > > > I think the results of the comparison for this patch are very
> > > > interesting, and I think it is worth a commit :)
> > > > I think the results could be even better on a Xeon machine (I had no
> > > > chance to reproduce this on one of those).
> > > > (The results considered were measured after a few initial runs,
> > > > in order to minimize caching differences).
> > > >
> > > > The patch:
> > > > http://users.gufi.org/~rookie/works/patches/ts-sq/ts-sq.diff
> > >
> > > Looks good.  Some minor nits are that in subr_turnstile.c in the comment I
> > > would say "a turnstile is allocated" rather than "a turnstile is got from a
> > > specific UMA zone" as it reads a little bit clearer.  Also, I would
> > > say "Allocate a" rather than "Get a" for the two _alloc() functions.  Also,
> > > why not just use UMA_ALIGN_CACHE and make UMA_ALIGN_CACHE (128 - 1) on i386
> > > and amd64 rather than adding a new UMA_ALIGN_SYNC?
> >
> > I was thinking that this way anyone who wants to change the
> > synchronization primitive boundary to an appropriate value can do so.
> > I just used UMA_ALIGN_CACHE as the default value because I don't know the
> > best boundary (for synchronization primitives) on other arches.
>
> Is there a good reason to not cache-align synch primitives?  That is, why
> would an arch not use cache-align?  Also, is there a reason to not update
> UMA_ALIGN_CACHE on x86?

Because the cache line size varies between different CPU models of the
same family. For example, L1, L2 and L3 cache lines are 64 bytes wide on
P4 and Xeon.
L1 and L2 cache lines are 32 bytes wide on the other CPUs (P3, P2, etc.),
and in particular they have nothing to do with the trace cache line size,
which is what benefits from this code.
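
To make the "(128 - 1)" convention mentioned above concrete, here is a
minimal userland sketch of aligning an object to a 128-byte boundary with
a mask. It is not the patch and does not use the UMA API: SYNC_ALIGN_MASK,
struct fake_turnstile and the helper functions are hypothetical names made
up for illustration, and aligned_alloc() (C11) only stands in for a zone
allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mask, written as (boundary - 1) like the value in the thread. */
#define SYNC_ALIGN_MASK (128 - 1)

/* The mask trick only works when the boundary is a power of two. */
_Static_assert(((SYNC_ALIGN_MASK + 1) & SYNC_ALIGN_MASK) == 0,
    "alignment boundary must be a power of two");

/* Hypothetical stand-in for struct turnstile; the real layout differs. */
struct fake_turnstile {
    int   lock;
    void *owner;
};

/* Round size up to the next multiple of (mask + 1). */
static size_t
round_up_to_line(size_t size, size_t mask)
{
    return (size + mask) & ~mask;
}

/* Return nonzero if p sits on a (mask + 1)-byte boundary. */
static int
is_line_aligned(const void *p, size_t mask)
{
    return ((uintptr_t)p & mask) == 0;
}

/*
 * Allocate size bytes on a (mask + 1)-byte boundary. The size is rounded
 * up so the object never shares its last line with a neighbour, and so
 * the request satisfies aligned_alloc()'s size requirement.
 * The caller releases the memory with free().
 */
static void *
alloc_line_aligned(size_t size, size_t mask)
{
    return aligned_alloc(mask + 1, round_up_to_line(size, mask));
}
```

With this, alloc_line_aligned(sizeof(struct fake_turnstile),
SYNC_ALIGN_MASK) returns a pointer whose low 7 bits are zero. Changing the
boundary for a given CPU then means changing one mask definition, which is
the flexibility argued for above.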

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


More information about the freebsd-arch mailing list