svn commit: r191291 - in head: lib/libthr/thread libexec/rtld-elf/amd64 libexec/rtld-elf/arm libexec/rtld-elf/i386 libexec/rtld-elf/ia64 libexec/rtld-elf/mips libexec/rtld-elf/powerpc libexec/rtld-...
Robert Watson
rwatson at FreeBSD.org
Sun Apr 19 23:50:23 UTC 2009
On Mon, 20 Apr 2009, Ivan Voras wrote:
> 2009/4/20 Robert Watson <rwatson at freebsd.org>:
>> On Sun, 19 Apr 2009, Robert Watson wrote:
>>
>>> Now that the kernel defines CACHE_LINE_SIZE in machine/param.h, use that
>>> definition in the custom locking code for the run-time linker rather than
>>> local definitions.
>>
>> This actually changes the line size used by the rtld code for pre-pthreads
>> locking for several architectures. I think this is an improvement, but if
>> architecture maintainers could comment on that, that would be helpful.
>
> Will there be infrastructure for creating per-CPU structures or is using
> something like:
>
> int mycounter[MAXCPU] __attribute__ ((aligned(CACHE_LINE_SIZE)));

For now, yes, something along these lines. I have a local prototype I'm using
that has an API something like this:

    // Definitions
    struct foostat *foostatp;
    void *foostat_psp;

    // Module load
    if (pcpustat_alloc(&foostat_psp, "foostat", sizeof(struct foostat),
        sizeof(u_long)) != 0)
            panic("foostat_init: pcpustat_alloc failed");
    foostatp = pcpustat_getptr(foostat_psp);

    // Use the pointer for a statistic
    foostatp[curcpu].fs_counter1++;

    // Retrieve summary statistics and store in a single instance
    struct foostat fs;
    pcpustat_fetch(foostat_psp, &fs);

    // Reset summary statistics.
    pcpustat_reset(foostat_psp);

    // Module unload
    pcpustat_free(foostat_psp);
    foostatp = foostat_psp = NULL;

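To make the shape of that API concrete, here is a rough userspace sketch of how such an interface could behave. It is not the prototype itself: NCPU, the layout of struct pcpustat, and the reading of the fourth argument as the width of each counter field are all assumptions made for illustration, and the sketch only handles structures built entirely from u_long-sized counters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NCPU 4	/* assumed CPU count, stand-in for MAXCPU */

/* Example statistics structure; the field names follow the usage above. */
struct foostat {
	unsigned long fs_counter1;
	unsigned long fs_counter2;
};

/* Hypothetical handle hidden behind the void * in the API above. */
struct pcpustat {
	const char	*ps_name;
	size_t		 ps_size;	/* bytes per CPU */
	size_t		 ps_nfields;	/* counters per CPU */
	unsigned long	*ps_data;	/* NCPU * ps_nfields counters */
};

/* Returns 0 on success, -1 on failure. */
static int
pcpustat_alloc(void **pspp, const char *name, size_t size, size_t fieldsize)
{
	struct pcpustat *ps;

	/* Assumption: every field is an unsigned long counter. */
	if (fieldsize != sizeof(unsigned long) || size == 0 ||
	    size % fieldsize != 0)
		return (-1);
	ps = malloc(sizeof(*ps));
	if (ps == NULL)
		return (-1);
	ps->ps_data = calloc(NCPU, size);	/* zeroed, one slot per CPU */
	if (ps->ps_data == NULL) {
		free(ps);
		return (-1);
	}
	ps->ps_name = name;
	ps->ps_size = size;
	ps->ps_nfields = size / fieldsize;
	*pspp = ps;
	return (0);
}

/* Base of the per-CPU array, to be indexed by CPU number. */
static void *
pcpustat_getptr(void *psp)
{
	return (((struct pcpustat *)psp)->ps_data);
}

/* Sum each counter across all CPUs into a single instance. */
static void
pcpustat_fetch(void *psp, void *out)
{
	struct pcpustat *ps = psp;
	unsigned long *sum = out;

	memset(out, 0, ps->ps_size);
	for (size_t cpu = 0; cpu < NCPU; cpu++)
		for (size_t f = 0; f < ps->ps_nfields; f++)
			sum[f] += ps->ps_data[cpu * ps->ps_nfields + f];
}

/* Zero the counters on every CPU. */
static void
pcpustat_reset(void *psp)
{
	struct pcpustat *ps = psp;

	memset(ps->ps_data, 0, NCPU * ps->ps_size);
}

static void
pcpustat_free(void *psp)
{
	struct pcpustat *ps = psp;

	free(ps->ps_data);
	free(ps);
}
```

Note that this sketch lays the per-CPU slots out back to back, which is what indexing by [curcpu] implies. It does no cache-line padding and no locking, so it shows the interface rather than a layout worth copying.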
The problem with the [curcpu] model is that it embeds the assumption that it's
a good idea to have per-CPU fields in adjacent cache lines within a page. As
the world rocks gently in the direction of NUMA, there's a legitimate question
as to whether that's a good assumption to build in. It's a better assumption
than the assumption that it's a good idea to use a single stat across all CPUs
in a single cache line, of course. Depending on how we feel about the
overhead of accessor interfaces, all this can be hidden easily enough.
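
As an aside on the quoted declaration: an aligned attribute on the array aligns only the start of the array, so adjacent int counters would still share a cache line. A minimal sketch of the one-line-per-CPU layout discussed above follows. The values of CACHE_LINE_SIZE and MAXCPU are assumed for illustration (in the kernel they come from machine/param.h), and the cpu argument stands in for curcpu.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define CACHE_LINE_SIZE	64
#define MAXCPU		8

/*
 * Aligning the structure type pads sizeof(struct foostat) up to a full
 * cache line, so each array element starts on its own line.
 */
struct foostat {
	unsigned long fs_counter1;
	unsigned long fs_counter2;
} __attribute__((aligned(CACHE_LINE_SIZE)));

static struct foostat foostat_pcpu[MAXCPU];

/* Bump a counter in the slot belonging to the given CPU. */
static void
foostat_inc(unsigned int cpu)
{
	assert(cpu < MAXCPU);
	foostat_pcpu[cpu].fs_counter1++;
}

/* Sum fs_counter1 across all CPU slots. */
static unsigned long
foostat_sum(void)
{
	unsigned long total = 0;

	for (unsigned int cpu = 0; cpu < MAXCPU; cpu++)
		total += foostat_pcpu[cpu].fs_counter1;
	return (total);
}

/* Byte distance between two neighbouring per-CPU slots. */
static size_t
foostat_stride(void)
{
	return ((size_t)((char *)&foostat_pcpu[1] - (char *)&foostat_pcpu[0]));
}
```

The slots are still adjacent lines within one array, which is exactly the assumption questioned above. Hiding the indexing behind accessors such as foostat_inc() is one way to leave room to change the placement later.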

A facility I'd like to have would be an API to allocate memory on all CPUs at
once, with the memory on each CPU at a constant offset from a per-CPU base
address. That way you could calculate the location of the per-CPU structure
using PCPU_GET(dynbase) + foostat_offset without adding an additional
indirection.
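
A rough userspace sketch of that idea, under stated assumptions: every CPU gets its own region, a single allocator hands out offsets that are valid in all of them, and a lookup is just base plus offset. The names pcpu_dynbase, dyn_init, dyn_alloc and dyn_ptr are hypothetical, as are NCPU and DYN_AREA_SIZE. The pcpu_dynbase array stands in for what PCPU_GET(dynbase) would return on each CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NCPU		4	/* assumed CPU count */
#define DYN_AREA_SIZE	4096	/* assumed size of each per-CPU area */

static char *pcpu_dynbase[NCPU];	/* one base address per CPU */
static size_t dyn_next;			/* next free offset, shared by all CPUs */

/* Allocate one zeroed area per CPU. Returns 0 on success, -1 on failure. */
static int
dyn_init(void)
{
	for (int i = 0; i < NCPU; i++) {
		pcpu_dynbase[i] = calloc(1, DYN_AREA_SIZE);
		if (pcpu_dynbase[i] == NULL) {
			while (i-- > 0)
				free(pcpu_dynbase[i]);
			return (-1);
		}
	}
	dyn_next = 0;
	return (0);
}

/*
 * Reserve size bytes at the same offset in every CPU's area.
 * align must be a power of two. Returns the offset, or (size_t)-1
 * if the areas are full.
 */
static size_t
dyn_alloc(size_t size, size_t align)
{
	size_t off = (dyn_next + align - 1) & ~(align - 1);

	if (off > DYN_AREA_SIZE || size > DYN_AREA_SIZE - off)
		return ((size_t)-1);
	dyn_next = off + size;
	return (off);
}

/* Per-CPU address of an allocation: base plus constant offset. */
static void *
dyn_ptr(unsigned int cpu, size_t off)
{
	assert(cpu < NCPU);
	return (pcpu_dynbase[cpu] + off);
}
```

In a kernel the base would come from the per-CPU data rather than an array lookup, and each area could be placed in memory local to its CPU, which this sketch does not attempt to model.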

Robert N M Watson
Computer Laboratory
University of Cambridge