Determining CPU features / cache organization from userland

Bruce M Simpson bms at spc.org
Sun Oct 12 12:58:03 PDT 2003


All,

I came up with the attached text file today to summarize some of my
findings, after looking at various open source trees to see how they
handle run-time cache geometry detection.

Many will find it ironic that i386 is the easiest platform to deal with.

[ Andrew: Perhaps you can shed some light on how the necessary information
can be gathered on Alpha? My search was incomplete and I could not find
a reliable source for DEC's development manuals. ]

Jeff Roberson suggested I adopt NetBSD's API. However, on further
examination it's clear that NetBSD's approach isn't consistent across
all platforms. Darwin takes a similar approach, but it is perhaps too
PowerPC-centric.

sysctl is a good interface for retrieving this information, as the data is
small and doesn't change during the lifetime of the kernel. sysctl is already
invoked from within libc to retrieve information in this way.
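
To make the sysctl idea concrete, here is a rough sketch of what a userland
consumer might look like. The node name is passed in by the caller because
no name has been agreed yet; the helper names and the fallback parameter
are my own invention for illustration. sysctlbyname(3) is only used on
FreeBSD and Darwin; elsewhere the sketch just returns the fallback.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__FreeBSD__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Read an integer-valued sysctl node by name.
 * Returns 0 if the node is missing or has an unexpected size.
 * (Hypothetical helper, not an existing libc function.)
 */
static size_t
sysctl_read_size(const char *name)
{
#if defined(__FreeBSD__) || defined(__APPLE__)
	unsigned char buf[sizeof(uint64_t)];
	size_t len = sizeof(buf);

	if (sysctlbyname(name, buf, &len, NULL, 0) != 0)
		return (0);
	/* Accept either a 32-bit or a 64-bit integer node. */
	if (len == sizeof(uint32_t)) {
		uint32_t v;
		memcpy(&v, buf, sizeof(v));
		return ((size_t)v);
	}
	if (len == sizeof(uint64_t)) {
		uint64_t v;
		memcpy(&v, buf, sizeof(v));
		return ((size_t)v);
	}
	return (0);
#else
	(void)name;
	return (0);
#endif
}

/*
 * Return the cache line size reported by the named sysctl node,
 * or 'fallback' when the kernel does not export it.
 */
size_t
cache_line_size(const char *sysctl_name, size_t fallback)
{
	size_t n = sysctl_read_size(sysctl_name);

	return (n != 0 ? n : fallback);
}
```

A caller would do something like cache_line_size("hw.some.node", 32) once
at startup and cache the result, since the value cannot change while the
kernel is running.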

glibc's approach to situations where knowledge of the cache line size is
needed is a bit awkward: it retrieves the information from an 'aux vector'
passed to glibc at startup.
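
For comparison, the aux vector route can be sketched as below. This is an
assumption-laden illustration, not glibc's internal code: it uses
getauxval(3), which only exists in newer glibc releases, and the
AT_DCACHEBSIZE key, which as far as I know the Linux kernel only fills in
on some platforms (PowerPC being the usual one). The function name is
made up. Where the key is absent the result is simply 0.

```c
#include <stddef.h>
#if defined(__linux__)
#include <sys/auxv.h>	/* getauxval(), AT_* keys via <elf.h> */
#endif

/*
 * Return the data cache block size advertised in the ELF aux vector,
 * or 0 if the platform does not provide one.
 * (Hypothetical wrapper for illustration.)
 */
unsigned long
auxv_dcache_block_size(void)
{
#if defined(__linux__) && defined(AT_DCACHEBSIZE)
	/* getauxval() returns 0 when the key is not present. */
	return (getauxval(AT_DCACHEBSIZE));
#else
	return (0);
#endif
}
```

The obvious drawback is that the information is only as good as what the
kernel chose to put in the vector at exec time.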

I think threading libraries should seriously consider becoming consumers of
the API once it's finalized. Mutex alignment on cache line boundaries is
desirable for userland applications too. However, phk malloc would need to
be changed in order to support this specific form of aligned allocation.

Perhaps a separate pool or zone could be used for this kind of allocation?
This becomes more important and timely when one considers the I/O alignment
restrictions we've encountered. Some applications may need to align their
buffers on arbitrary boundaries to suit devices, too.
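
As a sketch of what a threading library might want, the fragment below
places a mutex on its own cache line. It assumes an allocator that
provides posix_memalign(), which is exactly the kind of support phk malloc
would need to grow; the function names are hypothetical, and the line size
would come from whatever API we settle on.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate and initialize a mutex aligned on a cache line boundary.
 * 'line_size' must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure.  (Illustrative helper, not an existing API.)
 */
pthread_mutex_t *
mutex_alloc_aligned(size_t line_size)
{
	void *p = NULL;
	size_t sz = sizeof(pthread_mutex_t);

	if (line_size == 0)
		return (NULL);
	/* Pad to whole lines so no other object shares the last line. */
	sz = (sz + line_size - 1) / line_size * line_size;

	if (posix_memalign(&p, line_size, sz) != 0)
		return (NULL);
	if (pthread_mutex_init(p, NULL) != 0) {
		free(p);
		return (NULL);
	}
	return (p);
}

/* Destroy and release a mutex obtained from mutex_alloc_aligned(). */
void
mutex_free_aligned(pthread_mutex_t *m)
{
	if (m != NULL) {
		pthread_mutex_destroy(m);
		free(m);
	}
}
```

A dedicated pool or zone could hand these out more cheaply than a general
purpose aligned allocation, which is the point of the question above.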

BMS
-------------- next part --------------

all
---
NetBSD cache information API(s) are not consistent across platforms.

alpha
-----
Cache discovery? Static.
21064, 21064A, 21066, 21066A, 21164 all have line sizes of 32 bytes.
The 21264 has a 64-byte line size.
21364: L1 split, 64KB each, 2-way set-associative.
Virtual caches can be implemented using PALcode, but this is
probably more of a curiosity than anything else.
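
Static discovery here amounts to a lookup table keyed on the CPU model.
A minimal sketch using only the line sizes listed above follows; the
struct and function names are made up, and keying on a model string is a
simplification (a real kernel would key on the CPU type it already
identified at boot).

```c
#include <stddef.h>
#include <string.h>

/* One row per known core: model name and cache line size in bytes. */
struct alpha_cache_info {
	const char	*model;
	unsigned	 line_size;
};

static const struct alpha_cache_info alpha_cache_table[] = {
	{ "21064",  32 },
	{ "21064A", 32 },
	{ "21066",  32 },
	{ "21066A", 32 },
	{ "21164",  32 },
	{ "21264",  64 },
};

/*
 * Return the cache line size for the given Alpha model,
 * or 0 if the model is not in the table.
 */
unsigned
alpha_line_size(const char *model)
{
	size_t i;
	size_t n = sizeof(alpha_cache_table) / sizeof(alpha_cache_table[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(alpha_cache_table[i].model, model) == 0)
			return (alpha_cache_table[i].line_size);
	}
	return (0);
}
```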

ia64
----
Cache discovery? Call PAL_CACHE_INFO, I think.
No documentation on how to do this at this time.
I have emailed marcel at freebsd.org asking for advice.

i386 pc98 amd64
---------------
Cache discovery? CPUID.
Earlier chips which don't support it probably don't have a cache,
or aren't worth supporting.

General rule for x86: split L1, unified L2, optional unified L3.
General rule for Intel P5: 2-way, 32 bytes/line
General rule for Intel MMX and up: 4-way, 32 bytes/line
PPro doesn't have L3.
The newer cores have different cache geometry.
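
As a small illustration of the CPUID route, the sketch below reads the
CLFLUSH line size from CPUID leaf 1 (EBX bits 15:8, in units of 8 bytes,
valid when the CLFSH bit, EDX bit 19, is set). It assumes a GCC or Clang
toolchain providing <cpuid.h>; the function name is made up. This only
yields a line size; full geometry (size, ways, levels) needs the cache
descriptor leaves, which are not shown here.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>	/* __get_cpuid(), GCC/Clang specific */
#endif

/*
 * Return the CLFLUSH line size in bytes as reported by CPUID leaf 1,
 * or 0 if CPUID leaf 1 or the CLFSH feature is unavailable, or if
 * this is not an x86 build.
 */
unsigned
x86_clflush_line_size(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if the requested leaf is unsupported. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return (0);
	/* EDX bit 19 (CLFSH) says whether the EBX field is meaningful. */
	if ((edx & (1u << 19)) == 0)
		return (0);
	/* EBX[15:8] holds the line size in 8-byte units. */
	return (((ebx >> 8) & 0xffu) * 8u);
#else
	return (0);
#endif
}
```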

powerpc
-------
Cache line discovery? Static.
Many core variants.
I have not seen any runtime code for this.
The POWER clcs instruction is obsolete.

OpenDarwin assumes 32-byte lines. It has hooks for discovering the
cache geometry at runtime but these are not used.

NetBSD statically initializes this information according to the
discovered CPU model in use, which is the way to go.
NetBSD tells uvm to recolor the page queues if required.

Linux uses static #defines from IBM people, except in the case
of ppc64, where the code is strikingly similar to OpenDarwin's
except that it actually talks to Open Firmware.

Open Firmware on CHRP should however provide the following
for each cpu device node configured in the system:
i-cache-size i-cache-sets i-cache-block-size
d-cache-size d-cache-sets d-cache-block-size
tlb-size tlb-sets l2-cache
All are integers except for l2-cache which is the address of an l2-cache
device node if the system found one.
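
To show how the integer properties might be consumed, here is a sketch
that decodes one property value and collects the results. It assumes the
usual Open Firmware encoding of an integer as a single 32-bit big-endian
cell; how the raw bytes are fetched from the cpu node is deliberately
left out, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Geometry gathered from a cpu device node (illustrative layout). */
struct of_cpu_cache {
	uint32_t i_cache_size, i_cache_sets, i_cache_block_size;
	uint32_t d_cache_size, d_cache_sets, d_cache_block_size;
};

/*
 * Decode one integer property cell (32-bit, big-endian) into
 * host byte order, independent of the host's endianness.
 */
uint32_t
of_decode_int(const unsigned char cell[4])
{
	return (((uint32_t)cell[0] << 24) |
	    ((uint32_t)cell[1] << 16) |
	    ((uint32_t)cell[2] << 8) |
	    (uint32_t)cell[3]);
}
```

For example, a d-cache-block-size property holding the bytes 00 00 00 20
decodes to 32.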

mips
----
The NetBSD MIPS code for dealing with cache geometry
was recently updated.
MIPS caches may be split/unified at L1/L2 and unified at L3.
Cache detection code is quite voluminous. Swipe NetBSD's
if FreeBSD/mips ever kicks off.
Many, many core variants.

sparc64
-------
Cache line discovery? Performed by Open Firmware.

Open Firmware property names used are ever so slightly different from Apple's.
icache-size icache-line-size icache-associativity
dcache-size dcache-line-size dcache-associativity
ecache-size ecache-line-size ecache-associativity

Already handled within cache.c, but assembly stubs *expect* this
information in a certain format.  Specifically they need to see
the data cache/instruction cache sizes and line sizes.

General rule: Split L1, Unified L2.
Cores: Spitfire/Blackbird/Cheetah


More information about the freebsd-hackers mailing list