Dynamic Per-cpu allocator

Alexander V. Chernikov melifaro at FreeBSD.org
Fri Jul 20 14:59:55 UTC 2012


Hello list!

It seems it is time to discuss the dynamic part of the pcpu allocator.

We already have a great static one, which permits statically defining 
per-cpu counters/structures in the source code, including modules.
(For reference, it uses the DPCPU_* macros and lives in sys/sys/pcpu.h.)

However, this is not enough, since there are many non-singleton objects 
that require dynamic per-cpu counter allocation.

The networking stack is definitely a candidate for such an API (interface 
counters, netgraph node counters, global / per-protocol statistics (it 
seems the existing DPCPU macros can be used for the latter)).

My routing performance tests show that eliminating contended counters 
can give quite a significant speed improvement. For example, after 
removing interface counters and IP statistics (~11 counters total), 
forwarding speed increases from 2 Mpps (2 million packets/sec) to 3 Mpps.

There are more details about this particular test in
http://lists.freebsd.org/pipermail/freebsd-net/2012-July/032714.html

On the other hand, a proof-of-concept implementation of per-cpu ipfw 
counters shows no observable difference between enabled and disabled 
rule counters:

http://lists.freebsd.org/pipermail/freebsd-net/2012-July/032824.html

Preemption is not disabled here (typically either the netisr thread or 
the ISR routine is already bound to a CPU).

Disabling preemption via critical_enter() causes an 80 kpps drop (with 
one counter):
http://lists.freebsd.org/pipermail/freebsd-net/2012-July/032835.html

It seems there is no need for precise accounting of the total number 
of bytes forwarded (or fastforwarded). On the other hand, one may want 
precise accounting of interface bytes/packets.

So, here is what we need for the networking stack (from my point of view):

1) Ability to allocate a single pcpu counter (various ng* nodes)
2) Ability to allocate arbitrary structures (per-VNET protocol statistics)
3) Ability to allocate either a contiguous linear pool of objects or 
UMA-like allocation (per-interface counters)
4) Ability to use either fast (non-protected) or precise updates
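Requirement 3 can be pictured with a minimal userland sketch, assuming a [cpu][object] layout where each CPU owns one contiguous slice holding the counters of every object (e.g. every interface). All names here (NCPU_SIM, struct if_counters_sim, pcpu_pool_*) are hypothetical, and NCPU_SIM merely stands in for mp_maxid + 1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NCPU_SIM 4              /* hypothetical stand-in for mp_maxid + 1 */

struct if_counters_sim {        /* hypothetical per-interface counters */
	uint64_t ipackets;
	uint64_t ibytes;
};

/* One linear pool: NCPU_SIM slices of 'nobj' entries, laid out [cpu][obj]. */
struct pcpu_pool_sim {
	size_t nobj;
	struct if_counters_sim *slots;
};

static int
pcpu_pool_init(struct pcpu_pool_sim *p, size_t nobj)
{
	p->nobj = nobj;
	p->slots = calloc(NCPU_SIM * nobj, sizeof(*p->slots));
	return (p->slots == NULL ? -1 : 0);
}

static void
pcpu_pool_destroy(struct pcpu_pool_sim *p)
{
	free(p->slots);
	p->slots = NULL;
}

/* Counters of object 'idx' as seen by CPU 'cpu' (the fast, local path). */
static struct if_counters_sim *
pcpu_pool_get(struct pcpu_pool_sim *p, int cpu, size_t idx)
{
	assert(cpu >= 0 && cpu < NCPU_SIM && idx < p->nobj);
	return (&p->slots[(size_t)cpu * p->nobj + idx]);
}

/* Sum one object's counters over all CPUs (the slow, reporting path). */
static struct if_counters_sim
pcpu_pool_sum(struct pcpu_pool_sim *p, size_t idx)
{
	struct if_counters_sim total = { 0, 0 };

	for (int cpu = 0; cpu < NCPU_SIM; cpu++) {
		const struct if_counters_sim *c = pcpu_pool_get(p, cpu, idx);
		total.ipackets += c->ipackets;
		total.ibytes += c->ibytes;
	}
	return (total);
}
```

The UMA-like alternative would instead hand out one object at a time from such slices; the linear layout is simply the easiest one to index.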


What others do:
I've found nothing related in OpenSolaris or DragonFly (maybe someone 
else has?).
A good overview of the Linux API: http://www.makelinux.net/ldd3/chp-8-sect-5


Proposed API (not even a draft, just something to start the discussion with):

We already have the DPCPU_* macros for "static" data, but I'm not sure 
whether we can keep the same names for dynamic data.

We could add:
1)
* DPCPU_ALLOC_CNTR()
* DPCPU_FREE_CNTR()
(not sure whether the existing DPCPU_* macros can be used)
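For the single-counter case, one possible userland sketch, assuming one 64-bit slot per CPU with each slot padded to an assumed 64-byte cache line so that updates on different CPUs do not bounce the same line. The handle type and the pcpu_cntr_* functions are hypothetical analogues of DPCPU_ALLOC_CNTR()/DPCPU_FREE_CNTR(), not an existing API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NCPU_SIM 4              /* hypothetical stand-in for mp_maxid + 1 */
#define CACHE_LINE_SIM 64       /* assumed cache line size */

/*
 * One slot per CPU, padded to a full (assumed) cache line so that
 * neighbouring CPUs' counters are a line apart. calloc() does not
 * guarantee the base is line-aligned; a real allocator would align it.
 */
struct pcpu_cntr_slot {
	uint64_t value;
	char pad[CACHE_LINE_SIM - sizeof(uint64_t)];
};

typedef struct pcpu_cntr_slot *pcpu_cntr_t;    /* hypothetical handle */

/* Hypothetical analogue of DPCPU_ALLOC_CNTR(): zeroed slots for all CPUs. */
static pcpu_cntr_t
pcpu_cntr_alloc(void)
{
	return (calloc(NCPU_SIM, sizeof(struct pcpu_cntr_slot)));
}

/* Hypothetical analogue of DPCPU_FREE_CNTR(). */
static void
pcpu_cntr_free(pcpu_cntr_t c)
{
	free(c);
}

/* Writer side: touch only the current CPU's slot. */
static void
pcpu_cntr_add(pcpu_cntr_t c, int cpu, uint64_t n)
{
	assert(cpu >= 0 && cpu < NCPU_SIM);
	c[cpu].value += n;
}

/* Reader side: sum all slots; cost is paid only when someone reads. */
static uint64_t
pcpu_cntr_fetch(pcpu_cntr_t c)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NCPU_SIM; cpu++)
		sum += c[cpu].value;
	return (sum);
}
```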

2 + 3)
/*
  * Allocate a structure (or several items) of total size "size" with
  * the given alignment "align" and malloc flags "flags".
  * Returns:
  * an array of pointers (of size mp_maxid + 1, or MAXCPU) to per-cpu data:

   +-------------------------------------------
   |pcpu0    pcpu1    pcpu2    pcpu3   .pcpuN..
   +-------------------------------------------
      +        +        +        +
   +------+ +------+ +------+ +------+
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   |      | |      | |      | |      |
   +------+ +------+ +------+ +------+
  */

void *DPCPU_ALLOC(size_t size, int align, int flags)
void DPCPU_FREE(void *)
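The layout in the diagram can be sketched in userland as follows, assuming one backing buffer carved into per-CPU blocks whose stride is "size" rounded up to "align", plus the array of per-CPU pointers the comment describes. The names (struct dpcpu_dyn_sim, dpcpu_alloc_sim, dpcpu_free_sim) are hypothetical; malloc(9) flags such as M_WAITOK/M_ZERO have no userland equivalent here, so the sketch always zero-fills.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NCPU_SIM 4              /* hypothetical stand-in for mp_maxid + 1 */

/*
 * Hypothetical handle: ptr[cpu] points at that CPU's private block, as in
 * the diagram. 'base' keeps the raw calloc() result for the later free().
 */
struct dpcpu_dyn_sim {
	void *ptr[NCPU_SIM];
	void *base;
};

/* 'align' must be a power of two; every CPU gets 'size' zeroed bytes. */
static struct dpcpu_dyn_sim *
dpcpu_alloc_sim(size_t size, size_t align)
{
	struct dpcpu_dyn_sim *d;
	size_t stride;
	uintptr_t p;

	assert(align != 0 && (align & (align - 1)) == 0);
	stride = (size + align - 1) & ~(align - 1);    /* round size up */
	d = malloc(sizeof(*d));
	if (d == NULL)
		return (NULL);
	/* Over-allocate by align - 1 bytes so the first block can be aligned. */
	d->base = calloc(1, stride * NCPU_SIM + align - 1);
	if (d->base == NULL) {
		free(d);
		return (NULL);
	}
	p = ((uintptr_t)d->base + align - 1) & ~(uintptr_t)(align - 1);
	for (int cpu = 0; cpu < NCPU_SIM; cpu++)
		d->ptr[cpu] = (void *)(p + (uintptr_t)cpu * stride);
	return (d);
}

static void
dpcpu_free_sim(struct dpcpu_dyn_sim *d)
{
	if (d == NULL)
		return;
	free(d->base);
	free(d);
}
```

Using a single backing buffer keeps allocation and free cheap; a kernel version might instead place each block in the corresponding CPU's own dynamic pcpu area.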

/*
  * Returns typed pointer to per-cpu data block.
  * Disables preemption
  */
type *DPCPU_GET(void *, type)

/*
  * Enables preemption again
  */
DPCPU_PUT(void *)
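The GET/PUT pair could be modeled like this, assuming GET enters a critical section (as critical_enter() does in the kernel) before reading the current CPU's pointer, and PUT leaves it. Everything prefixed with sim_ or suffixed with _SIM is hypothetical: sim_curcpu stands in for curcpu and sim_critnest for the thread's critical-section nesting count.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_SIM 4              /* hypothetical stand-in for mp_maxid + 1 */

static int sim_curcpu;          /* hypothetical stand-in for curcpu */
static int sim_critnest;        /* hypothetical critical-section depth */

static void
sim_critical_enter(void)
{
	sim_critnest++;
}

static void
sim_critical_exit(void)
{
	assert(sim_critnest > 0);
	sim_critnest--;
}

/* Preemption, and hence migration, only happens outside critical sections. */
static int
sim_try_migrate(int newcpu)
{
	if (sim_critnest > 0)
		return (0);
	sim_curcpu = newcpu;
	return (1);
}

/* 'ptrs' is the per-CPU pointer array returned by the allocator. */
#define DPCPU_GET_SIM(ptrs, type) \
	(sim_critical_enter(), (type *)(ptrs)[sim_curcpu])
#define DPCPU_PUT_SIM(ptrs) \
	((void)(ptrs), sim_critical_exit())
```

Between GET and PUT the thread cannot be moved, so the pointer it holds always refers to the CPU it runs on; the price is the extra work measured in the 80 kpps figure above.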


/*
  * Returns typed pointer to per-cpu data block without
  * disabling preemption
  */
type *DPCPU_GET_FAST(void *, type)

DPCPU_PUT_FAST(void *) /* No-op */
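The cost of the fast variant is that an update can occasionally be lost. The sketch below replays one such interleaving by hand, assuming a plain (non-atomic) increment: thread A loads the counter, is preempted, thread B on the same CPU increments the same slot, and A then stores its stale result. DPCPU_GET_FAST_SIM, DPCPU_PUT_FAST_SIM and sim_curcpu are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_SIM 2              /* hypothetical stand-in for mp_maxid + 1 */

static int sim_curcpu;          /* hypothetical stand-in for curcpu */

/* Fast accessors: no critical section, PUT does nothing. */
#define DPCPU_GET_FAST_SIM(ptrs, type) ((type *)(ptrs)[sim_curcpu])
#define DPCPU_PUT_FAST_SIM(ptrs) ((void)(ptrs))

/* Replays two increments of CPU 0's slot; returns the final value. */
static uint64_t
replay_preempted_increment(void *ptrs[])
{
	uint64_t *a, *b, a_tmp;

	sim_curcpu = 0;
	a = DPCPU_GET_FAST_SIM(ptrs, uint64_t);
	a_tmp = *a;                     /* A: load */
	/* A is preempted here; B runs on CPU 0 and increments the slot. */
	b = DPCPU_GET_FAST_SIM(ptrs, uint64_t);
	(*b)++;
	DPCPU_PUT_FAST_SIM(ptrs);
	/* A resumes, possibly on another CPU, still holding pointer 'a'. */
	sim_curcpu = 1;
	*a = a_tmp + 1;                 /* A: store overwrites B's update */
	DPCPU_PUT_FAST_SIM(ptrs);
	return (*a);
}
```

Two increments happen but only one survives, which is acceptable for something like total bytes forwarded and not for counters that need precise accounting.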

/*
  * Get remote cpu value
  */
DPCPU_GET_REMOTE(void *, type, index)

/* Use CPU_FOREACH() to sum the values over all CPUs */
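Summing with remote reads might look like the following userland sketch, modeled on the way CPU_FOREACH() walks CPU IDs 0..mp_maxid and skips absent ones (IDs need not be contiguous). MAXID_SIM, sim_cpu_absent, CPU_FOREACH_SIM, DPCPU_GET_REMOTE_SIM and struct ipstat_sim are hypothetical. The result is not an atomic snapshot: other CPUs may keep updating their copies while the loop runs.

```c
#include <assert.h>
#include <stdint.h>

#define MAXID_SIM 3             /* hypothetical stand-in for mp_maxid */

/* CPU 2 is marked absent to show that IDs can have holes. */
static const int sim_cpu_absent[MAXID_SIM + 1] = { 0, 0, 1, 0 };

#define CPU_FOREACH_SIM(i) \
	for ((i) = 0; (i) <= MAXID_SIM; (i)++) \
		if (!sim_cpu_absent[(i)])

#define DPCPU_GET_REMOTE_SIM(ptrs, type, cpu) ((type *)(ptrs)[(cpu)])

struct ipstat_sim {             /* hypothetical protocol statistics */
	uint64_t ips_total;
	uint64_t ips_forward;
};

/* Build a summary by adding up every present CPU's copy. */
static struct ipstat_sim
ipstat_snapshot(void *ptrs[])
{
	struct ipstat_sim sum = { 0, 0 };
	int i;

	CPU_FOREACH_SIM(i) {
		const struct ipstat_sim *s =
		    DPCPU_GET_REMOTE_SIM(ptrs, struct ipstat_sim, i);
		sum.ips_total += s->ips_total;
		sum.ips_forward += s->ips_forward;
	}
	return (sum);
}
```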


-- 
WBR, Alexander


