svn commit: r218232 - head/sys/netinet

Fri Feb 4 17:38:05 UTC 2011

On Thu, 3 Feb 2011, John Baldwin wrote:

>>   1) Move per John Baldwin to mp_maxid
>>   2) Some signed/unsigned errors found by Mac OS compiler (from Michael)
>>   3) a couple of copyright updates on the affected files.
>
> Note that mp_maxid is the maximum valid ID, so you typically have to do 
> things like:
>
> 	for (i = 0; i <= mp_maxid; i++) {
> 		if (CPU_ABSENT(i))
> 			continue;
> 		...
> 	}
>
> There is a CPU_FOREACH() macro that does the above (but assumes you want to 
> skip over non-existent CPUs).

I'm finding the network stack requires quite a bit more along these lines, 
btw.  I'd love also to have:

   PACKAGE_FOREACH()
   CORE_FOREACH()
   HWTHREAD_FOREACH()

   CURPACKAGE()
   CURCORE()
   CURTHREAD()

These would help when putting together thread worker pools, distributing work, 
identifying where to channel work, making dispatch decisions and so on.  It 
seems likely that in some scenarios, it will be desirable to have worker 
thread topology linked to hardware topology -- for example, a network stack 
worker per core, with distribution of work targeting the closest worker 
(subject to ordering constraints)...

> Hmmm, this is more complicated.  Can sctp_queue_to_mcore() handle the fact 
> that a cpu_to_use value might not be valid?  If not you might want to 
> maintain a separate "dense" virtual CPU ID table numbered 0 .. mp_ncpus - 1 
> that maps to "present" FreeBSD CPU IDs.  I think Robert has done something 
> similar to support RSS in TCP.  Does that make sense?

This proves somewhat complicated.  I basically have two models, depending on 
whether RSS is involved (which adds an external factor).  Without RSS, I build 
a contiguous workstream number space, which is then mapped via a table to the 
CPU ID space, allowing mappings and hashing to be done easily -- however, 
these refer to ordered flow processing streams (i.e., "threads") rather than 
CPUs, in the strict sense.  In the future with dynamic configuration, this 
becomes important because what I rebalance is ordered processing streams 
across CPUs, rather than individual pieces of work.  With RSS there has to be 
a link between work distribution and the CPU identifiers shared by device 
drivers, hardware, etc., in which case RSS identifies viable CPUs as it starts 
(probably not quite correctly; I'm cleaning that code up at the moment and 
will be looking for a review of it shortly).
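
To make the non-RSS model concrete, here is a minimal userspace sketch of a 
dense workstream table; it is an illustration, not the netisr code itself.  
The sketch_* and ws_* names are hypothetical, and sketch_maxid and 
sketch_cpu_absent() are only stand-ins for the kernel's mp_maxid and 
CPU_ABSENT(), with a made-up sparse CPU layout (IDs 0, 1, 4, 5 present):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAXCPU 64

/* Hypothetical stand-ins for mp_maxid and the set of present CPUs. */
static const unsigned int sketch_maxid = 5;
static const bool sketch_present[SKETCH_MAXCPU] = {
	[0] = true, [1] = true, [4] = true, [5] = true
};

/* Stand-in for CPU_ABSENT(): true if no CPU exists with this ID. */
static bool
sketch_cpu_absent(unsigned int cpu)
{
	return (cpu > sketch_maxid || !sketch_present[cpu]);
}

/* Dense workstream number space 0 .. ws_count - 1, mapped to CPU IDs. */
static unsigned int ws_to_cpu[SKETCH_MAXCPU];
static unsigned int ws_count;

/* Walk 0 .. maxid inclusive, skipping absent IDs, as in the loop above. */
static void
ws_build_table(void)
{
	ws_count = 0;
	for (unsigned int i = 0; i <= sketch_maxid; i++) {
		if (sketch_cpu_absent(i))
			continue;
		ws_to_cpu[ws_count++] = i;
	}
}

/* Hash a flow onto a workstream; the modulo is a placeholder hash. */
static unsigned int
ws_for_flow(uint32_t flowid)
{
	assert(ws_count > 0);
	return (flowid % ws_count);
}

/* Resolve the flow's workstream to the CPU currently backing it. */
static unsigned int
cpu_for_flow(uint32_t flowid)
{
	return (ws_to_cpu[ws_for_flow(flowid)]);
}
```

Because a flow always hashes to a workstream number rather than directly to a 
CPU ID, ordering is tied to the workstream, and rebalancing would only mean 
rewriting entries in ws_to_cpu[].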

This issue came up some at the BSDCan devsummit last year: as more and more 
kernel subsystems need to exploit parallelism explicitly, the thread 
programming model isn't bad, but lacks a strong tie to hardware topology in 
order to help manage work distribution.  One idea idly bandied around was to 
do something along the lines of KSE/GCD for the kernel: provide a layered 
"work" model with ordering constraints, rather than exploit threads directly, 
for work-oriented subsystems.  This is effectively what netisr does, but in a 
network stack-specific way.  But with crypto code, IPSEC, storage stuff, etc., 
all looking to exploit parallelism, perhaps a more general model is called 
for.

Robert