svn commit: r218232 - head/sys/netinet

Fri Feb 4 18:56:15 UTC 2011

On 2/4/11 9:38 AM, Robert Watson wrote:
>
> On Thu, 3 Feb 2011, John Baldwin wrote:
>
>>>   1) Move per John Baldwin to mp_maxid
>>>   2) Some signed/unsigned errors found by Mac OS compiler (from 
>>> Michael)
>>>   3) a couple of copyright updates on the affected files.
>>
>> Note that mp_maxid is the maximum valid ID, so you typically have to 
>> do things like:
>>
>>     for (i = 0; i <= mp_maxid; i++) {
>>         if (CPU_ABSENT(i))
>>             continue;
>>         ...
>>     }
>>
>> There is a CPU_FOREACH() macro that does the above (but assumes you 
>> want to skip over non-existent CPUs).
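
For anyone following along, here is a minimal userspace sketch of the
loop John describes. The names sim_mp_maxid, sim_cpu_present[],
SIM_CPU_ABSENT() and SIM_CPU_FOREACH() are stand-ins made up for
illustration, not the kernel's definitions. The point is only that the
upper bound is inclusive and that the ID space may have holes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU map: IDs 0..5 exist as slots, but 1 and 4 are absent. */
#define SIM_MAXCPU 6
static const bool sim_cpu_present[SIM_MAXCPU] = {
    true, false, true, true, false, true
};
static const int sim_mp_maxid = SIM_MAXCPU - 1; /* highest valid ID, not a count */

#define SIM_CPU_ABSENT(i)   (!sim_cpu_present[(i)])

/* Visit every present CPU ID; note the inclusive upper bound. */
#define SIM_CPU_FOREACH(i)                              \
    for ((i) = 0; (i) <= sim_mp_maxid; (i)++)           \
        if (!SIM_CPU_ABSENT(i))

/* Count the present CPUs and report the last ID visited through *last. */
static int
sim_count_present(int *last)
{
    int i, n = 0;

    assert(last != NULL);
    *last = -1;
    SIM_CPU_FOREACH(i) {
        n++;
        *last = i;
    }
    return (n);
}
```

With this map the loop visits IDs 0, 2, 3 and 5. A loop written as
"i < count of CPUs" would stop at ID 3 and never reach ID 5.
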
>
> I'm finding the network stack requires quite a bit more along these 
> lines, btw.  I'd love also to have:
>
>   PACKAGE_FOREACH()
>   CORE_FOREACH()
>   HWTHREAD_FOREACH()
>

I agree, which is why I usually support adding such iterators, though 
some people scream about them (e.g. FOREACH_THREAD_IN_PROC, and there 
is one for iterating through vnets too).

>   CURPACKAGE()
>   CURCORE()
>   CURTHREAD()

Also current jail, vnet, etc. (these kind of exist already).
>
> Available when putting together thread worker pools, distributing 
> work, identifying where to channel work, making dispatch decisions 
> and so on.  It seems likely that in some scenarios, it will be 
> desirable to have worker thread topology linked to hardware topology 
> -- for example, a network stack worker per core, with distribution 
> of work targeting the closest worker (subject to ordering 
> constraints)...
>
>> Hmmm, this is more complicated.  Can sctp_queue_to_mcore() handle 
>> the fact that a cpu_to_use value might not be valid?  If not you 
>> might want to maintain a separate "dense" virtual CPU ID table 
>> numbered 0 .. mp_ncpus - 1 that maps to "present" FreeBSD CPU IDs.  
>> I think Robert has done something similar to support RSS in TCP.  
>> Does that make sense?
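
A rough sketch of the "dense" table idea, as I read it. Everything
here (struct dense_map, dense_build(), dense_pick_cpu(), DENSE_MAXCPU)
is a hypothetical name for illustration and is not taken from the SCTP
or TCP code. It assumes a presence map indexed by CPU ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DENSE_MAXCPU 8  /* hypothetical upper limit on CPU IDs */

/* Dense table: slot k holds the k-th present CPU ID. */
struct dense_map {
    int ncpus;                  /* number of present CPUs (role of mp_ncpus) */
    int cpuid[DENSE_MAXCPU];    /* dense index -> sparse CPU ID */
};

/* Build the table from a presence map covering IDs 0..maxid inclusive. */
static void
dense_build(struct dense_map *m, const bool *present, int maxid)
{
    int i;

    assert(m != NULL && present != NULL);
    assert(maxid >= 0 && maxid < DENSE_MAXCPU);
    m->ncpus = 0;
    for (i = 0; i <= maxid; i++)
        if (present[i])
            m->cpuid[m->ncpus++] = i;
}

/* Map an arbitrary hash value to a CPU ID that is known to be present. */
static int
dense_pick_cpu(const struct dense_map *m, unsigned int hash)
{
    assert(m != NULL && m->ncpus > 0);
    return (m->cpuid[hash % (unsigned int)m->ncpus]);
}
```

The caller hashes into 0 .. ncpus - 1 and never sees the holes, so a
value like cpu_to_use is always valid by construction.
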
>
> This proves somewhat complicated.  I basically have two models, 
> depending on whether RSS is involved (which adds an external 
> factor).  Without RSS, I build a contiguous workstream number space, 
> which is then mapped via a table to the CPU ID space, allowing 
> mappings and hashing to be done easily -- however, these refer to 
> ordered flow processing streams (i.e., "threads") rather than CPUs, 
> in the strict sense.  In the future with dynamic configuration, this 
> becomes important because what I do is rebalance ordered processing 
> streams rather than work to CPUs.  With RSS there has to be a link 
> between work distribution and the CPU identifiers shared by device 
> drivers, hardware, etc, in which case RSS identifies viable CPUs as 
> it starts (probably not quite correctly, I'll be looking for a 
> review of that code shortly, cleaning it up currently).
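
To make the non-RSS model concrete, here is a small sketch under my
own assumptions. The names (struct ws_map, ws_from_flow(),
ws_cpu_for_flow(), ws_rebind(), NWS) are invented for illustration and
are not netisr's. Flows hash to a workstream, and only the workstream
to CPU table changes when rebalancing.

```c
#include <assert.h>
#include <stddef.h>

#define NWS 4   /* hypothetical number of workstreams */

/* Workstream -> CPU ID table; the only state that rebalancing touches. */
struct ws_map {
    int ws_cpu[NWS];
};

/* A flow always hashes to the same workstream, which keeps it ordered. */
static unsigned int
ws_from_flow(unsigned int flowid)
{
    return (flowid % NWS);
}

/* Resolve the CPU currently hosting the flow's workstream. */
static int
ws_cpu_for_flow(const struct ws_map *m, unsigned int flowid)
{
    assert(m != NULL);
    return (m->ws_cpu[ws_from_flow(flowid)]);
}

/* Move a whole workstream to another CPU; its flows follow together. */
static void
ws_rebind(struct ws_map *m, unsigned int ws, int cpuid)
{
    assert(m != NULL && ws < NWS);
    m->ws_cpu[ws] = cpuid;
}
```

Because a flow never changes workstream, rebinding a workstream moves
all of its flows as a unit and does not split any one flow across CPUs.
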
>
> This issue came up some at the BSDCan devsummit last year: as more 
> and more kernel subsystems need to exploit parallelism explicitly, 
> the thread programming model isn't bad, but lacks a strong tie to 
> hardware topology in order to help manage work distribution.  One 
> idea idly bandied around was to do something along the lines of 
> KSE/GCD for the kernel: provide a layered "work" model with ordering 
> constraints, rather than exploit threads directly, for work-oriented 
> subsystems.  This is effectively what netisr does, but in a network 
> stack-specific way.  But with crypto code, IPSEC, storage stuff, 
> etc, all looking to exploit parallelism, perhaps a more general 
> model is called for.
>
> Robert
>