getaffinity/setaffinity and cpu sets.

Thu Feb 21 09:27:43 UTC 2008

On Wed, 20 Feb 2008, Jeff Roberson wrote:

> I also have a 'cpuset' command which can run a new program with a given cpu 
> set, view and modify sets of arbitrary pids.  This is all working and I can 
> supply patches if anyone is interested.  I have to implement 4BSD support 
> before I can commit.
>
> I have a proposal for Solaris-style processor sets which I think is simple 
> and sufficient for most cases.  It involves the following new syscalls:
>
> int cpuset(void); int setcpuset(pid_t pid, int setid); int getcpuset(pid_t 
> pid);
>
> The notion would be that you can create a new numbered cpuset with cpuset(). 
> You can modify or inspect its affinity with get/setaffinity above and the 
> CPU_WHICH_SET argument.  The cpuset exists as long as there are members of 
> the set.  Sort of like a process group or session.  The {get,set}cpuset 
> calls can inspect or modify the state.
>
> This set would not be modifiable by user processes or by processes in a 
> jail. It would create the restriction that differs between 'avail' and 'sys' 
> above. Processes would be able to bind directly to any processor within the 
> set. Changing the set would apply to all processes in the set. The cpuset 
> would be per-process while the mask is per-thread.  Set membership is 
> inherited on fork().
>
> In Solaris sets can be named and have a more complete management api.  I'm 
> not really interested in implementing all of that but I believe what I have 
> outlined here would be a subset of this and no code/syscalls would be wasted.
>
> Comments?  Objections?  I'm fairly pleased with this arrangement now.

Just to put a few notes from our conversation on IRC in e-mail:

- I think I'd prefer int cpuset(cpuset_t *set) and int getcpuset(pid_t,
   cpuset_t *) so that we don't mix up IDs and return values.  More recent
   interfaces tend to do this, I believe, and it means that the prototype,
   even if not the ABI, remains the same if the set identifier changes in
   the future.

- You don't mention what happens if a process's cpu set changes to preclude a
   CPU the process has a thread with affinity for.  Online, you suggested
   SIGKILL, and I thought maybe a new SIGCPUGONE with a default SIGKILL action
   might be a friendlier model.  We should see what Solaris and others do here
   though.  I like the idea that the affinity is a guarantee in userspace
   because it means that you can rely on it; I'm OK with the idea that your
   thread always runs on the CPUs you have affinity for unless in the
   SIGCPUGONE handler :-).

- It would be nice to be able to use CPU sets in jail as well, suggesting a
   hierarchical model with some sort of tagging so you know which CPU sets
   were created in a jail and therefore whether they can be changed in a jail.
   While I recognize this makes things a lot more tricky, I think we should
   basically be planning more carefully with respect to virtualization when we
   add new interfaces, since it's a widely used feature, and the current set of
   "stragglers" unsupported in Jail is growing rather than shrinking.

- There's still no way to specify an affinity policy rather than explicit
   affinity, but if our CPU set model is sufficiently general, that might be a
   vehicle to do that.  I.e., cpuset_setpolicy() rather than setting a mask.

- In the interests of boring API changes, recent APIs tend to prefix the
   method name with the object name.  Have you thought about cpuset_create(),
   cpuset_foo(), etc?  That reduces the chances of interfering with application
   namespaces.  I think, anyway. :-).
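
To make the first two points a bit more concrete, here is a rough userspace
sketch, not kernel code and not a proposed implementation.  It assumes a toy
in-memory table of processes and sets, and every demo_* name, the ID type, and
the errno choices are made up for illustration.  It shows set IDs coming back
through a pointer argument while the return value carries only success or
failure, plus a check for threads whose affinity mask names a CPU that a
shrunken set no longer contains (the candidates for something like
SIGCPUGONE).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Everything named demo_* or DEMO_* is hypothetical, for illustration only. */
typedef int32_t  demo_cpusetid_t;	/* set identifier; could widen later */
typedef uint64_t demo_cpumask_t;	/* bit n set means CPU n is allowed */

#define DEMO_MAXSETS	8
#define DEMO_MAXPROCS	8

struct demo_proc {
	pid_t		pid;
	demo_cpusetid_t	setid;		/* set membership is per-process */
	demo_cpumask_t	thread_mask;	/* affinity of one thread in it */
};

static int		demo_set_used[DEMO_MAXSETS] = { 1 };	/* set 0: root */
static struct demo_proc	demo_procs[DEMO_MAXPROCS];
static int		demo_nprocs;

/* Register a process in the root set (stands in for process creation). */
static void
demo_proc_add(pid_t pid, demo_cpumask_t thread_mask)
{
	assert(demo_nprocs < DEMO_MAXPROCS);
	demo_procs[demo_nprocs++] = (struct demo_proc){ pid, 0, thread_mask };
}

static struct demo_proc *
demo_proc_find(pid_t pid)
{
	for (int i = 0; i < demo_nprocs; i++)
		if (demo_procs[i].pid == pid)
			return (&demo_procs[i]);
	return (NULL);
}

/* Create a set.  The ID is stored in *idp; the return value is 0 or -1. */
static int
demo_cpuset_create(demo_cpusetid_t *idp)
{
	for (int i = 1; i < DEMO_MAXSETS; i++) {
		if (!demo_set_used[i]) {
			demo_set_used[i] = 1;
			*idp = i;
			return (0);
		}
	}
	errno = ENFILE;		/* assumed errno for "no free set slots" */
	return (-1);
}

/* Look up the set a process belongs to. */
static int
demo_cpuset_getid(pid_t pid, demo_cpusetid_t *idp)
{
	struct demo_proc *p = demo_proc_find(pid);

	if (p == NULL) {
		errno = ESRCH;
		return (-1);
	}
	*idp = p->setid;
	return (0);
}

/* Move a process into an existing set. */
static int
demo_cpuset_setid(pid_t pid, demo_cpusetid_t id)
{
	struct demo_proc *p = demo_proc_find(pid);

	if (p == NULL) {
		errno = ESRCH;
		return (-1);
	}
	if (id < 0 || id >= DEMO_MAXSETS || !demo_set_used[id]) {
		errno = EINVAL;
		return (-1);
	}
	p->setid = id;
	return (0);
}

/*
 * Count processes in set `id' whose thread affinity names a CPU that is
 * missing from `newmask'.  These are the ones a shrinking set would strand.
 */
static int
demo_cpuset_stranded(demo_cpusetid_t id, demo_cpumask_t newmask)
{
	int n = 0;

	for (int i = 0; i < demo_nprocs; i++)
		if (demo_procs[i].setid == id &&
		    (demo_procs[i].thread_mask & ~newmask) != 0)
			n++;
	return (n);
}

/* Walk through the calls once; returns 0 if every step behaves as expected. */
static int
demo_run(void)
{
	demo_cpusetid_t id = -1, got = -1;

	demo_proc_add(100, 0x3);		/* thread may use CPUs 0 and 1 */
	if (demo_cpuset_create(&id) != 0 || id != 1)
		return (1);
	if (demo_cpuset_setid(100, id) != 0)
		return (2);
	if (demo_cpuset_getid(100, &got) != 0 || got != id)
		return (3);
	if (demo_cpuset_stranded(id, 0x1) != 1)	/* CPU 1 removed from set */
		return (4);
	if (demo_cpuset_stranded(id, 0x3) != 0)	/* set still covers the mask */
		return (5);
	if (demo_cpuset_getid(999, &got) != -1 || errno != ESRCH)
		return (6);
	return (0);
}
```

Calling demo_run() once from a main() exercises the whole sketch.  Since the
ID only ever appears behind a typedef and a pointer, widening it later would
change the ABI but leave these prototypes as they are, and a failed lookup can
never be mistaken for a valid set number.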

I need to ponder the proposal a little more, ideally over a hot beverage this 
morning, and will follow up if I have further thoughts.  Thanks for working on 
this, BTW -- affinity is well overdue for FreeBSD.

Robert N M Watson
Computer Laboratory
University of Cambridge