[PATCH] hwpmc(4) changes to use 'mp_maxid' instead of 'mp_ncpus'.

Fri Mar 14 05:58:23 UTC 2008

On Fri, Mar 14, 2008 at 12:46 AM, John Baldwin <jhb at freebsd.org> wrote:
> On Thursday 13 March 2008 02:08:05 pm David O'Brien wrote:
>  > Hi folks,
>  > Some folks at Juniper have submitted these changes to hwpmc(4).
>  > I am sending them here for public review.
>  >
>  > Their thoughts are:
>  >     The mp_ncpus refers to the count of the active CPU's.  Where as
>  >     mp_maxid refers to the count of all the cpus on the SMP.  Using
>  >     mp_ncpus in the cpu_id range-check of hwpmc module would lead to the
>  >     assumption that all the active CPU's in the SMP are not interleaved.
>  >     But for running on some platforms, the active and inactive cpus could
>  >     be interleaved making hwpmc not work for the cpus whose cpu_id is
>  >     greater than the active-cpu count.

jhb>  This is correct, but you need to handle CPUs that are absent.  It might be
jhb>  sufficient to update pmc_cpu_is_disabled() in kern_pmc.c to check
jhb>  CPU_ABSENT(cpu) and claim the CPU is disabled if it is absent, but I'm not
jhb>  sure that will catch everything as that seems aimed at handling having a
jhb>  non-absent CPU halted (such as disabling HTT on i386).

That is inline with the feedback (and sample patch to kern_pmc.c) that I
had sent in to O'Brien.

But there are other problems with the patch at various levels,
probably not obvious to someone who is just looking at the kernel
code.

First, the relevance.  My understanding is that these changes are for
a proprietary SMP platform that uses non-mainstream (Tier3 or
Tier4) CPUs.  It so happens that Juniper decided to numbers CPUs
'sparsely' in their kernel variant and that is the motivation for this
patch.

IMO, as a policy, code changes for exotic hardware need to be
maintained by vendors of said exotic hardware and not dumped on
volunteers.

Second, when I designed the PMCTools API I didn't consider that CPU
numbers could be 'sparse'.  [They aren't sparsely allocated
on the i386/amd64---the code I looked at when I was designing
PmcTools.]  So there are assumptions sprinkled throughout userland
that that the integers 0..hw.ncpus  can select a valid CPU.  While
all that can be tracked down and changed, and documentation updated,
it is still work that I would prefer to defer until there is a chance
that someone
in the general public can use it.  I do need to prioritize how I spend my
volunteer hours.

Third, IFF we as a project are going to support 'sparse CPU numbering,
I would like to see the form that takes before making changes to
HWPMC and tools.  For example:
- How will userland and in-kernel modules find out which CPUs are
  physically present?   Would there be a bitmask on the lines of today's
  machdep.hlt_cpus that we could query?  Could we make the
  'all_cpus' bitmask visible to userland?  What happens when we
  start supporting systems with more than 32 processors?
- Will sysctl hw.ncpus represent the count of present CPUs or will it
  represent the maximum CPU id?
- How will userland  distinguish between absent CPUs those that
  could be temporarily administratively disabled?
- Are we going to support 'transient' CPUs [that come and go]?  Why
  would we want sparse CPU numbering otherwise?

Nit: 'mp_maxid' appears to be an index, not a count as claimed above.

If support for sparse CPU numbering is something useful, I feel the
correct sequence should be to discuss it here, add sparse CPU
numbering to the base i386/amd64 kernels (say) first and then
propagate the feature to auxiliary code like HWPMC and userland.

Changing HWPMC and its userland before the base kernel itself
changes does not seem to be the right thing to do.

Regards,
Koshy