kern/145385: [cpu] Logical processor cannot be disabled for some
SMT-enabled Intel procs
Garrett Cooper
gcooper at FreeBSD.org
Thu Aug 26 04:10:10 UTC 2010
The following reply was made to PR kern/145385; it has been noted by GNATS.
From: Garrett Cooper <gcooper at FreeBSD.org>
To: Jeff Roberson <jroberson at jroberson.net>
Cc: bug-followup at freebsd.org, jkim at freebsd.org,
Attilio Rao <attilio at freebsd.org>, jeff at freebsd.org
Subject: Re: kern/145385: [cpu] Logical processor cannot be disabled for some
SMT-enabled Intel procs
Date: Wed, 25 Aug 2010 21:08:32 -0700
On Tue, Aug 24, 2010 at 9:53 PM, Jeff Roberson <jroberson at jroberson.net> wrote:
> On Tue, 24 Aug 2010, Garrett Cooper wrote:
>
>> On Tue, Aug 24, 2010 at 3:45 PM, Garrett Cooper <gcooper at freebsd.org>
>> wrote:
>>>
>>> On Tue, Aug 24, 2010 at 2:51 PM, Garrett Cooper <yanegomi at gmail.com>
>>> wrote:
>>>>
>>>> On Aug 24, 2010, at 2:03 PM, Jeff Roberson wrote:
>>>>
>>>>
>>>> On Tue, 24 Aug 2010, Garrett Cooper wrote:
>>>>
>>>> On Tue, Aug 24, 2010 at 12:22 PM, Jeff Roberson
>>>> <jroberson at jroberson.net>
>>>> wrote:
>>>>
>>>> On Tue, 24 Aug 2010, Garrett Cooper wrote:
>>>>
>>>> On Mon, Aug 23, 2010 at 6:33 AM, John Baldwin <jhb at freebsd.org> wrote:
>>>>
>>>> On Sunday, August 22, 2010 4:17:37 am Garrett Cooper wrote:
>>>>
>>>> The following trivial patch fixes the issue on my W3520 processor;
>>>>
>>>> AFAICS
>>>>
>>>> it's what should be done after reading several of the specs because the
>>>>
>>>> logical count that's tracked with ebx is exactly what is needed for
>>>>
>>>> logical_cpus (it's an absolute quantity). I need to verify it with a
>>>>
>>>> multi-cpu
>>>>
>>>> topology at work (the two r710s I was testing with E-series Xeons on
>>>>
>>>> aren't
>>>>
>>>> available remotely right now).
>>>>
>>>> Thanks!
>>>>
>>>> -Garrett
>>>>
>>>> Jung-uk Kim and Attilio Rao have both been looking at this code recently
>>>>
>>>> and
>>>>
>>>> are in a better position to review the patch in the PR.
>>>>
>>>> (Moving jhb@ to BCC, adding jeff@ for possible input on ULE)
>>>>
>>>> The patch works as expected (it now properly detects the SMT CPUs as
>>>>
>>>> logical CPUs), but setting machdep.hlt_logical_cpus=1 causes other
>>>>
>>>> problems with scheduling tasks because certain kernel threads get
>>>>
>>>> stuck at boot when netbooting (in particular I've seen problems with
>>>>
>>>> usbhub* and a few other bits), so in order for
>>>>
>>>> machdep.hlt_logical_cpus to be fixed on SMT processors, it might
>>>>
>>>> require some changes to the ULE scheduler to shuffle around the
>>>>
>>>> threads to available cores/processors?
>>>>
>>>>
>>>> hlt_logical_cpus should be rewritten to use cpusets to change the
>>>> default
>>>>
>>>> system set rather than specifically halting those cpus.  There are a
>>>> number
>>>>
>>>> of loops in the kernel that iterate over all cpus and attempt to bind
>>>> and
>>>>
>>>> perform some task.  I think there are a number of other reasons to
>>>> prefer a
>>>>
>>>> less aggressive approach to avoiding the logical cpus as well. Simply
>>>>
>>>> preventing user thread schedule will achieve the intent of the sysctl in
>>>> any
>>>>
>>>> event.
>>>>
>>>> Ok... in that event then the bug is ok, but maybe I should add
>>>>
>>>> some code to the patch to warn the user about functional issues
>>>>
>>>> associated with halting logical CPUs?
>>>>
>>>> I don't think the bug is ok.  We probably shouldn't have sysctls which
>>>> readily break the kernel.  As I said we should instead have the sysctl
>>>> backend to cpuset.  It shouldn't take more than an hour to code and
>>>> test.
>>>
>>>    Ok.. I'll look at this once I have my other system back online so
>>> I can actively break something until I get it to work.
>>
>>   BTW... there's a lot of code in machdep.c that does the same thing
>> to idle the CPU, for instance, cpu_idle_hlt, cpu_idle_acpi,
>> cpu_idle_amdc1e (on amd64). What should be done about those cases
>> (same thing, or different)?
>
> Those are the actual idle functions that the scheduler uses.  Those are
> safe.
I'll look into running this on a Nehalem processor machine, but
this appears to work as expected on my Penryn processor test machine with
machdep.hlt_cpus = { 110, 101, 11, 0 } and with machdep.idle=acpi; I'm
not sure if the loop is supposed to be there still, but it
wouldn't make sense because the CPU would be spinning in the kernel.
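
For reference, here is a minimal sketch of how mask values like the ones
above could be read. It assumes (the message does not say so explicitly)
that each value is a binary bit string in which bit N set means logical
CPU N is halted. The helper names parse_binary_mask() and cpu_is_halted()
are hypothetical and are not taken from machdep.c.

```c
#include <limits.h>

/*
 * Hypothetical helper: read a string of '0'/'1' characters as a binary
 * mask, most significant bit first.  Parsing stops at the first
 * character that is not '0' or '1'.
 */
static unsigned int
parse_binary_mask(const char *bits)
{
	unsigned int mask = 0;

	for (; *bits == '0' || *bits == '1'; bits++)
		mask = (mask << 1) | (unsigned int)(*bits - '0');
	return (mask);
}

/*
 * Hypothetical helper: nonzero if logical CPU `cpu` is marked in `mask`.
 * Assumption: bit N of the mask corresponds to logical CPU N.
 */
static int
cpu_is_halted(unsigned int mask, int cpu)
{
	if (cpu < 0 || cpu >= (int)(sizeof(mask) * CHAR_BIT))
		return (0);
	return ((int)((mask >> cpu) & 1u));
}
```

Under that assumption, "110" marks logical CPUs 1 and 2, "101" marks
CPUs 0 and 2, "11" marks CPUs 0 and 1, and "0" marks none.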
Thanks,
-Garrett