kern/145385: [cpu] Logical processor cannot be disabled for some SMT-enabled Intel procs

Jeff Roberson jroberson at jroberson.net
Wed Aug 25 05:00:24 UTC 2010


The following reply was made to PR kern/145385; it has been noted by GNATS.

From: Jeff Roberson <jroberson at jroberson.net>
To: Garrett Cooper <gcooper at FreeBSD.org>
Cc: bug-followup at freebsd.org, jkim at freebsd.org, 
    Attilio Rao <attilio at freebsd.org>, jeff at freebsd.org
Subject: Re: kern/145385: [cpu] Logical processor cannot be disabled for some
 SMT-enabled Intel procs
Date: Tue, 24 Aug 2010 18:53:25 -1000 (HST)

 On Tue, 24 Aug 2010, Garrett Cooper wrote:
 
 > On Tue, Aug 24, 2010 at 3:45 PM, Garrett Cooper <gcooper at freebsd.org> wrote:
 >> On Tue, Aug 24, 2010 at 2:51 PM, Garrett Cooper <yanegomi at gmail.com> wrote:
 >>> On Aug 24, 2010, at 2:03 PM, Jeff Roberson wrote:
 >>>
 >>>
 >>> On Tue, 24 Aug 2010, Garrett Cooper wrote:
 >>>
 >>> On Tue, Aug 24, 2010 at 12:22 PM, Jeff Roberson <jroberson at jroberson.net>
 >>> wrote:
 >>>
 >>> On Tue, 24 Aug 2010, Garrett Cooper wrote:
 >>>
 >>> On Mon, Aug 23, 2010 at 6:33 AM, John Baldwin <jhb at freebsd.org> wrote:
 >>>
 >>> On Sunday, August 22, 2010 4:17:37 am Garrett Cooper wrote:
 >>>
 >>>       The following trivial patch fixes the issue on my W3520 processor;
 >>> AFAICS it's what should be done after reading several of the specs because
 >>> the logical count that's tracked with ebx is exactly what is needed for
 >>> logical_cpus (it's an absolute quantity). I need to verify it with a
 >>> multi-cpu topology at work (the two r710s I was testing with E-series
 >>> Xeons on aren't available remotely right now).
 >>>
 >>> Thanks!
 >>> -Garrett
 >>>
 >>> Jung-uk Kim and Attilio Rao have both been looking at this code recently
 >>> and are in a better position to review the patch in the PR.
 >>>
 >>> (Moving jhb@ to BCC, adding jeff@ for possible input on ULE)
 >>>
 >>> The patch works as expected (it now properly detects the SMT CPUs as
 >>> logical CPUs), but setting machdep.hlt_logical_cpus=1 causes other
 >>> problems with scheduling tasks because certain kernel threads get
 >>> stuck at boot when netbooting (in particular I've seen problems with
 >>> usbhub* and a few other bits), so in order for
 >>> machdep.hlt_logical_cpus to be fixed on SMT processors, it might
 >>> require some changes to the ULE scheduler to shuffle around the
 >>> threads to available cores/processors?
 >>>
 >>>
 >>> hlt_logical_cpus should be rewritten to use cpusets to change the default
 >>> system set rather than specifically halting those cpus.  There are a number
 >>> of loops in the kernel that iterate over all cpus and attempt to bind and
 >>> perform some task.  I think there are a number of other reasons to prefer a
 >>> less aggressive approach to avoiding the logical cpus as well.  Simply
 >>> preventing user threads from being scheduled on them will achieve the
 >>> intent of the sysctl in any event.
 >>>
 >>>   Ok... in that event then the bug is ok, but maybe I should add
 >>> some code to the patch to warn the user about functional issues
 >>> associated with halting logical CPUs?
 >>>
 >>> I don't think the bug is ok.  We probably shouldn't have sysctls which
 >>> readily break the kernel.  As I said we should instead have the sysctl
 >>> backend to cpuset.  It shouldn't take more than an hour to code and test.
 >>
 >>    Ok.. I'll look at this once I have my other system back online so
 >> I can actively break something until I get it to work.
 >
 >    BTW... there's a lot of code in machdep.c that does the same thing
 > to idle the CPU, for instance, cpu_idle_hlt, cpu_idle_acpi,
 > cpu_idle_amdc1e (on amd64). What should be done about those cases
 > (same thing, or different)?
 
 Those are the actual idle functions that the scheduler uses.  Those are 
 safe.
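
The cpuset-based approach suggested earlier in the thread can be sketched
in plain C as a mask computation.  This is only an illustration, not the
kernel's code: the helper names, the 64-bit mask type, and the assumption
that SMT siblings have consecutive CPU IDs (with the first ID of each group
treated as the primary thread) are all hypothetical.  The point is that
the logical CPUs stay online and are merely left out of the default set
offered to user threads.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a mask of the "extra" SMT threads.
 * Assumption (not from the thread): siblings have consecutive IDs, so
 * every CPU whose ID is not a multiple of threads_per_core is logical.
 */
static uint64_t
logical_sibling_mask(int ncpus, int threads_per_core)
{
	uint64_t mask = 0;
	int cpu;

	assert(ncpus >= 0 && ncpus <= 64 && threads_per_core >= 1);
	for (cpu = 0; cpu < ncpus; cpu++)
		if (cpu % threads_per_core != 0)
			mask |= (uint64_t)1 << cpu;
	return (mask);
}

/*
 * Hypothetical helper: the default set for user threads.  Nothing is
 * halted; when avoid_logical is set the logical CPUs are only removed
 * from the set, so code that binds to each CPU in turn still runs.
 */
static uint64_t
default_user_mask(uint64_t all_cpus, uint64_t logical_cpus, int avoid_logical)
{
	return (avoid_logical ? (all_cpus & ~logical_cpus) : all_cpus);
}
```

With 8 CPUs and 2 threads per core under the numbering assumed above,
logical_sibling_mask(8, 2) is 0xaa and the resulting default user set is
0x55.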
 
 Thanks,
 Jeff
 
 > Thanks,
 > -Garrett
 >
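
For reference, the "logical count that's tracked with ebx" mentioned in
the original report appears to be the field in bits 23:16 of EBX returned
by CPUID leaf 1, which is meaningful only when the HTT flag (EDX bit 28)
is set.  The sketch below decodes that field from register values that
have already been read; the macro and function names are hypothetical and
are not taken from the patch in the PR.  Note that the field is a
per-package maximum number of addressable logical processor IDs, so it is
not necessarily the number of threads actually enabled.

```c
#include <stdint.h>

/* Hypothetical names for the CPUID leaf 1 fields used here. */
#define	CPUID1_EDX_HTT		(1u << 28)	/* EDX bit 28: HTT flag */
#define	CPUID1_EBX_LOGICAL_MASK	0x00ff0000u	/* EBX bits 23:16 */
#define	CPUID1_EBX_LOGICAL_SHIFT 16

/*
 * Return the logical processor count reported by CPUID leaf 1, given the
 * EBX and EDX values from that leaf.  Without the HTT flag the field is
 * not valid, so a single logical processor is reported instead.
 */
static unsigned
cpuid1_logical_count(uint32_t ebx, uint32_t edx)
{
	if ((edx & CPUID1_EDX_HTT) == 0)
		return (1);
	return ((ebx & CPUID1_EBX_LOGICAL_MASK) >> CPUID1_EBX_LOGICAL_SHIFT);
}
```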