7.1 hangs in cache_lookup mutex?

Guy Helmer ghelmer at palisadesys.com
Mon Mar 9 12:46:23 PDT 2009


Guy Helmer wrote:
> Guy Helmer wrote:
>> John Baldwin wrote:
>>> On Thursday 26 February 2009 5:27:07 pm Guy Helmer wrote:
>>>  
>>>> John Baldwin wrote:
>>>>   
>>>>> On Thursday 26 February 2009 4:22:15 pm Guy Helmer wrote:
>>>>>       
>>>>>> db> show sleepchain 23110
>>>>>> thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK
>>>>>> thread 100208 (pid 23092, kvoop) is on a run queue
>>>>>> db> show sleepchain 23092
>>>>>> thread 100208 (pid 23092, kvoop) is on a run queue
>>>>>>             
>>>>> Ah, so this is normal (well, mostly) in that kvoop is simply on 
>>>>> the run queue waiting for a CPU.  Can you find the thread pointer 
>>>>> for kvoop and check on things such as if it is pinned and if so to 
>>>>> which CPU (td_pinned will tell you the first, and 
>>>>> td_sched->ts_cpu will tell you the second with ULE).
>>>>>         
>>>> (kgdb) print td->td_pinned
>>>> $2 = 0
>>>>     
>>>
>>> Ok, not pinned.
>>>
>>>  
>>>>  From my captured ddb run:
>>>> cpuid        = 3
>>>> curthread    = 0xc5e2f000: pid 23090 "filter"
>>>> curpcb       = 0xe6f90d90
>>>> fpcurthread  = none
>>>> idlethread   = 0xc442daf0: pid 11 "idle: cpu3"
>>>> APIC ID      = 7
>>>> currentldt   = 0x50
>>>> spin locks held:
>>>>     
>>>
>>> At http://www.freebsd.org/~jhb/gdb/ you can find my kgdb scripts.  
>>> If you source gdb6 you can run 'runtds' which will show you what 
>>> each CPU is doing (more or less) in ps-style output.
>>>
>>>  
>>>> I sure wish I could find the root cause of the hangs.  On a hunch, 
>>>> I tried setting "machdep.cpu_idle_hlt=0" on the amd64 machine, and 
>>>> it has run 32 hours without a hang.  It could just be coincidence, 
>>>> though...
>>>>     
>>>
>>> Ahhh, that actually could explain it perhaps.  Do your CPUs support 
>>> C2 or higher sleep states for idle?  You can try limiting it to only 
>>> C1 (or disable C1E in your BIOS if it has an option for that) to see 
>>> if that fixes it.
>>>
>>>   
>> I don't think the CPUs support anything lower than C1 - there is no 
>> hw.acpi.cpu.cx_supported sysctl node, and hw.acpi.cpu.cx_lowest is 
>> C1.  C1-Enhanced was already disabled in the BIOS, at least on the 
>> machine running amd64.  48 hours of runtime, and no hangs seen yet.  
>> I did reboot it this morning to check the sleep settings in the BIOS.
> Despite having machdep.cpu_idle_hlt=0, the machine wedged for 40 hours 
> over the weekend but came back to life by itself.  Could this be lost 
> IPIs, or a bug in the scheduler?
To finish off this thread, after I disabled hyperthreading in the BIOS 
on this machine (dual Nocona Xeons in a Supermicro X6DHR-8G) it was 
stable for 96 hours.  I applied rev 189023 
(machdep.hyperthreading_allowed=0 disables HT cores at boot) to 
7.1-release, set machdep.hyperthreading_allowed=0 in /boot/loader.conf, 
re-enabled hyperthreading in the BIOS to verify the effect of r189023, and 
the machine has been stable for 92 hours.
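
For anyone wanting to try the same workaround, the loader.conf entry 
looks roughly like the fragment below.  This is a sketch that assumes 
a kernel with r189023 applied; the comment lines and the quoting of 
the value are illustrative, not copied from the machine.

```
# /boot/loader.conf
# Keep the hyperthreaded logical CPUs from being used, starting at boot
# (assumes a kernel that includes r189023).
machdep.hyperthreading_allowed="0"
```

After a reboot, "sysctl machdep.hyperthreading_allowed" should report 
the value that was set.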

Guy



More information about the freebsd-stable mailing list