7.1 hangs in cache_lookup mutex?

Guy Helmer ghelmer at palisadesys.com
Fri Feb 27 07:44:23 PST 2009


John Baldwin wrote:
> On Thursday 26 February 2009 5:27:07 pm Guy Helmer wrote:
>   
>> John Baldwin wrote:
>>     
>>> On Thursday 26 February 2009 4:22:15 pm Guy Helmer wrote:
>>>   
>>>       
>>>> db> show sleepchain 23110
>>>> thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK
>>>> thread 100208 (pid 23092, kvoop) is on a run queue
>>>> db> show sleepchain 23092
>>>> thread 100208 (pid 23092, kvoop) is on a run queue
>>>>     
>>>>         
>>> Ah, so this is normal (well, mostly) in that kvoop is simply on the run
>>> queue waiting for a CPU.  Can you find the thread pointer for kvoop and
>>> check on things such as if it is pinned and if so to which CPU (td_pinned
>>> will tell you the first, and td_sched->ts_cpu will tell you the second
>>> with ULE).
>>>   
>>>       
>> (kgdb) print td->td_pinned
>> $2 = 0
>>     
>
> Ok, not pinned.
>
>   
>>  From my captured ddb run:
>> cpuid        = 3
>> curthread    = 0xc5e2f000: pid 23090 "filter"
>> curpcb       = 0xe6f90d90
>> fpcurthread  = none
>> idlethread   = 0xc442daf0: pid 11 "idle: cpu3"
>> APIC ID      = 7
>> currentldt   = 0x50
>> spin locks held:
>>     
>
> At http://www.freebsd.org/~jhb/gdb/ you can find my kgdb scripts.  If you 
> source gdb6 you can run 'runtds' which will show you what each CPU is doing 
> (more or less) in ps-style output.
>
>   
>> I sure wish I could find the root cause of the hangs.  On a hunch, I 
>> tried setting "machdep.cpu_idle_hlt=0" on the amd64 machine, and it has 
>> run 32 hours without a hang.  It could just be coincidence, though...
>>     
>
> Ahhh, that actually could explain it perhaps.  Do your CPUs support C2 or 
> higher sleep states for idle?  You can try limiting it to only C1 (or disable 
> C1E in your BIOS if it has an option for that) to see if that fixes it.
>
>   
I don't think the CPUs support anything deeper than C1 - there is no 
hw.acpi.cpu.cx_supported sysctl node, and hw.acpi.cpu.cx_lowest is C1.  
C1-Enhanced was already disabled in the BIOS, at least on the machine 
running amd64.  48 hours of runtime, and no hangs seen yet.  I did 
reboot it this morning to check the sleep settings in the BIOS.
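
For anyone repeating the check on a machine that does expose a
cx_supported node, here is a minimal sketch of reading its value.  It
assumes the value is a space-separated list of entries such as "C1/1 C2/90",
which may differ between FreeBSD releases.  The helper names and the sample
strings are hypothetical and were not captured from the machines in this
thread.

```python
import re


def deepest_cx_state(cx_supported: str) -> int:
    """Return the highest C-state number listed in a cx_supported string."""
    # Assumed entry format: "C<state>/<latency>", e.g. "C2/90".
    states = [int(n) for n in re.findall(r"\bC(\d+)/", cx_supported)]
    if not states:
        raise ValueError("no C-states found in: %r" % cx_supported)
    return max(states)


def supports_deeper_than_c1(cx_supported: str) -> bool:
    """True if any state deeper than C1 (C2, C3, ...) is listed."""
    return deepest_cx_state(cx_supported) > 1


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    for sample in ("C1/1", "C1/1 C2/90 C3/900"):
        print(sample, "->", supports_deeper_than_c1(sample))
```

If the node is missing entirely, as it is here, there is nothing to parse
and cx_lowest is the only indication available.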

Guy


