7.1 hangs in cache_lookup mutex?

Guy Helmer ghelmer at palisadesys.com
Mon Mar 2 06:11:17 PST 2009


Guy Helmer wrote:
> John Baldwin wrote:
>> On Thursday 26 February 2009 5:27:07 pm Guy Helmer wrote:
>>  
>>> John Baldwin wrote:
>>>    
>>>> On Thursday 26 February 2009 4:22:15 pm Guy Helmer wrote:
>>>>        
>>>>> db> show sleepchain 23110
>>>>> thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK
>>>>> thread 100208 (pid 23092, kvoop) is on a run queue
>>>>> db> show sleepchain 23092
>>>>> thread 100208 (pid 23092, kvoop) is on a run queue
>>>>>             
>>>> Ah, so this is normal (well, mostly) in that kvoop is simply on the 
>>>> run queue waiting for a CPU.  Can you find the thread pointer for 
>>>> kvoop and check on things such as if it is pinned and if so to which 
>>>> CPU (td_pinned will tell you the first, and td_sched->ts_cpu will 
>>>> tell you the second with ULE).
>>>>         
>>> (kgdb) print td->td_pinned
>>> $2 = 0
>>>     
>>
>> Ok, not pinned.
>>
>>  
>>> From my captured ddb run:
>>> cpuid        = 3
>>> curthread    = 0xc5e2f000: pid 23090 "filter"
>>> curpcb       = 0xe6f90d90
>>> fpcurthread  = none
>>> idlethread   = 0xc442daf0: pid 11 "idle: cpu3"
>>> APIC ID      = 7
>>> currentldt   = 0x50
>>> spin locks held:
>>>     
>>
>> At http://www.freebsd.org/~jhb/gdb/ you can find my kgdb scripts.  If 
>> you source gdb6 you can run 'runtds' which will show you what each 
>> CPU is doing (more or less) in ps-style output.
>>
>>  
>>> I sure wish I could find the root cause of the hangs.  On a hunch, I 
>>> tried setting "machdep.cpu_idle_hlt=0" on the amd64 machine, and it 
>>> has run 32 hours without a hang.  It could just be coincidence, 
>>> though...
>>>     
>>
>> Ahhh, that actually could explain it perhaps.  Do your CPUs support 
>> C2 or higher sleep states for idle?  You can try limiting it to only 
>> C1 (or disable C1E in your BIOS if it has an option for that) to see 
>> if that fixes it.
>>
>>   
> I don't think the CPUs support anything lower than C1 - there is no 
> hw.acpi.cpu.cx_supported sysctl node, and hw.acpi.cpu.cx_lowest is C1.  
> C1-Enhanced was already disabled in the BIOS, at least on the machine 
> running amd64.  48 hours of runtime, and no hangs seen yet.  I did 
> reboot it this morning to check the sleep settings in the BIOS.

Despite having machdep.cpu_idle_hlt=0, the machine wedged for 40 hours 
over the weekend but came back to life by itself.  Could this be lost 
IPIs, or a bug in the scheduler?
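
For anyone reading captured ddb logs like the one quoted above, here is a 
minimal sketch that pulls out the last thread in a "show sleepchain" 
listing, i.e. the thread everything else in the chain is ultimately 
waiting on.  It is not part of any FreeBSD tool; the function names 
parse_sleepchain and chain_tail are made up for illustration, and it 
assumes the line format matches the two sleepchain lines quoted earlier 
in this thread.

```python
import re

# Assumed line format, taken from the ddb output quoted in this thread:
#   thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK
#   thread 100208 (pid 23092, kvoop) is on a run queue
LINE_RE = re.compile(r'thread (\d+) \(pid (\d+), ([^)]+)\) (.*)')


def parse_sleepchain(text):
    """Return one dict per 'thread ...' line, in the order they appear."""
    chain = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            tid, pid, name, state = m.groups()
            chain.append({
                "tid": int(tid),
                "pid": int(pid),
                "name": name,
                "state": state.strip(),
            })
    return chain


def chain_tail(text):
    """Return the last thread in the chain, or None if nothing parsed."""
    chain = parse_sleepchain(text)
    return chain[-1] if chain else None


SAMPLE = """\
thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK
thread 100208 (pid 23092, kvoop) is on a run queue
"""

tail = chain_tail(SAMPLE)
print(tail["name"], "-", tail["state"])
```

With the sample above, the tail is kvoop sitting on a run queue, which 
is the thread worth inspecting further (td_pinned, td_sched->ts_cpu).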

Guy


More information about the freebsd-stable mailing list