HEADSUP: Turn off cpu_idle_hlt on SMP for now on x86
Julian Elischer
julian at elischer.org
Fri Feb 11 21:22:12 GMT 2005
John Baldwin wrote:
>I figured out a deadlock I got on a 4 CPU testbox last night and I think I
>have tracked it down to a bug in 4BSD that can be worked around by turning
>off idle halt via the machdep.cpu_idle_hlt sysctl. Specifically, it involves
>waking up halted CPUs when a thread that is pinned or bound to a specific CPU
>is made runnable. The details of what I saw today are below:
>
>FreeBSD/i386 (deimos.baldwin.cx) (ttyd0)
>
>login: KDB: enter: Line break on console
>[thread pid 13 tid 100006 ]
>Stopped at kdb_enter+0x30: leave
>db> show pcpu
>cpuid = 3
>curthread = 0xc1dfd5c0: pid 11 "idle: cpu3"
>curpcb = 0xdc608da0
>fpcurthread = none
>idlethread = 0xc1dfd5c0: pid 11 "idle: cpu3"
>APIC ID = 3
>currentldt = 0x28
>db> ps
> pid proc uid ppid pgrp flag stat wmesg wchan cmd
>79079 c2ebe1fc 1001 79067 610 0000012 [LOCK vm page queue mutex c1f90c40] make
>79078 c2f2ede4 1001 78894 610 0000002 [LOCK vm page queue mutex c1f90c40] sh
>79077 c3498000 1001 79043 610 0004002 [LOCK vm page queue mutex c1f90c40] make
>79075 c2f317f0 1001 78898 610 0004002 [LOCK vm page queue mutex c1f90c40] make
>79074 c2184de4 1001 78902 610 0004002 [LOCK vm page queue mutex c1f90c40] make
>79067 c23c71fc 1001 78955 610 0004002 [SLPQ ppwait 0xc23c71fc][SLP] make
>79060 c23e35f4 1001 78915 610 0004002 [LOCK Giant c1e41800] make
>79043 c2a50000 1001 79042 610 0004002 [SLPQ wait 0xc2a50000][SLP] sh
>79042 c2f493f8 1001 78884 610 0004002 [SLPQ select 0xc075e024][SLP] make
>78978 c28de7f0 1001 78975 610 0004002 [RUNQ] sh
>78975 c29bf1fc 1001 78970 610 0004002 [SLPQ select 0xc075e024][SLP] make
>...
> 40 c1e67000 0 0 0 0000204 [IWAIT] swi1: net
> 39 c1e671fc 0 0 0 0000204 [IWAIT] swi3: vm
> 38 c1e4b5f4 0 0 0 000020c [LOCK Giant c1e41800] swi4: clock sio
>--More--
> 37 c1e4b7f0 0 0 0 0000204 [IWAIT] irq13:
> 36 c1e4b9ec 0 0 0 0000204 [IWAIT] irq0: clk
> 35 c1e4bbe8 0 0 0 0000204 [IWAIT] irq23: ahc0 ahc1
> 34 c1e4bde4 0 0 0 0000204 [IWAIT] irq22:
> 33 c1e5b000 0 0 0 0000204 [IWAIT] irq21:
> 32 c1e5b1fc 0 0 0 0000204 [IWAIT] irq20:
> 31 c1e5b3f8 0 0 0 0000204 [LOCK Giant c1e41800] irq19: fxp0 uhci0
>...
> 14 c1dfc5f4 0 0 0 000020c [Can run] idle: cpu0
> 13 c1dfc7f0 0 0 0 000020c [CPU 1] idle: cpu1
> 12 c1dfc9ec 0 0 0 000020c [CPU 2] idle: cpu2
> 11 c1dfcbe8 0 0 0 000020c [CPU 3] idle: cpu3
>
>Thus, CPU 0 is running a make process while CPUs 1, 2, and 3 are all idle.
>Note that softclock() and irq19 (me trying to ssh in this morning over fxp0)
>are both blocked on Giant. Also note that there are 5 threads blocked on the
>vm page queue mutex. First, let's see who owns Giant and what they are doing.
>I was doing a benchmark, so I didn't have witness compiled in or turned on,
>which meant 'show alllocks' wasn't an option.
>
>db> x/x Giant,10
>Giant: c0730028 c06ffb2c c06ffb2c b0000 0
>Giant+0x14: 0 0 c2ebd8a2 0
>Giant+0x24: 0 c07300a0 0 0
>time_monitor: 92c86 7 420c903b
>
>The owner is 0xc2ebd8a2, minus the 0x2 since that bit is a flag (giving 0xc2ebd8a0):
>
>db> show threads
>...
> 100286 (0xc2ebd8a0) sched_switch(c2ebd8a0,0,1,77,fcb6cb7e) at sched_switch+0x170
>...
> 100189 (0xc28dfcf0) sched_switch(c28dfcf0,0,2,dff58a7c,95a67b30) at sched_switch+0x170
>...
>db> tr 100286
>Tracing pid 79075 tid 100286 td 0xc2ebd8a0
>sched_switch(c2ebd8a0,0,1,77,fcb6cb7e) at sched_switch+0x170
>mi_switch(1,0,c07026d2,254,c075d568) at mi_switch+0x1d9
>turnstile_wait(c076bda0,c28dfcf0,c2ebd8a0,d17bc930,c21a79e8) at turnstile_wait+0x269
>_mtx_lock_sleep(c076bda0,c2ebd8a0,0,c0704083,dc5) at _mtx_lock_sleep+0x101
>_mtx_lock_flags(c076bda0,0,c0704083,dc5,c0704083) at _mtx_lock_flags+0x45
>vfs_clean_pages(d17bc930,0,c0704083,423,c21a79b4) at vfs_clean_pages+0x70
>bdwrite(d17bc930,c3bdfa00,80,2000,0) at bdwrite+0x2f0
>ffs_update(c46bf564,0,1,50,e00efb2c) at ffs_update+0x396
>ufs_inactive(e00efb44,e00efb5c,c05978ea,e00efb44,0) at ufs_inactive+0x225
>VOP_INACTIVE_AP(e00efb44,0,c0705028,768,c074e7a0) at VOP_INACTIVE_AP+0x4e
>vrele(c46bf564,c074e2a0,c46bf564,1,c255a700) at vrele+0x11a
>vn_close(c46bf564,1,c255a700,c2ebd8a0,0) at vn_close+0x50
>vn_closefile(c30e6318,c2ebd8a0,c06fddb9,846,c30e6318) at vn_closefile+0xf4
>fdrop_locked(c30e6318,c2ebd8a0,c06fddb9,831) at fdrop_locked+0x94
>fdrop(c30e6318,c2ebd8a0,c06fddb9,77c,c0515cfd,c0737ce0,1000,c2ebd8a0,e00efcac,c0
>56009c,1,e00efc80,c255a700,0,e00efc90,e00efc90,c0513783,3f,c31b93c8,0,6,c31b9300,e00efcb0,c051389d,c31b9300,0,6) at fdrop+0x3c
>closef(c30e6318,c2ebd8a0,c06fddb9,3e8,c2ebd8a0) at closef+0x3f2
>close(c2ebd8a0,e00efd14,4,e00efd00,1) at close+0x1f2
>syscall(2f,2f,2f,809c200,0) at syscall+0x2a0
>Xint0x80_syscall() at Xint0x80_syscall+0x1f
>--- syscall (6, FreeBSD ELF32, close), eip = 0x805c743, esp = 0xbfbfd18c, ebp = 0xbfbfd1a8 ---
>
>Looking up the pid in the ps output above, I see that this thread is one of the
>threads blocked on the vm page queue mutex. Ok, so let's look at that lock's
>owner:
>
>db> x/x 0xc076bda0,10
>vm_page_queue_mtx: c0730028 c07100af c07100af b0000
>vm_page_queue_mtx+0x10: 0 0 0 c28dfcf2
>vm_page_queue_mtx+0x20: 0 0 c1e673f8 f540
>vm_pages_needed: 0 0 0 1552
>
>Lock owner is 0xc28dfcf2, again minus the 0x2 flag bit. I looked that up via show threads and got:
>
>db> tr 100189
>Tracing pid 78978 tid 100189 td 0xc28dfcf0
>sched_switch(c28dfcf0,0,2,dff58a7c,95a67b30) at sched_switch+0x170
>mi_switch(2,0,c0701343,252,0) at mi_switch+0x1d9
>critical_exit(c074bcc0,c052ddd5,c0755f18,0,c28dfcf0) at critical_exit+0x7e
>intr_execute_handlers(c074bcc0,dff58b10,c076bc00,dff58b1c,c28dfcf0) at intr_execute_handlers+0xa4
>atpic_handle_intr(0) at atpic_handle_intr+0xaf
>Xatpic_intr0() at Xatpic_intr0+0x20
>--- interrupt, eip = 0xc054dd3c, esp = 0xdff58b54, ebp = 0xdff58b68 ---
>critical_exit(dff58b94,c06bbcbd,c075cae0,0,c0715535) at critical_exit+0xcc
>_mtx_unlock_spin_flags(c075cae0,0,c0715535,252,e) at _mtx_unlock_spin_flags+0x23
>pmap_invalidate_page(c29c869c,8064000,2,766,c19d6d10) at pmap_invalidate_page+0xdd
>pmap_enter(c29c869c,8064000,c19d6d10,3,0) at pmap_enter+0x33d
>vm_fault(c29c85dc,8064000,2,8,c28dfcf0) at vm_fault+0x13e7
>trap_pfault(dff58d48,1,80643e8,bfbfe060,80643e8) at trap_pfault+0xf1
>trap(2f,2f,2f,0,8069000) at trap+0x21a
>calltrap() at calltrap+0x5
>--- trap 0xc, eip = 0x80563cf, esp = 0xbfbfe0d0, ebp = 0xbfbfe0d8 ---
>
>So this is the thread that is in state RUNQ even though I have three idle
>CPUs. Very curious! I then wondered if my current CPU (CPU 3) thought
>that there were any runnable threads:
>
>db> call sched_runnable()
>0
>
>0 means no. That's when I figured that the runnable thread must be pinned,
>especially since it was interrupted while messing with the pmap which
>usually requires pinning. I then tried to do an experiment to test my theory
>by turning off the idle halt and manually IPI'ing all the other CPUs so that
>they would recheck their run queues and whichever CPU needed to run the
>process owning the vm page queue mutex would run. Unfortunately, I think I
>got the vector number wrong as all 3 CPUs panicked. :-P
>
>db> x cpu_idle_hlt
>cpu_idle_hlt: 1
>db> w cpu_idle_hlt 0
>cpu_idle_hlt 0x1 = 0
>db> call lapic_ipi_raw(0x000c00e7, 0)
>0xd1622000
>db> c
>cccpppuuuiiiddd === 102;;; aaapppiiiccc iiiddd === 000102
>
>
>iiinnnssstttrrruuuccctttiiiooonnn pppoooiiinnnttteerer r =
>= = 0 x008xx:80:8x0:cx00cx60ca50536bae055eb9e
>a5st
>as
>ctkas ctpkao cipkno tipenorti en rt e r = 0 x= 1 0 :0=0x
>x10d0xc:16000xed:cf0bex40dfc
>afb6r40abm
>...
> 857 (make)
>trap number = 30
>panic: unknown/reserved trap
>cpuid = 0
>Uptime: 1d16h58m16s
>Cannot dump. No dump device defined.
>Automatic reboot in 15 seconds - press a key on the console to abort
>Rebooting...
>cpu_reset: Stopping other CPUs
>
>Thus, my theory is that when the pinned thread was preempted and put back on
>the run queue, the scheduler didn't IPI the CPU it was pinned to in order to wake it
>up in case it was idle. The IPI is only needed if the CPUs are halted, which
>is why I think turning the idle halt off might work as a workaround. I don't
>know if ULE has this same issue, but I've cc'd Jeff and hopefully he can look
>into it.
>
there is an option (a sysctl, even):
kern.sched.ipiwakeup.htt2: 0
kern.sched.ipiwakeup.onecpu: 0 <--------------
kern.sched.ipiwakeup.useloop: 0
kern.sched.ipiwakeup.usemask: 1
kern.sched.ipiwakeup.delivered: 0
kern.sched.ipiwakeup.requested: 0
kern.sched.ipiwakeup.enabled: 1
kern.sched.quantum: 100000
with kern.sched.ipiwakeup.onecpu set to 0 as shown, the wakeup IPI should go to all idle CPUs, not just one of them.
More information about the freebsd-current mailing list