"sleeping without queue" ?

John Baldwin jhb at freebsd.org
Wed Jul 23 18:46:36 UTC 2008


On Wednesday 23 July 2008 08:03:48 am Kostik Belousov wrote:
> On Tue, Jul 22, 2008 at 03:59:57PM -0400, Mikhail Teterin wrote:
> > Kostik Belousov wrote:
> > >On Tue, Jul 22, 2008 at 03:26:29PM -0400, Mikhail Teterin wrote:
> > >>Kostik Belousov wrote:
> > >>>Did you switch to the process before doing the backtrace (using the
> > >>>proc <pid> command)?
> > >>Ok, thanks. Did not know about this one. Here:
> > >>...
> > >>(kgdb) proc 79759
> > >>(kgdb) bt
> > >>#0  sched_switch (td=0xffffff01286dc000, newtd=0xffffff00010ce000, 
> > >>flags=2) at /var/src/sys/kern/sched_4bsd.c:928
> > >>#1  0x0000000000000000 in ?? ()
> > >>#2  0xffffffff802f1108 in mi_switch (flags=678281216, newtd=0x2) at 
> > >>/var/src/sys/kern/kern_synch.c:442
> > >>#3  0xffffffff80318513 in sleepq_check_timeout () at 
> > >>/var/src/sys/kern/subr_sleepqueue.c:519
> > >>#4  0xffffffff80318c85 in sleepq_timedwait (wchan=0xffffffff80688408) at 
> > >>/var/src/sys/kern/subr_sleepqueue.c:597
> > >>#5  0xffffffff802f16a2 in _sleep (ident=0xffffffff80688408, lock=0x0, 
> > >>priority=0, wmesg=0xffffffff804f3059 "vmo_de", timo=1) at 
> > >>/var/src/sys/kern/kern_synch.c:224
> > >>#6  0xffffffff8043036b in vm_object_deallocate 
> > >>(object=0xffffff0053024a90) at /var/src/sys/vm/vm_object.c:509
> > >From this frame, please print the object (e.g. p *object) and
> > >likewise print the object at the head of the object->shadow_head
> > >list.
> > kgdb /usr/obj/var/src/sys/SILVER-SMP/kernel.debug /dev/mem
> > [GDB will not be able to debug user-mode threads: 
> > /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > GNU gdb 6.1.1 [FreeBSD]
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain 
> > conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > This GDB was configured as "amd64-marcel-freebsd".
> > There is no member named pathname.
> > Reading symbols from /opt/modules/fuse.ko...done.
> > Loaded symbols for /opt/modules/fuse.ko
> > Reading symbols from /opt/modules/rtc.ko...done.
> > Loaded symbols for /opt/modules/rtc.ko
> > Reading symbols from /boot/kernel/snd_ich.ko...Reading symbols from 
> > /boot/kernel/snd_ich.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/snd_ich.ko
> > Reading symbols from /boot/kernel/msdosfs.ko...Reading symbols from 
> > /boot/kernel/msdosfs.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/msdosfs.ko
> > #0  0x0000000000000000 in ?? ()
> > (kgdb) frame 6
> > Error accessing memory address 0x0: Bad address.
> > (kgdb) pid 79759
> > Undefined command: "pid".  Try "help".
> > (kgdb) proc 79759
> > (kgdb) frame 6
> > #6  0xffffffff8043036b in vm_object_deallocate 
> > (object=0xffffff0053024a90) at /var/src/sys/vm/vm_object.c:509
> > 509                                             pause("vmo_de", 1);
> > (kgdb) p *object
> > $1 = {mtx = {lock_object = {lo_name = 0xffffffff804f21c4 "vm object", 
> > lo_type = 0xffffffff804f3018 "standard object", lo_flags = 21168128, 
> > lo_witness_data = {
> >        lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, 
> > mtx_recurse = 0}, object_list = {tqe_next = 0xffffff0005018a90,
> >    tqe_prev = 0xffffff00539a6850}, shadow_head = {lh_first = 
> > 0xffffff005d3afa90}, shadow_list = {le_next = 0x0, le_prev = 
> > 0xffffff005d2cd048}, memq = {
> >    tqh_first = 0xffffff007eb9fa58, tqh_last = 0xffffff007f864820}, root 
> > = 0xffffff007ee14d38, size = 427, generation = 66, ref_count = 2, 
> > shadow_count = 1,
> >  type = 0 '\0', flags = 256, pg_color = 0, paging_in_progress = 0, 
> > resident_page_count = 44, backing_object = 0x0, backing_object_offset = 
> > 0, pager_object_list = {
> >    tqe_next = 0x0, tqe_prev = 0x0}, cache = 0x0, handle = 0x0, un_pager 
> > = {vnp = {vnp_size = 576646}, devp = {devp_pglist = {tqh_first = 0x8cc86,
> >        tqh_last = 0x0}}, swp = {swp_bcount = 576646}}}
> > (kgdb) p (object->shadow_head)
> > $2 = {lh_first = 0xffffff005d3afa90}
> > (kgdb) p *object->shadow_head.lh_first
> > $3 = {mtx = {lock_object = {lo_name = 0xffffffff804f21c4 "vm object", 
> > lo_type = 0xffffffff804f3018 "standard object", lo_flags = 21168128, 
> > lo_witness_data = {
> >        lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, 
> > mtx_recurse = 0}, object_list = {tqe_next = 0xffffff0066c32340,
> >    tqe_prev = 0xffffff012f673ac0}, shadow_head = {lh_first = 0x0}, 
> > shadow_list = {le_next = 0x0, le_prev = 0xffffff0053024ad0}, memq = {
> >    tqh_first = 0xffffff007779f9a0, tqh_last = 0xffffff0077c04140}, root 
> > = 0xffffff0077c04130, size = 387, generation = 3, ref_count = 1, 
> > shadow_count = 0,
> >  type = 0 '\0', flags = 8452, pg_color = 0, paging_in_progress = 0, 
> > resident_page_count = 2, backing_object = 0xffffff0053024a90, 
> > backing_object_offset = 163840,
> >  pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, cache = 0x0, 
> > handle = 0x0, un_pager = {vnp = {vnp_size = 365278}, devp = {devp_pglist = {
> >        tqh_first = 0x592de, tqh_last = 0x0}}, swp = {swp_bcount = 365278}}}
> > 
> > 
> > >
> > >Another question: which scheduler do you use?
> > options         SCHED_4BSD              # 4BSD scheduler
> > options         PREEMPTION              # Enable kernel thread preemption
> The state of both the object being destroyed and the object that is shadowed
> looks right to me. Moreover, the shadowed object is not locked: the value
> of its mtx_lock is 4. It seems as if the thread missed the wakeup
> owed by pause.
> 
> John, could the following commit be the fix for this issue?
> 
> r179974 | jhb | 2008-06-24 22:36:33 +0300 (Tue, 24 Jun 2008) | 3 lines
> 
> MFC: Change the roundrobin implementation in the 4BSD scheduler to trigger a
> userland preemption directly from hardclock() via sched_clock()

I don't think this would fix the issue.  That patch fixed problems where a
thread pinned to another CPU held a lock (typically Giant) that a callout
handler run from softclock needed.  This prevented the 'roundrobin' callout
from running, which would otherwise force all the CPUs to do a context switch
(normally this would have forced the pinned thread holding the lock to
eventually run).  That bug involved threads on the run queue not getting to
run, even though they may have had a higher priority than what was running.

I think this case is a bug that has lingered in the sleep queue code since the
thread lock changes went in.  There have been several reports of it, but I have
been unable to figure out how the wakeup is being missed.
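
For readers following the kgdb output quoted above: the "mtx_lock = 4"
observation can be checked by hand. The sketch below is only an illustration
of how the lock word is read. The SKETCH_* constants mirror the flag values as
I recall them from sys/mutex.h (RECURSED = 1, CONTESTED = 2, UNOWNED = 4), and
the helper names are hypothetical, not kernel API. Verify the values against
your own source tree before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed flag bits stored in the low bits of a mutex lock word.
 * These mirror values recalled from FreeBSD's sys/mutex.h; they are
 * redefined here under hypothetical names so the sketch stands alone.
 */
#define SKETCH_MTX_RECURSED  ((uintptr_t)0x1)
#define SKETCH_MTX_CONTESTED ((uintptr_t)0x2)
#define SKETCH_MTX_UNOWNED   ((uintptr_t)0x4)
#define SKETCH_MTX_FLAGMASK \
    (SKETCH_MTX_RECURSED | SKETCH_MTX_CONTESTED | SKETCH_MTX_UNOWNED)

/* True when the lock word is exactly the "free mutex" cookie. */
static inline bool
sketch_mtx_unowned(uintptr_t lock_word)
{
    return (lock_word == SKETCH_MTX_UNOWNED);
}

/* Owner thread pointer with the flag bits stripped; 0 if there is none. */
static inline uintptr_t
sketch_mtx_owner(uintptr_t lock_word)
{
    return (lock_word & ~SKETCH_MTX_FLAGMASK);
}
```

Under these assumptions, sketch_mtx_unowned(4) is true and
sketch_mtx_owner(4) is 0, which matches the reading that neither object's
mutex is held by any thread.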

> > >>>Also, show the output of ps axl <pid>.
> > >> UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
> > >>   0 79759 79758   0  96  0     0    16 -      DE+   p6    0:00,00
> > >>/bin/tcsh -fc
> > >>/meow/ports/editors/openoffice.org-3/work/BEB300_m3/solver/300/unxfbsdx.pro/bin/ma
> > >
> > >It makes sense to show the whole ps axl output.
> > See http://aldan.algebra.com/~mi/tmp/ps-axl.txt -- I edited it for 
> > privacy a little bit, but process-states are intact.
> > The java-processes in the linuxf have remained unkillable for weeks now 
> > -- I even forgot about them. But those are linuxulator problems, whereas 
> > the tcsh is native...
> It seems that pid 63930 is problematic too?
> 



-- 
John Baldwin


More information about the freebsd-stable mailing list