Possible scheduler (SCHED_ULE) bug?

Fri Oct 23 20:51:18 UTC 2009

On 10/23/09, Jaime Bozza <jbozza at mindsites.com> wrote:
> I believe I found a problem with the ULE scheduler - At least the fact that
> there is a problem, but I'm not sure where to go from here.   The system
> locks all processes, but doesn't panic, so I have no output to give.
>
> I was able to duplicate this on three different machines and solved it by
> switching the scheduler to 4BSD.
>
> Here's the environment:
>
> FreeBSD 7.2 i386, installed from bootonly ISO, Custom install, minimal, no
> other changes other than setting timezone, changing root password, and
> turning on sshd (allowing root and password connection).
>
> Running portsnap (fetch, then extract) to get latest ports tree.
>
> From ports, make installs of lang/php5 and www/lighttpd, using defaults for
> all ports installed.
>
> Modified lighttpd.conf for PHP (attached diff), created a short script
> called uploadfile.php (attached).  File was installed at
> /usr/local/www/data/uploadfile.php
>
> Start lighttpd (lighttpd_enable="YES" in rc.conf,
> /usr/local/etc/rc.d/lighttpd start), connect and run script.
>
> As long as I upload a file less than 64K, everything works fine.  If I try
> to upload something larger than 64K, system no longer responds.   Console
> prompt at login will allow me to enter username/password, but nothing
> happens after that.  Console prompt logged in will allow me to type a single
> line, but if I press enter, nothing after that.
>
> No errors get written anywhere - console, logs, etc.
>
> I'm at a loss as to what to do next.  Can anyone give me ideas of what else I
> can do?

Superficially, this seems identical to a deadlock I reported for
7.1-RC1. Would you mind compiling a kernel with these options:

options DDB
options KDB
options SW_WATCHDOG
options DEBUG_VFS_LOCKS

then add the following to /etc/rc.conf:

watchdogd_enable="YES"
watchdogd_flags="-e 'ls -al /etc'"

This should force a panic when the lockup happens again, which will
drop to a debugger.
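The reasoning behind this setup, roughly (as I understand watchdogd(8)): watchdogd periodically runs the -e command and only pats the watchdog when that command exits with status 0. Once the system deadlocks, 'ls -al /etc' never returns, nobody pats the watchdog, and the kernel's software watchdog (SW_WATCHDOG) fires and drops into KDB. Here is a toy user-space simulation of that pat-or-fire logic; the function name and the "ticks" timeout are my own simplifications, not watchdogd's actual code or its real timing parameters:

```python
from typing import Iterable, Optional


def watchdog_fire_tick(check_ok: Iterable[bool], timeout: int) -> Optional[int]:
    """Return the tick at which a simulated watchdog fires, or None.

    check_ok[i] is True if the health-check command (think: watchdogd's
    -e command) completed successfully at tick i, which pats the watchdog.
    False models a failed or hung check: the watchdog is not patted.
    """
    last_pat = 0  # the watchdog is armed (freshly patted) at tick 0
    for tick, ok in enumerate(check_ok):
        if ok:
            last_pat = tick  # successful check: restart the countdown
        elif tick - last_pat >= timeout:
            return tick  # no pat within the timeout: fire (enter KDB / panic)
    return None
```

For example, with a timeout of 3 ticks, checks that succeed twice and then hang make the watchdog fire at tick 4, while a system whose checks keep succeeding never trips it.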

Please check the backtrace and tell me whether the call stack is the same
as this one (between the --- interrupt and --- syscall sections):

KDB: stack backtrace:
db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
Xtimerint() at Xtimerint+0x1f
--- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
syscall(e66e0d38) at syscall+0x335
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp = 0xbfbfc7cc, ebp = 0xbfbfe848 ---
KDB: enter: watchdog timeout

You can type 'reboot' to reboot the machine (in my case, 'panic' would
not work, so a useful crash dump wasn't in the cards).
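If you capture your trace as text (e.g. from a serial console), a small helper like the following can pull out just the function names between the "--- interrupt" and "--- syscall" lines, so two traces can be compared without the addresses and arguments, which will differ between machines. The name frames_between_markers is hypothetical, and the regex assumes frames look like "name(args) at name+0xoffset", as in the DDB trace above:

```python
import re
from typing import List

# A DDB frame line looks like: "kern_sendfile(c41a7690,...) at kern_sendfile+0x90d"
_FRAME_RE = re.compile(r"^\S+\(.*\) at (\S+?)\+0x[0-9a-f]+$")


def frames_between_markers(trace: str) -> List[str]:
    """Return the function names of the frames that appear between the
    '--- interrupt' and '--- syscall' marker lines of a DDB backtrace."""
    names: List[str] = []
    inside = False
    for raw in trace.splitlines():
        line = raw.strip()
        if line.startswith("--- interrupt"):
            inside = True  # frames after this marker are the ones we want
            continue
        if line.startswith("--- syscall"):
            break  # end of the interesting section
        if inside:
            match = _FRAME_RE.match(line)
            if match:
                names.append(match.group(1))
    return names
```

For the trace above this yields kern_sendfile, do_sendfile, sendfile, syscall and Xint0x80_syscall, so comparing your list against that one with == tells you whether the call stacks match.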