kern/119530: Kqueue/Kevent causes fatal trap 12
Sebastien Petit
sebastien.petit at kewego.com
Thu Jan 10 05:50:02 PST 2008
>Number: 119530
>Category: kern
>Synopsis: Kqueue/Kevent causes fatal trap 12
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Thu Jan 10 13:50:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Sebastien Petit
>Release: FreeBSD-6.2-RELEASE, FreeBSD-6.2-STABLE, FreeBSD-6.3-PRERELEASE
>Organization:
Kewego
>Environment:
FreeBSD proxy0.XXXXXXXXXXXX 6.3-PRERELEASE FreeBSD 6.3-PRERELEASE #0: Thu Jan 10 00:13:27 CET 2008 root at build0.XXXXXXXXXXXX:/usr/src-6.2-STABLE/sys/i386/compile/PE2950-i386 i386
>Description:
There is probably a race condition in kqueue involving the expiry of an EVFILT_TIMER event set with the EV_ONESHOT flag.
In some cases the kernel crashes with a supervisor read fault in callout_reset(), probably because of a race condition: the first argument (the struct callout *) is NULL when it should not be.
The application that triggers this bug creates many EVFILT_TIMER events (about 300-400), each with a 300-second timeout. When an EVFILT_TIMER event expires, a new one is created with a 300-second timeout.
This application causes the crash detailed below:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x18
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc051b4bc
stack pointer = 0x28:0xe6ea5c68
frame pointer = 0x28:0xe6ea5c78
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = resume, IOPL = 0
current process = 14 (swi4: clock)
[thread pid 14 tid 100002 ]
Stopped at callout_reset+0xc4: testb $0x4,0x18(%esi)
db> where
Tracing pid 14 tid 100002 td 0xc8326c00
callout_reset(0,1,c04ed97c,c85cf4c8) at callout_reset+0xc4
filt_timerexpire(c85cf4c8) at filt_timerexpire+0xa8
softclock(0) at softclock+0x2eb
ithread_execute_handlers(c8325430,c8375b80) at ithread_execute_handlers+0x125
ithread_loop(c83068c0,e6ea5d38) at ithread_loop+0x55
fork_exit(c04f6cb8,c83068c0,e6ea5d38) at fork_exit+0x71
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe6ea5d6c, ebp = 0 ---
db>
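The fault address in the trap above is consistent with a NULL struct callout pointer: the faulting instruction reads a byte at offset 0x18 from %esi, and the trace shows callout_reset() being called with 0 as its first argument, so the address touched is 0 + 0x18 = 0x18. The sketch below only reproduces that arithmetic in userland C. struct mock_callout32 and flags_address() are hypothetical names, and the field layout is an assumption chosen so that the flags field lands at offset 0x18 on a 32-bit machine; it is not copied from the 6.x headers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct callout on i386. Every pointer is
 * modelled as a 32-bit integer so the offsets match a 32-bit kernel
 * regardless of the machine this is compiled on. The field names and
 * their order are assumptions made for illustration.
 */
struct mock_callout32 {
        uint32_t c_links[2];    /* two list-linkage pointers */
        uint32_t c_time;        /* expiry time in ticks */
        uint32_t c_arg;         /* argument pointer */
        uint32_t c_func;        /* function pointer */
        uint32_t c_mtx;         /* associated mutex pointer */
        uint32_t c_flags;       /* state flags, read by the faulting testb */
};

/* Address the CPU would read when testing c_flags through `base`. */
static uint32_t
flags_address(uint32_t base)
{
        return base + (uint32_t)offsetof(struct mock_callout32, c_flags);
}
```

Under these assumptions, flags_address(0) evaluates to 0x18, which matches the reported fault virtual address.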
An ugly patch is attached: it tests for NULL pointers before calling callout_reset() and prints a kernel message if a NULL is detected. It should work as a workaround, but a proper fix is needed to prevent the race itself.
>How-To-Repeat:
Run an application that registers many EVFILT_TIMER events with the EV_ONESHOT flag and reads the same kqueue from multiple threads. libthr is used.
The crash seems to appear only on SMP servers.
Is kqueue/kevent not thread-safe?
>Fix:
Patch to /usr/src/sys/kern/kern_event.c (function filt_timerexpire) to show what is happening and to avoid calling callout_reset() with a NULL struct callout *, which causes the fatal trap.
 static void
 filt_timerexpire(void *knx)
 {
         struct knote *kn = knx;
         struct callout *calloutp;
+        if (! knx) {
+                printf("knx is NULL. cannot expire the timer\n");
+                return;
+        }
         kn->kn_data++;
         KNOTE_ACTIVATE(kn, 0); /* XXX - handle locking */
         if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
                 calloutp = (struct callout *)kn->kn_hook;
+                if (calloutp)
+                        callout_reset(calloutp, timertoticks(kn->kn_sdata),
+                            filt_timerexpire, kn);
+                else
+                        printf("warning: calloutp is already freed, aborting\n");
         }
 }
I don't know whether this patch corrects the problem completely. I have patched my systems and will see whether the race condition happens again.
>Release-Note:
>Audit-Trail:
>Unformatted: