kern/51964: panic: race condition with realitexpire() called for zombie

Jilles Tjoelker jilles at stack.nl
Thu May 8 04:00:35 PDT 2003


>Number:         51964
>Category:       kern
>Synopsis:       panic: race condition with realitexpire() called for zombie
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu May 08 04:00:29 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Jilles Tjoelker
>Release:        FreeBSD 5.0-CURRENT i386
>Organization:
MCGV Stack
>Environment:
System: FreeBSD turtle.stack.nl 5.0-CURRENT FreeBSD 5.0-CURRENT #9: Wed Mar 26 16:42:07 CET 2003 jilles at turtle.stack.nl:/usr/obj/usr/src/sys/TURTLE i386

Dual Athlon XP 2000+
1 GB RAM

XFree86-Server-4.2.1_7

The test program was also tried on a uniprocessor machine running 4.6-RELEASE,
which did not crash. I have not been able to test it on a UP 5-CURRENT box.

Using SCHED_4BSD.

>Description:
When shutting down the X server, the machine sometimes panics. This has been
happening for about 2.5 months.

I have two traces: one with the X server (started via Xwrapper-4), and one with
the test program (called itimer).

There were similar panics earlier, but as there was no kernel.debug available,
little analysis was possible. A trap occured in realitexpire().

The function realitexpire() is called for a zombie process. Also, the fact that
p==NULL seems strange.

Script started on Thu Apr  3 12:54:18 2003
# gdb -k kernel.debug.3 vmcore.3
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
panic: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic.id = 01000000
fault virtual address	= 0x0
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc01cfdb8
stack pointer	        = 0x10:0xdf0e5c24
frame pointer	        = 0x10:0xdf0e5c58
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 14 (swi7: tty:sio clock)
trap number		= 12
panic: page fault
cpuid = 0; lapic.id = 01000000
boot() called on cpu#0

syncing disks, buffers remaining... 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 7128 
giving up on 223 buffers
Uptime: 5d21h3m49s
Dumping 1023 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]  672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008
---
Reading symbols from /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/acpi/acpi.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/acpi/acpi.ko.debug
Reading symbols from /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/linux/linux.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/linux/linux.ko.debug
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
239		dumping++;
(kgdb) bt
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
#1  0xc01ccdb1 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xc01cd137 in panic () at /usr/src/sys/kern/kern_shutdown.c:542
#3  0xc0334ae2 in trap_fatal (frame=0xdf0e5be4, eva=0)
    at /usr/src/sys/i386/i386/trap.c:843
#4  0xc0334792 in trap_pfault (frame=0xdf0e5be4, usermode=0, eva=0)
    at /usr/src/sys/i386/i386/trap.c:757
#5  0xc0334282 in trap (frame=
      {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -899772928, tf_esi = 1, tf_ebp = -552706984, tf_isp = -552707056, tf_ebx = 0, tf_edx = 0, tf_ecx = 8192, tf_eax = 8192, tf_trapno = 12, tf_err = 0, tf_eip = -1071841864, tf_cs = 8, tf_eflags = 66070, tf_esp = -899772640, tf_ss = 134217742})
    at /usr/src/sys/i386/i386/trap.c:444
#6  0xc031c808 in calltrap () at {standard input}:97
#7  0xc01dc0ec in realitexpire (arg=0xca5e8e00)
    at /usr/src/sys/kern/kern_time.c:530
#8  0xc01dc845 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:195
#9  0xc01b88b4 in ithread_loop (arg=0xc21d2180)
    at /usr/src/sys/kern/kern_intr.c:536
#10 0xc01b7743 in fork_exit (callout=0xc01b86e0 <ithread_loop>, arg=0x0, 
    frame=0x0) at /usr/src/sys/kern/kern_fork.c:791
(kgdb) frame 7
#7  0xc01dc0ec in realitexpire (arg=0xca5e8e00)
    at /usr/src/sys/kern/kern_time.c:530
530		psignal(p, SIGALRM);
(kgdb) print p
$1 = (struct proc *) 0x0
(kgdb) print *(struct proc *)arg
$2 = {p_list = {le_next = 0xc6d73800, le_prev = 0xc0392f58}, p_ksegrps = {
    tqh_first = 0xc7ed9180, tqh_last = 0xc7ed9184}, p_threads = {
    tqh_first = 0xc7e25780, tqh_last = 0xc7e25788}, p_suspended = {
    tqh_first = 0x0, tqh_last = 0xca5e8e18}, p_ucred = 0x0, p_fd = 0x0, 
  p_stats = 0xe5330a80, p_limit = 0x0, p_upages_obj = 0xca5e5390, 
  p_procsig = 0x0, p_flag = 24834, p_sflag = 1, p_state = PRS_ZOMBIE, 
  p_pid = 89957, p_hash = {le_next = 0x0, le_prev = 0xc21c8d94}, p_pglist = {
    le_next = 0x0, le_prev = 0xcaade008}, p_pptr = 0xca5e9c00, p_sibling = {
    le_next = 0x0, le_prev = 0xca5e9c64}, p_children = {lh_first = 0x0}, 
  p_mtx = {mtx_object = {lo_class = 0xc0392840, 
      lo_name = 0xc0367102 "process lock", 
      lo_type = 0xc0367102 "process lock", lo_flags = 4325376, lo_list = {
        tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, 
    mtx_lock = 3256699952, mtx_recurse = 0, mtx_blocked = {tqh_first = 0x0, 
      tqh_last = 0xca5e8e8c}, mtx_contested = {le_next = 0x0, 
      le_prev = 0xc7e257e8}}, p_oppid = 0, p_vmspace = 0x0, p_swtime = 117, 
  p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 20000}, it_value = {
      tv_sec = 507813, tv_usec = 320975}}, p_runtime = {sec = 3, 
    frac = 14595943908044818724}, p_uu = 3255398, p_su = 539957, p_iu = 1, 
  p_uticks = 416, p_sticks = 69, p_iticks = 0, p_profthreads = 0, 
  p_maxthrwaits = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, 
  p_siglist = {__bits = {0, 0, 0, 0}}, p_textvp = 0x0, p_lock = 0 '\0', 
  p_klist = {slh_first = 0x0}, p_sigiolst = {slh_first = 0x0}, 
---Type <return> to continue, or q <return> to quit---
  p_sigparent = 20, p_oldsigmask = {__bits = {0, 0, 0, 0}}, p_sig = 0, 
  p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', 
  p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, 
  p_sigmask = {__bits = {0, 0, 0, 0}}, p_sigstk = {ss_sp = 0x0, ss_size = 0, 
    ss_flags = 4}, p_magic = 3203398350, 
  p_comm = "XFree86\0-4\0\0\0\0\0\0\0\0\0", p_pgrp = 0x0, 
  p_sysent = 0xc03ad5e0, p_args = 0x0, p_cpulimit = 900, p_xstat = 0, 
  p_numthreads = 1, p_numksegrps = 1, p_md = {md_ldt = 0x0}, p_itcallout = {
    c_links = {sle = {sle_next = 0xca5ea834}, tqe = {tqe_next = 0xca5ea834, 
        tqe_prev = 0xd288e680}}, c_time = 50769900, c_arg = 0xca5e8e00, 
    c_func = 0, c_flags = 2}, p_uarea = 0xe5330000, p_acflag = 0, p_ru = 0x0, 
  p_peers = 0x0, p_leader = 0xca5e8e00, p_emuldata = 0x0, p_label = {
    l_flags = 0, l_perpolicy = {{l_ptr = 0x0, l_long = 0}, {l_ptr = 0x0, 
        l_long = 0}, {l_ptr = 0x0, l_long = 0}, {l_ptr = 0x0, l_long = 0}}}, 
  p_sched = 0xca5e9000}
(kgdb) # 
# gdb -k kernel.debug.4 vmcore.4
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
panic: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
cpuid = 1; lapic.id = 00000000
fault virtual address	= 0x0
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc01cfdb8
stack pointer	        = 0x10:0xdf0e5c24
frame pointer	        = 0x10:0xdf0e5c58
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 14 (swi7: tty:sio clock)
trap number		= 12
panic: page fault
cpuid = 1; lapic.id = 00000000
boot() called on cpu#1

syncing disks, buffers remaining... 6255 6255 6253 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 6252 
giving up on 1878 buffers
Uptime: 1h25m40s
Dumping 1023 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008
---
Reading symbols from /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/acpi/acpi.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/acpi/acpi.ko.debug
Reading symbols from /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/linux/linux.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/TURTLE/modules/usr/src/sys/modules/linux/linux.ko.debug
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
239		dumping++;
(kgdb) bt
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
#1  0xc01ccdb1 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xc01cd137 in panic () at /usr/src/sys/kern/kern_shutdown.c:542
#3  0xc0334ae2 in trap_fatal (frame=0xdf0e5be4, eva=0)
    at /usr/src/sys/i386/i386/trap.c:843
#4  0xc0334792 in trap_pfault (frame=0xdf0e5be4, usermode=0, eva=0)
    at /usr/src/sys/i386/i386/trap.c:757
#5  0xc0334282 in trap (frame=
      {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -955814912, tf_esi = 1, tf_ebp = -552706984, tf_isp = -552707056, tf_ebx = 0, tf_edx = 0, tf_ecx = 8192, tf_eax = 8192, tf_trapno = 12, tf_err = 0, tf_eip = -1071841864, tf_cs = 8, tf_eflags = 66070, tf_esp = -955814624, tf_ss = 134217742})
    at /usr/src/sys/i386/i386/trap.c:444
#6  0xc031c808 in calltrap () at {standard input}:97
#7  0xc01dc0ec in realitexpire (arg=0xc7076c00)
    at /usr/src/sys/kern/kern_time.c:530
#8  0xc01dc845 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:195
#9  0xc01b88b4 in ithread_loop (arg=0xc21d2180)
    at /usr/src/sys/kern/kern_intr.c:536
#10 0xc01b7743 in fork_exit (callout=0xc01b86e0 <ithread_loop>, arg=0x0, 
    frame=0x0) at /usr/src/sys/kern/kern_fork.c:791
(kgdb) frame 7
#7  0xc01dc0ec in realitexpire (arg=0xc7076c00)
    at /usr/src/sys/kern/kern_time.c:530
530		psignal(p, SIGALRM);
(kgdb) print p
$1 = (struct proc *) 0x0
(kgdb) print *(struct proc *)arg
$2 = {p_list = {le_next = 0xc713ac00, le_prev = 0xc7101400}, p_ksegrps = {
    tqh_first = 0xc7074d80, tqh_last = 0xc7074d84}, p_threads = {
    tqh_first = 0xc70713c0, tqh_last = 0xc70713c8}, p_suspended = {
    tqh_first = 0x0, tqh_last = 0xc7076c18}, p_ucred = 0x0, p_fd = 0x0, 
  p_stats = 0xe516fa80, p_limit = 0x0, p_upages_obj = 0xc6dd3ab0, 
  p_procsig = 0x0, p_flag = 8194, p_sflag = 1, p_state = PRS_ZOMBIE, 
  p_pid = 8772, p_hash = {le_next = 0x0, le_prev = 0xc21c8910}, p_pglist = {
    le_next = 0xc708f400, le_prev = 0xc712c050}, p_pptr = 0xc712c000, 
  p_sibling = {le_next = 0xc708f400, le_prev = 0xc712c064}, p_children = {
    lh_first = 0x0}, p_mtx = {mtx_object = {lo_class = 0xc0392840, 
      lo_name = 0xc0367102 "process lock", 
      lo_type = 0xc0367102 "process lock", lo_flags = 4325376, lo_list = {
        tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, 
    mtx_lock = 3256699952, mtx_recurse = 0, mtx_blocked = {tqh_first = 0x0, 
      tqh_last = 0xc7076c8c}, mtx_contested = {le_next = 0x0, le_prev = 0x0}}, 
  p_oppid = 0, p_vmspace = 0x0, p_swtime = 0, p_realtimer = {it_interval = {
      tv_sec = 0, tv_usec = 10000}, it_value = {tv_sec = 5125, 
      tv_usec = 942046}}, p_runtime = {sec = 0, frac = 10478957466101902}, 
  p_uu = 0, p_su = 753, p_iu = 0, p_uticks = 0, p_sticks = 1, p_iticks = 0, 
  p_profthreads = 0, p_maxthrwaits = 0, p_traceflag = 0, p_tracevp = 0x0, 
  p_tracecred = 0x0, p_siglist = {__bits = {0, 0, 0, 0}}, p_textvp = 0x0, 
  p_lock = 0 '\0', p_klist = {slh_first = 0x0}, p_sigiolst = {
    slh_first = 0x0}, p_sigparent = 20, p_oldsigmask = {__bits = {0, 0, 0, 
---Type <return> to continue, or q <return> to quit---
      0}}, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', 
  p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, 
  p_suspcount = 0, p_sigmask = {__bits = {0, 0, 0, 0}}, p_sigstk = {
    ss_sp = 0x0, ss_size = 0, ss_flags = 4}, p_magic = 3203398350, 
  p_comm = "itimer", '\0' <repeats 13 times>, p_pgrp = 0x0, 
  p_sysent = 0xc03ad5e0, p_args = 0x0, p_cpulimit = 900, p_xstat = 0, 
  p_numthreads = 1, p_numksegrps = 1, p_md = {md_ldt = 0x0}, p_itcallout = {
    c_links = {sle = {sle_next = 0xd28760d8}, tqe = {tqe_next = 0xd28760d8, 
        tqe_prev = 0xd289f328}}, c_time = 512385, c_arg = 0xc7076c00, 
    c_func = 0, c_flags = 2}, p_uarea = 0xe516f000, p_acflag = 1, p_ru = 0x0, 
  p_peers = 0x0, p_leader = 0xc7076c00, p_emuldata = 0x0, p_label = {
    l_flags = 0, l_perpolicy = {{l_ptr = 0x0, l_long = 0}, {l_ptr = 0x0, 
        l_long = 0}, {l_ptr = 0x0, l_long = 0}, {l_ptr = 0x0, l_long = 0}}}, 
  p_sched = 0xc7076e00}
(kgdb) # 
Script done on Thu Apr  3 13:24:03 2003

>How-To-Repeat:
The problem originally occured when the X server was shut down (not always, and
it seemed to depend on the way it was started). The program "itimer.c" (by
Serge van den Boom <svdb at stack.nl>) reproduces the panic on our system. It may
be necessary to change the delays somewhat. It will start 500 child processes.
Also, it may take 20 seconds or more before the panic occurs.

>Fix:
Probably revision 1.97 of src/sys/kern/kern_time.c introduced the problem as it
changed the locking for setitimer(), realitexpire(), etc.  kern_exit.c may be
related as well (exit1() stops the callout, perhaps it was already called in
another thread).

The bug may have been fixed already, but looking briefly in CVSweb, it does not
seem to be. The sources were cvsup'ed on March 26, just before compilation of
world and kernel (nightly cvsup was right in a commit of some ATM stuff and
didn't compile).

--- itimer.c begins here ---
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <signal.h>

int
main(int argc, char *argv[]) {
	struct itimerval tv =
	{
		{0, 1},
		{0, 1}
	};
	pid_t pid;
	int i;
	int waitval;
	
	while(1) {
	 	i = 500;
 		while (i--) {
 			pid = fork();
 			switch (pid) {
 				case -1:
 					perror("fork failed");
 					exit(1);
 				case 0: // child
 					signal(SIGALRM, SIG_IGN);
 					setitimer(ITIMER_REAL, &tv, NULL);
 					usleep(100);
 					return 0;
 				default: // parent
 					break;
 			}
 		}
 		
 		usleep(20);
 		i = 500;
 		while(i--)
 			wait(&waitval);
 	}
	return 0;
}
--- itimer.c ends here ---
>Release-Note:
>Audit-Trail:
>Unformatted:


More information about the freebsd-bugs mailing list