ULE crash

Wed Jun 25 10:34:30 PDT 2003

On Wed, 25 Jun 2003, Ian Freislich wrote:

> Hi
>
> About 4.5 minutes after rebooting with a SCHED_ULE kernel (I give
> ULE a go every few months), top started looking really weird (the
> CPU % just kept on accumulating for each process). Before dnetc
> started, httpd showed 17% CPU, but the system was supposedly 100%
> idle at the time according to top.  Then dnetc started and things
> got weird.

There is a bug that prevents sleeping processes from losing their old
CPU usage.  I'm looking into that.  Can you tell me what version of the
sched_ule.c file you have?  This looks like an old panic.
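
One way to report the revision is to print the file's RCS ident line.
This is a minimal sketch, assuming the sources live under /usr/src and
that the file carries a $FreeBSD$ tag. The helper name rcs_id is made up
for illustration.

```shell
# Print the line(s) carrying the RCS $FreeBSD$ tag of a source file.
# The default path is an assumption: sources checked out under /usr/src.
rcs_id() {
    grep -F '$FreeBSD' "${1:-/usr/src/sys/kern/sched_ule.c}"
}

# Usage (prints the tag line, which includes the revision number):
#   rcs_id
#   rcs_id /path/to/other/sched_ule.c
```

The ident(1) utility, where installed, should report the same tag.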

Thanks,
Jeff

>
> last pid:   607;  load averages:  1.83,  0.63,  0.25    up 0+00:04:23  16:00:48
> 35 processes:  3 running, 32 sleeping
> CPU states:  0.0% user, 99.0% nice,  0.6% system,  0.4% interrupt,  0.0% idle
> Mem: 20M Active, 14M Inact, 19M Wired, 20K Cache, 25M Buf, 130M Free
> Swap: 512M Total, 512M Free
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
>   603 ianf      139   20  1072K   880K RUN    0   0:39 105.47% 105.47% dnetc
>   575 ianf      139   20  1072K   880K CPU1   1   1:15 102.34% 102.34% dnetc
>   505 root       76    0  7208K  5420K select 0   0:01 17.97% 17.97% httpd
>   375 root        4    0  1276K   948K accept 0   0:00  9.38%  9.38% nfsd
>   526 nobody     76    0  9280K  8564K select 1   0:04  5.47%  5.47% squid
>   607 ianf       76    0  2196K  1444K CPU0   0   0:00  2.34%  2.34% top
>
> Then it froze.  When I got home I found that it had at least dumped
> vmcore.24.  I'll keep it around for a while and perform any inspections
> people want me to.  This was with sources updated at 13h30 GMT today.
>
> panic: page fault
> panic messages:
> ---
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; lapic.id = 01000000
> fault virtual address   = 0x38
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc01e094d
> stack pointer           = 0x10:0xce772be4
> frame pointer           = 0x10:0xce772bf4
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 603 (dnetc)
> trap number             = 12
> panic: page fault
> cpuid = 1; lapic.id = 01000000
> Stack backtrace:
> boot() called on cpu#1
>
> syncing disks, buffers remaining... panic: absolutely cannot call smp_ipi_shootdown with interrupts already disabled
> cpuid = 1; lapic.id = 01000000
> boot() called on cpu#1
> Uptime: 4m15s
> Dumping 191 MB
> ata0: resetting devices ..
> done
>  16 32 48 64 80 96 112 128 144 160 176
> ---
>
> (kgdb) bt
> #0  doadump () at ../../../kern/kern_shutdown.c:240
> #1  0xc01cbe7f in boot (howto=260) at ../../../kern/kern_shutdown.c:372
> #2  0xc01cc2b8 in panic () at ../../../kern/kern_shutdown.c:550
> #3  0xc02e8f89 in smp_tlb_shootdown (vector=0, addr1=0, addr2=0)
>     at ../../../i386/i386/mp_machdep.c:2356
> #4  0xc02e92a9 in smp_invlpg_range (addr1=0, addr2=0)
>     at ../../../i386/i386/mp_machdep.c:2488
> #5  0xc02eb548 in pmap_invalidate_range (pmap=0xc03996e0, sva=3365310464,
>     eva=3365314560) at ../../../i386/i386/pmap.c:721
> #6  0xc02eb83d in pmap_qenter (sva=3365310464, m=0xce772884, count=0)
>     at ../../../i386/i386/pmap.c:948
> #7  0xc0218a31 in vm_hold_load_pages (bp=0xc76039a0, from=0, to=3365318656)
>     at ../../../kern/vfs_bio.c:3574
> #8  0xc0216f5a in allocbuf (bp=0xc76039a0, size=8192)
>     at ../../../kern/vfs_bio.c:2752
> #9  0xc0216cee in geteblk (size=8192) at ../../../kern/vfs_bio.c:2634
> #10 0xc0213980 in bwrite (bp=0xc75b65d8) at ../../../kern/vfs_bio.c:818
> #11 0xc02142dc in bawrite (bp=0x0) at ../../../kern/vfs_bio.c:1153
> #12 0xc021d89a in vop_stdfsync (ap=0xce772a14)
>     at ../../../kern/vfs_default.c:742
> #13 0xc0193570 in spec_fsync (ap=0xce772a14)
>     at ../../../fs/specfs/spec_vnops.c:417
> #14 0xc0192a38 in spec_vnoperate (ap=0x0)
>     at ../../../fs/specfs/spec_vnops.c:122
> #15 0xc0294c62 in ffs_sync (mp=0xc3950a00, waitfor=2, cred=0xc0d06e80,
>     td=0xc03702a0) at vnode_if.h:624
> #16 0xc022b15b in sync (td=0xc03702a0, uap=0x0)
>     at ../../../kern/vfs_syscalls.c:142
> #17 0xc01cb9a1 in boot (howto=256) at ../../../kern/kern_shutdown.c:281
> #18 0xc01cc2b8 in panic () at ../../../kern/kern_shutdown.c:550
> #19 0xc02f0da2 in trap_fatal (frame=0xce772ba4, eva=0)
>     at ../../../i386/i386/trap.c:836
> #20 0xc02f0333 in trap (frame=
>       {tf_fs = -1060044776, tf_es = -831062000, tf_ds = -1071775728, tf_edi = -1014422336, tf_esi = -1070107520, tf_ebp = -831050764, tf_isp = -831050800, tf_ebx = 0, tf_edx = 0, tf_ecx = -1059988168, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1071773363, tf_cs = 8, tf_eflags = 66194, tf_esp = -1070107520, tf_ss = 0}) at ../../../i386/i386/trap.c:256
> #21 0xc02d8eb8 in calltrap () at {standard input}:97
> #22 0xc01e188b in sched_choose () at ../../../kern/sched_ule.c:1161
> #23 0xc01d25e6 in choosethread () at ../../../kern/kern_switch.c:140
> #24 0xc01d422f in mi_switch () at ../../../kern/kern_synch.c:525
> #25 0xc01c1db6 in _mtx_lock_sleep (m=0xc0374a40, opts=0, file=0x0, line=0)
>     at ../../../kern/kern_mutex.c:636
> #26 0xc01ca585 in getrusage (td=0x0, uap=0xce772d10)
>     at ../../../kern/kern_resource.c:773
> #27 0xc02f10fc in syscall (frame=
>       {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 135360172, tf_esi = 135336096, tf_ebp = -1077938416, tf_isp = -831050380, tf_ebx = -1077938416, tf_edx = 0, tf_ecx = 0, tf_eax = 117, tf_trapno = 0, tf_err = 2, tf_eip = 134789976, tf_cs = 31, tf_eflags = 659, tf_esp = -1077938572, tf_ss = 47})
>     at ../../../i386/i386/trap.c:1023
> #28 0xc02d8f0d in Xint0x80_syscall () at {standard input}:139
> ---Can't read userspace from dump, or kernel process---