cpufreq(4) panic on RELENG_7 (was: Re: Call for bfe(4) testers.)

Tue Aug 12 15:15:28 UTC 2008

On Tuesday 12 August 2008 04:36:29 am pluknet wrote:
> 2008/8/11 John Baldwin <jhb at freebsd.org>:
> > On Monday 11 August 2008 12:35:17 pm pluknet wrote:
> >> 2008/8/11 John Baldwin <jhb at freebsd.org>:
> >> > On Saturday 09 August 2008 07:16:37 am Ulrich Spoerlein wrote:
> >> >> Hi John,
> >> >>
> >> >> I now figured out the "who", the "why" still eludes me.
> >> >>
> >> >> So, after your MFC of ichss.c on June 27th the device now attaches at my
> >> >> laptop. It didn't before, so it could cause no trouble.
> >> >>
> >> >> With ichss loaded, the kernel will panic 1-3 minutes after powerd has
> >> >> been started (if I kill powerd early enough, it seems pretty stable).
> >> >>
> >> >> I'm now running a kernel from 2008-08-08 with
> >> >> hint.ichss.0.disabled="1"
> >> >
> >> > Ok.  Can you get a crashdump from a crash?
> >> >
> >>
> >> Ehm, I am not Ulrich Spoerlein, but I can help with this issue.
> >>
> >> my crashdump from kgdb and some debug info.
> >> (ouch, I forgot to include it in my prev. mail
> >> http://lists.freebsd.org/pipermail/freebsd-stable/2008-August/044182.html )
> >>
> >> wbr,
> >> pluknet
> >>
> >> Unread portion of the kernel message buffer:
> >>
> >>
> >> Fatal trap 12: page fault while in kernel mode
> >> fault virtual address   = 0x38
> >> fault code              = supervisor read, page not present
> >> instruction pointer     = 0x20:0xc056cf46
> >> stack pointer           = 0x28:0xe6592ac8
> >> frame pointer           = 0x28:0xe6592ac8
> >> code segment            = base 0x0, limit 0xfffff, type 0x1b
> >>                         = DPL 0, pres 1, def32 1, gran 1
> >> processor eflags        = interrupt enabled, resume, IOPL = 0
> >> current process         = 2507 (powerd)
> >> Physical memory: 1014 MB
> >> Dumping 120 MB: 105 89 73 57 41 25 9
> >>
> >> #0  doadump () at pcpu.h:195
> >> 195     pcpu.h: No such file or directory.
> >>         in pcpu.h
> >> (kgdb) bt
> >> #0  doadump () at pcpu.h:195
> >> #1  0xc0458f89 in db_fncall (dummy1=-1010027648, dummy2=0, dummy3=0,
> >>     dummy4=0xe6592860 "0╛йц") at /media/src-7/sys/ddb/db_command.c:516
> >> #2  0xc045953a in db_command (last_cmdp=0xc07dcf14, cmd_table=0x0, dopager=1)
> >>     at /media/src-7/sys/ddb/db_command.c:413
> >> #3  0xc0459655 in db_command_loop ()
> >>     at /media/src-7/sys/ddb/db_command.c:466
> >> #4  0xc045b17c in db_trap (type=12, code=0)
> >>     at /media/src-7/sys/ddb/db_main.c:228
> >> #5  0xc0575023 in kdb_trap (type=12, code=0, tf=0xe6592a88)
> >>     at /media/src-7/sys/kern/subr_kdb.c:524
> >> #6  0xc07460bf in trap_fatal (frame=0xe6592a88, eva=56)
> >>     at /media/src-7/sys/i386/i386/trap.c:890
> >> #7  0xc074636b in trap_pfault (frame=0xe6592a88, usermode=0, eva=56)
> >>     at /media/src-7/sys/i386/i386/trap.c:812
> >> #8  0xc0746d36 in trap (frame=0xe6592a88)
> >>     at /media/src-7/sys/i386/i386/trap.c:490
> >> #9  0xc072fd4b in calltrap ()
> >>     at /media/src-7/sys/i386/i386/exception.s:139
> >> #10 0xc056cf46 in device_is_attached (dev=0x0)
> >>     at /media/src-7/sys/kern/subr_bus.c:2228
> >> #11 0xc0512de7 in cf_set_method (dev=0xc3c9c880, level=0xc4525ef4,
> >>     priority=100) at /media/src-7/sys/kern/kern_cpu.c:332
> >> #12 0xc0511452 in cpufreq_curr_sysctl (oidp=0xc3c8bac0, arg1=0xc3bc7c00,
> >>     arg2=0, req=0xe6592ba4) at cpufreq_if.h:32
> >> ---Type <return> to continue, or q <return> to quit---
> >> #13 0xc0554b67 in sysctl_root (oidp=Variable "oidp" is not available.
> >> )
> >>     at /media/src-7/sys/kern/kern_sysctl.c:1306
> >> #14 0xc0554cd1 in userland_sysctl (td=0xc4245440, name=0xe6592c14, namelen=4,
> >>     old=0x0, oldlenp=0x0, inkernel=0, new=0xbfbfe7c4, newlen=4,
> >>     retval=0xe6592c10, flags=0)
> >>     at /media/src-7/sys/kern/kern_sysctl.c:1401
> >> #15 0xc0555a7c in __sysctl (td=0xc4245440, uap=0xe6592cfc)
> >>     at /media/src-7/sys/kern/kern_sysctl.c:1336
> >> #16 0xc07466d5 in syscall (frame=0xe6592d38)
> >>     at /media/src-7/sys/i386/i386/trap.c:1035
> >> #17 0xc072fdb0 in Xint0x80_syscall ()
> >>     at /media/src-7/sys/i386/i386/exception.s:196
> >> #18 0x00000033 in ?? ()
> >> Previous frame inner to this frame (corrupt stack?)
> >> (kgdb) f 11
> >> #11 0xc0512de7 in cf_set_method (dev=0xc3c9c880, level=0xc4525ef4,
> >>     priority=100) at /media/src-7/sys/kern/kern_cpu.c:332
> >> 332                     if (!device_is_attached(set->dev)) {
> >> (kgdb) list
> >> 327             }
> >> 328
> >> 329             /* Next, set any/all relative frequencies via their drivers. */
> >> 330             for (i = 0; i < level->rel_count; i++) {
> >> 331                     set = &level->rel_set[i];
> >> 332                     if (!device_is_attached(set->dev)) {
> >> 333                             error = ENXIO;
> >> 334                             goto out;
> >> 335                     }
> >> 336
> >> (kgdb) p level.rel_count
> >> $1 = 1986356271
> >> (kgdb) p i
> >> $2 = 0
> >> (kgdb) p level.rel_set
> >> $3 = {{freq = 0, volts = 0, power = 0, lat = 0, dev = 0x0, spec = {0, 0, 0,
> >>       0}}, {freq = 0, volts = 0, power = 0, lat = 0, dev = 0x0, spec = {0, 0,
> >>       0, 0}}, {freq = 0, volts = 0, power = 0, lat = 0, dev = 0x0, spec = {0,
> >>       0, 0, 0}}, {freq = 0, volts = 0, power = 0, lat = 0, dev = 0x0, spec = {
> >>       0, 0, 0, 0}}, {freq = 0, volts = 0, power = 0, lat = 0, dev = 0x0,
> >>     spec = {0, 0, 0, 0}}, {freq = 0, volts = 0, power = 0, lat = 0,
> >>     dev = 0x656e7552, spec = {828858701, 1162760014, 0, 134632492}}, {
> >>     freq = 0, volts = 53, power = -1024, lat = -1, dev = 0x7fffffff, spec = {
> >> ----- and so on-----
> >>
> >> Also there are very unusual (and high) numbers in sysctl
> >> dev.cpu.0.freq_levels.
> >
> > Which cpufreq drivers are you using?  Can you narrow down your panics (and
> > weird frequencies in the sysctl) to being caused by a specific cpufreq
> > driver?
> 
> They are est/p4tcc/ichss.
> hint.ichss.0.disabled="1" helped me to avoid panics and those weird
> freqs in dev.cpu.
> Now it shows:
> cpu0: <ACPI CPU> on acpi0
> est0: <Enhanced SpeedStep Frequency Control> on cpu0
> p4tcc0: <CPU Frequency Thermal Control> on cpu0
> and dev.cpu.0.freq_levels are as expected (and as it was earlier
> before this problem):
>  1600/-1 1400/-1 1225/-1 1200/-1 1050/-1 1000/-1 875/-1 800/-1 700/-1
> 600/-1 525/-1 450/-1 375/-1 300/-1 225/-1 150/-1 75/-1

Try http://www.freebsd.org/~jhb/patches/cpufreq_order.patch
ichss0 is only supposed to be used if you don't have est0, and this patch
fixes that.  I'm also curious whether you still get panics if you disable
est0 and leave ichss0 enabled, or whether that works ok.
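
For anyone reading the kgdb output above: the loop at kern_cpu.c:330 trusts
level->rel_count, which is 1986356271 in the dump (0x7665642f, the ASCII bytes
"/dev" in little-endian order), while rel_set[0].dev is NULL.  That is
consistent with device_is_attached() faulting at a small offset from address
zero.  The following is only a userland sketch of the sanity check one might
apply while debugging; the struct layouts are simplified stand-ins, and
SKETCH_MAX_SETTINGS, sketch_check_level() and sketch_demo() are hypothetical
names that are not part of the kernel or of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_SETTINGS 8	/* assumed array bound, for this sketch only */

/* Simplified stand-ins for the kernel's per-driver setting and level. */
struct sketch_setting {
	int	 freq;
	void	*dev;		/* stand-in for device_t */
};

struct sketch_level {
	struct sketch_setting	rel_set[SKETCH_MAX_SETTINGS];
	int			rel_count;
};

/*
 * Hypothetical helper: returns 0 if the relative settings are safe to
 * walk, EINVAL if rel_count is out of range, or ENXIO if a setting has
 * no device behind it.
 */
int
sketch_check_level(const struct sketch_level *level)
{
	int i;

	if (level->rel_count < 0 || level->rel_count > SKETCH_MAX_SETTINGS)
		return (EINVAL);
	for (i = 0; i < level->rel_count; i++) {
		if (level->rel_set[i].dev == NULL)
			return (ENXIO);
	}
	return (0);
}

/* Replays the values seen in the kgdb session against the check. */
void
sketch_demo(void)
{
	struct sketch_level lv = { .rel_count = 1986356271 };
	int placeholder;

	/* Garbage count, as in "p level.rel_count". */
	assert(sketch_check_level(&lv) == EINVAL);

	/* Plausible count, but the first setting has dev = 0x0. */
	lv.rel_count = 1;
	assert(sketch_check_level(&lv) == ENXIO);

	/* Plausible count and a non-NULL device pointer. */
	lv.rel_set[0].dev = &placeholder;
	assert(sketch_check_level(&lv) == 0);
}
```

A check like this would only turn the panic into an error return; it would not
explain why the level contents are garbage in the first place, which is what
the driver-ordering question above is trying to narrow down.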

-- 
John Baldwin