(Followup) Management of Thermal

Norberto Meijome freebsd at meijome.net
Fri Oct 26 05:53:21 PDT 2007


Hi everyone,
Apologies for the long email. I am trying to understand how the different
thermal management and power subsystems work together, because my laptop
overheats under load and shuts down once it reaches CRIT.

Earlier this month I started a thread on -mobile@ which proved very interesting,
though I didn't find an answer to my issues. The thread starts here and
continues over a few days:

http://lists.freebsd.org/pipermail/freebsd-mobile/2007-October/010075.html

The problem I have, in short, is that my Thinkpad Z60m doesn't seem to use
active cooling as I believe it should. Fan speed is usually between 2,700 and
2,900 RPM when the system is not loaded. The values below are from a test I am
running while connected to AC.

My situation has improved somewhat since my previous report, thanks to using:
hw.pci.do_power_nodriver="2"

as suggested by Richard Arens. However, and I am not sure whether this is related, the system now feels more sluggish than before :(


Oct 26 21:28:34 ayiin tempd[25761]: CPU temp: 66.0C Freq: 1333 Fan 2783 rpm. Load { 1.16 0.89 0.61 }
Oct 26 21:28:54 ayiin tempd[25767]: CPU temp: 64.0C Freq: 1750 Fan 2770 rpm. Load { 0.91 0.85 0.60 }
[...]
Oct 26 21:29:54 ayiin tempd[25796]: CPU temp: 61.0C Freq: 1166 Fan 2777 rpm. Load { 0.41 0.72 0.57 }
Oct 26 21:30:14 ayiin tempd[25806]: CPU temp: 66.0C Freq: 2000 Fan 2779 rpm. Load { 0.49 0.71 0.57 }
[very similar]...
Oct 26 21:34:34 ayiin tempd[25947]: CPU temp: 59.0C Freq: 1400 Fan 2757 rpm. Load { 0.47 0.53 0.51 }
[..]
Oct 26 21:35:14 ayiin tempd[25965]: CPU temp: 58.0C Freq: 1600 Fan 2787 rpm. Load { 0.57 0.54 0.51 }
Oct 26 21:35:34 ayiin tempd[25982]: CPU temp: 59.0C Freq: 1166 Fan 2771 rpm. Load { 0.41 0.51 0.50 }


This seems OK, but as soon as I push the CPU hard (QEMU, building ports for
more than 15 minutes or so, or building world), the temperature climbs past the
mid 90s C and the CPU slows down, but the fan doesn't seem to speed up much at
all (I can barely hear it).
For example, building a small port like vte or screen works fine; building gcc overheats.

Some stats:

Oct 26 21:38:40 ayiin sudo:    betom : TTY=ttyp5 ; PWD=/usr/src ; USER=root ; COMMAND=/usr/bin/make buildworld
Oct 26 21:38:54 ayiin tempd[26548]: CPU temp: 66.0C Freq: 2000 Fan 2781 rpm. Load { 0.59 0.70 0.60 }
Oct 26 21:39:14 ayiin tempd[27188]: CPU temp: 69.0C Freq: 2000 Fan 2780 rpm. Load { 0.42 0.66 0.58 }
[...]
Oct 26 21:41:14 ayiin tempd[36558]: CPU temp: 83.0C Freq: 2000 Fan 2766 rpm. Load { 0.81 0.64 0.58 }
Oct 26 21:41:34 ayiin tempd[37245]: CPU temp: 85.0C Freq: 2000 Fan 2771 rpm. Load { 1.39 0.78 0.63 }
[....] slowly creeping up.

Oct 26 21:43:34 ayiin tempd[39153]: CPU temp: 90.0C Freq: 2000 Fan 2743 rpm. Load { 2.07 1.21 0.82 }
Oct 26 21:43:54 ayiin tempd[39215]: CPU temp: 91.0C Freq: 2000 Fan 2756 rpm. Load { 1.91 1.23 0.83 }

Notice how the temperature climbed more than 20 degrees (66C at 21:38:54 to
91C at 21:43:54) while the fan stayed at essentially the same rpm as before.
That doesn't sound right to me.
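
The comparison can be reproduced from the log itself. The following is a minimal sketch, assuming the tempd line format is exactly what the samples quoted in this message show (the regular expression and the helper names parse_samples and compare_ends are illustrative, not part of tempd):

```python
import re

# Pattern for the tempd log lines quoted in this message. The format is
# assumed from those samples only; tempd itself is not documented here.
LINE_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}) \S+ tempd\[\d+\]: "
    r"CPU temp: (?P<temp>[\d.]+)C Freq: (?P<freq>\d+) Fan (?P<fan>\d+) rpm"
)


def parse_samples(log_text):
    """Return a list of (time, temp_c, freq_mhz, fan_rpm) tuples."""
    samples = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            samples.append(
                (m["time"], float(m["temp"]), int(m["freq"]), int(m["fan"]))
            )
    return samples


def compare_ends(samples):
    """Temperature and fan-speed change between the first and last sample."""
    first, last = samples[0], samples[-1]
    return last[1] - first[1], last[3] - first[3]


# Three of the buildworld samples, copied verbatim from the log.
LOG = """\
Oct 26 21:38:54 ayiin tempd[26548]: CPU temp: 66.0C Freq: 2000 Fan 2781 rpm. Load { 0.59 0.70 0.60 }
Oct 26 21:41:14 ayiin tempd[36558]: CPU temp: 83.0C Freq: 2000 Fan 2766 rpm. Load { 0.81 0.64 0.58 }
Oct 26 21:43:54 ayiin tempd[39215]: CPU temp: 91.0C Freq: 2000 Fan 2756 rpm. Load { 1.91 1.23 0.83 }
"""

samples = parse_samples(LOG)
temp_delta, fan_delta = compare_ends(samples)
print(f"temperature change: {temp_delta:+.1f} C, fan change: {fan_delta:+d} rpm")
```

On these three lines the temperature rises by 25 C while the fan reading moves by only 25 rpm, and downwards.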

Oct 26 21:44:15 ayiin tempd[39338]: CPU temp: 92.0C Freq: 2000 Fan 3355 rpm. Load { 2.08 1.31 0.87 }
Oct 26 21:44:35 ayiin tempd[39436]: CPU temp: 92.0C Freq: 2000 Fan 3262 rpm. Load { 2.27 1.40 0.91 }
Oct 26 21:44:55 ayiin tempd[39586]: CPU temp: 92.0C Freq: 2000 Fan 3261 rpm. Load { 2.34 1.47 0.95 }
Oct 26 21:45:15 ayiin tempd[39700]: CPU temp: 92.0C Freq: 2000 Fan 3236 rpm. Load { 2.24 1.51 0.97 }

From 92C to 99C the fan sits in the 3,200s rpm. The frequency drops to 1600 MHz
and lower as the temperature rises.

Oct 26 21:57:55 ayiin tempd[61267]: CPU temp: 85.0C Freq: 2000 Fan 3211 rpm. Load { 3.40 4.34 3.48 }
Oct 26 21:58:15 ayiin tempd[61827]: CPU temp: 92.0C Freq: 2000 Fan 3210 rpm. Load { 3.56 4.29 3.49 }
Oct 26 21:58:35 ayiin tempd[62369]: CPU temp: 93.0C Freq: 2000 Fan 3221 rpm. Load { 3.73 4.28 3.50 }
Oct 26 21:58:55 ayiin tempd[62879]: CPU temp: 95.0C Freq: 932 Fan 3214 rpm. Load { 3.39 4.16 3.48 }
Oct 26 21:59:15 ayiin tempd[63099]: CPU temp: 78.0C Freq: 932 Fan 3212 rpm. Load { 3.52 4.13 3.48 }
Oct 26 21:59:35 ayiin tempd[63590]: CPU temp: 76.0C Freq: 932 Fan 3197 rpm. Load { 3.03 3.98 3.44 }
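
The frequency drop in these samples lines up with the passive trip point reported further down (hw.acpi.thermal.tz0._PSV: 94.5C). The following is a rough sketch of that check, with the sample tuples copied by hand from the log lines above; first_throttle is a hypothetical helper, and a match only shows correlation, not which subsystem actually changed the frequency:

```python
# Passive trip point as reported by hw.acpi.thermal.tz0._PSV in this message.
PSV_C = 94.5

# (time, temp_c, freq_mhz), copied from the tempd log lines above.
SAMPLES = [
    ("21:57:55", 85.0, 2000),
    ("21:58:15", 92.0, 2000),
    ("21:58:35", 93.0, 2000),
    ("21:58:55", 95.0, 932),
    ("21:59:15", 78.0, 932),
    ("21:59:35", 76.0, 932),
]


def first_throttle(samples, psv_c):
    """Find the first sample whose frequency is lower than the previous one.

    Returns (sample, at_or_above_psv), or (None, False) if the frequency
    never drops in the given samples.
    """
    for prev, cur in zip(samples, samples[1:]):
        if cur[2] < prev[2]:
            return cur, cur[1] >= psv_c
    return None, False


sample, above_psv = first_throttle(SAMPLES, PSV_C)
print("first frequency drop:", sample, "at or above _PSV:", above_psv)
```

Here the first drop (to 932 MHz) happens at the 95.0C sample, the first one at or above 94.5C, which is consistent with passive cooling being what steps the CPU down.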


I've had to set the lowest CPU frequency to 932 MHz; otherwise it would
drop to 100 MHz and the system would be utterly unusable.

Even at 100 MHz the system would heat up to CRIT and shut down (as
explained in my previous thread on -mobile@). Of course, when the CPU drops to
932 or 100 MHz, the load skyrockets, so when it jumps back to 2 GHz it is
so busy that it drops back down to 932 MHz sooner than before.

I would have hoped that, while connected to AC, the laptop would spare no power and push the fan hard
to cool down as much as possible before stepping down the CPU becomes necessary.

System : IBM Thinkpad Z60m, 6.2-STABLE FreeBSD 6.2-STABLE #24: Sun Oct  7 18:36:36 EST 2007

dmesg (top section only, which I think is the relevant part):

Oct 26 09:06:59 ayiin kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Oct 26 09:06:59 ayiin kernel: CPU: Intel(R) Pentium(R) M processor 2.00GHz (1995.02-MHz 686-class CPU)
Oct 26 09:06:59 ayiin kernel: Origin = "GenuineIntel"  Id = 0x6d8  Stepping = 8
Oct 26 09:06:59 ayiin kernel: Features=0xafe9fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,PBE>
Oct 26 09:06:59 ayiin kernel: Features2=0x180<EST,TM2>
Oct 26 09:06:59 ayiin kernel: real memory  = 1609433088 (1534 MB)
Oct 26 09:06:59 ayiin kernel: avail memory = 1567961088 (1495 MB)
Oct 26 09:06:59 ayiin kernel: ACPI APIC Table: <IBM    TP-77   >
Oct 26 09:06:59 ayiin kernel: ioapic0: Changing APIC ID to 1
Oct 26 09:06:59 ayiin kernel: ioapic0 <Version 2.0> irqs 0-23 on motherboard
Oct 26 09:06:59 ayiin kernel: wlan: mac acl policy registered
Oct 26 09:06:59 ayiin kernel: kqemu version 0x00010300
Oct 26 09:06:59 ayiin kernel: kqemu: KQEMU installed, max_locked_mem=781488kB.
Oct 26 09:06:59 ayiin kernel: acpi0: <IBM TP-77> on motherboard
Oct 26 09:06:59 ayiin kernel: acpi_ec0: <Embedded Controller: GPE 0x1c, ECDT> port 0x62,0x66 on acpi0
Oct 26 09:06:59 ayiin kernel: acpi0: Power Button (fixed)
Oct 26 09:06:59 ayiin kernel: Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Oct 26 09:06:59 ayiin kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
Oct 26 09:06:59 ayiin kernel: cpu0: <ACPI CPU> on acpi0
Oct 26 09:06:59 ayiin kernel: est0: <Enhanced SpeedStep Frequency Control> on cpu0
Oct 26 09:06:59 ayiin kernel: p4tcc0: <CPU Frequency Thermal Control> on cpu0
Oct 26 09:06:59 ayiin kernel: acpi_lid0: <Control Method Lid Switch> on acpi0
Oct 26 09:06:59 ayiin kernel: acpi_button0: <Sleep Button> on acpi0


/etc/rc.conf :
powerd_enable="YES"
powerd_flags="-a adaptive -b adaptive"
## Disable power_profile behaviour...
performance_cx_lowest="NONE"    # Online CPU idle state
performance_cpu_freq="NONE"     # Online CPU frequency
economy_cx_lowest="NONE"        # Offline CPU idle state
economy_cpu_freq="NONE"         # Offline CPU frequency

(I didn't have any of the _cx_* settings when I started having this issue.)

## ACPI Power and Thermal related configs

# Lowest CPU frequency in MHz to offer to users
debug.cpufreq.lowest=932

# Power management of PCI devices
hw.pci.do_power_nodriver=2

# Suggested in -mobile@ to keep temperature lower
hw.acpi.cpu.cx_lowest=C3

### trying to finetune the action of the thermal zones
## man 4 acpi_thermal
## for details
## Custom values
hw.acpi.thermal.user_override=1

hw.acpi.thermal.polling_rate=5


$ kldstat
Id Refs Address    Size     Name
 1   34 0xc0400000 4c3d24   kernel
 2    1 0xc08c4000 836c     linprocfs.ko
 3    3 0xc08cd000 21ebc    linux.ko
 4    1 0xc08ef000 135e0    snd_hda.ko
 5    2 0xc0903000 68e48    sound.ko
 6    2 0xc096c000 17920    agp.ko
 7    2 0xc0984000 666a8    acpi.ko
 8    1 0xc09eb000 4c4c     acpi_ibm.ko
 9    1 0xc09f0000 b668     cpufreq.ko
10    1 0xc09fc000 1d498    kqemu.ko
11    1 0xc0a1a000 22140    radeon.ko
12    2 0xc0a3d000 10c68    drm.ko
13    1 0xc5d91000 c000     ipfw.ko
14    1 0xc5e9a000 7000     aio.ko
15    1 0xc9fc6000 2000     rtc.ko

$ sysctl -a | grep freq
kern.acct_chkfreq: 15
debug.cpufreq.lowest: 932
debug.cpufreq.verbose: 0
machdep.tsc_freq: 1995016700
machdep.i8254_freq: 1193182
machdep.acpi_timer_freq: 3579545
dev.cpu.0.freq: 1750
dev.cpu.0.freq_levels: 2000/27000 1750/23625 1600/22600 1400/19775 1333/19666 1166/17207 1066/16733 932/14641
dev.est.0.freq_settings: 2000/27000 1600/22600 1333/19666 1066/16733 800/13800
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.p4tcc.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1 5000/-1 3750/-1 2500/-1 1250/-1
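
The numbers above appear to fit together as follows: each dev.cpu.0.freq_levels entry looks like an est frequency multiplied by a p4tcc duty setting, where the p4tcc values seem to be in hundredths of a percent (8750 = 87.5%). The following is a minimal sketch under that assumption. It enumerates every product, whereas the real cpufreq driver evidently exposes only a subset, so this is an illustration of the arithmetic and not the driver's algorithm:

```python
# Values copied from the sysctl output in this message.
EST = "2000/27000 1600/22600 1333/19666 1066/16733 800/13800"
P4TCC = "10000/-1 8750/-1 7500/-1 6250/-1 5000/-1 3750/-1 2500/-1 1250/-1"
# Levels actually reported by dev.cpu.0.freq_levels (MHz part only).
REPORTED = [2000, 1750, 1600, 1400, 1333, 1166, 1066, 932]


def parse_settings(text):
    """Split 'value/power' pairs into (int, int) tuples."""
    return [tuple(int(x) for x in pair.split("/")) for pair in text.split()]


def combined_levels(est_text, p4tcc_text):
    """Multiply every absolute est frequency by every relative p4tcc duty.

    Duty values are assumed to be hundredths of a percent, so 10000 means
    100%. Integer division mirrors the truncated values seen in the output.
    """
    levels = set()
    for mhz, _power in parse_settings(est_text):
        for duty, _unused in parse_settings(p4tcc_text):
            levels.add(mhz * duty // 10000)
    return sorted(levels, reverse=True)


levels = combined_levels(EST, P4TCC)
print("lowest combined level:", levels[-1], "MHz")
print("all reported levels explained:", set(REPORTED) <= set(levels))
```

Under this reading, the 100 MHz floor mentioned earlier is 800 MHz at a 12.5% duty setting, and the 932 MHz limit is 1066 MHz at 87.5%.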


$ sysctl dev.cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU_
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.freq: 1750
dev.cpu.0.freq_levels: 2000/27000 1750/23625 1600/22600 1400/19775 1333/19666 1166/17207 1066/16733 932/14641
dev.cpu.0.cx_supported: C1/1 C2/1 C3/85
dev.cpu.0.cx_lowest: C3
dev.cpu.0.cx_usage: 0.00% 99.11% 0.88%

$ sysctl dev.p4tcc
dev.p4tcc.0.%desc: CPU Frequency Thermal Control
dev.p4tcc.0.%driver: p4tcc
dev.p4tcc.0.%parent: cpu0
dev.p4tcc.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1 5000/-1 3750/-1 2500/-1 1250/-1

$ sysctl dev.acpi_ibm
dev.acpi_ibm.0.%desc: IBM ThinkPad ACPI Extras
dev.acpi_ibm.0.%driver: acpi_ibm
dev.acpi_ibm.0.%location: handle=\_SB_.PCI0.LPC_.EC__.HKEY
dev.acpi_ibm.0.%pnpinfo: _HID=IBM0068 _UID=0
dev.acpi_ibm.0.%parent: acpi0
dev.acpi_ibm.0.initialmask: 2060
dev.acpi_ibm.0.availmask: 16777215
dev.acpi_ibm.0.events: 1
dev.acpi_ibm.0.eventmask: 16777215
dev.acpi_ibm.0.hotkey: 1334
dev.acpi_ibm.0.lcd_brightness: 0
dev.acpi_ibm.0.volume: 3
dev.acpi_ibm.0.mute: 0
dev.acpi_ibm.0.thinklight: 0
dev.acpi_ibm.0.bluetooth: 1
dev.acpi_ibm.0.wlan: 1
dev.acpi_ibm.0.fan_speed: 2753
dev.acpi_ibm.0.fan: 1
dev.acpi_ibm.0.thermal: 72 62 40 79 50 -1 36 -1
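
The dev.acpi_ibm.0.thermal line carries eight readings. The following is a small sketch for picking out the populated ones, assuming that -1 means "no sensor at this position" (an assumption; which physical component each index corresponds to is not stated in this message):

```python
# Value of dev.acpi_ibm.0.thermal, copied from the output above.
THERMAL = "72 62 40 79 50 -1 36 -1"


def present_sensors(text):
    """Map sensor index -> temperature in C, skipping -1 entries.

    -1 is treated as 'sensor not present', which is an assumption here.
    """
    return {i: int(v) for i, v in enumerate(text.split()) if int(v) >= 0}


sensors = present_sensors(THERMAL)
hottest = max(sensors, key=sensors.get)
print(f"{len(sensors)} sensors present, "
      f"hottest is index {hottest} at {sensors[hottest]} C")
```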

$ sysctl hw.acpi
hw.acpi.supported_sleep_state: S3 S4 S5
hw.acpi.power_button_state: S5
hw.acpi.sleep_button_state: S3
hw.acpi.lid_switch_state: NONE
hw.acpi.standby_state: S1
hw.acpi.suspend_state: S3
hw.acpi.sleep_delay: 1
hw.acpi.s4bios: 0
hw.acpi.verbose: 0
hw.acpi.disable_on_reboot: 0
hw.acpi.handle_reboot: 0
hw.acpi.reset_video: 1
hw.acpi.cpu.cx_lowest: C3
hw.acpi.thermal.min_runtime: 0
hw.acpi.thermal.polling_rate: 5
hw.acpi.thermal.user_override: 1
hw.acpi.thermal.tz0.temperature: 74.0C
hw.acpi.thermal.tz0.active: -1
hw.acpi.thermal.tz0.passive_cooling: 1
hw.acpi.thermal.tz0.thermal_flags: 0
hw.acpi.thermal.tz0._PSV: 94.5C
hw.acpi.thermal.tz0._HOT: -1
hw.acpi.thermal.tz0._CRT: 99.0C
hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
hw.acpi.battery.life: 99
hw.acpi.battery.time: -1
hw.acpi.battery.state: 0
hw.acpi.battery.units: 1
hw.acpi.battery.info_expire: 5
hw.acpi.acline: 1
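
One thing that stands out in the thermal zone output is that tz0.active and every _ACx entry are -1. The following is a minimal sketch that checks for this, treating -1 as "not defined" (an assumption; see acpi_thermal(4) for the authoritative meaning). parse_zone and to_celsius are illustrative helpers:

```python
# Thermal-zone lines copied from the `sysctl hw.acpi` output above.
SYSCTL = """\
hw.acpi.thermal.tz0.temperature: 74.0C
hw.acpi.thermal.tz0.active: -1
hw.acpi.thermal.tz0._PSV: 94.5C
hw.acpi.thermal.tz0._HOT: -1
hw.acpi.thermal.tz0._CRT: 99.0C
hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
"""


def parse_zone(text, zone="tz0"):
    """Collect 'name: value' pairs for one thermal zone into a dict."""
    prefix = f"hw.acpi.thermal.{zone}."
    values = {}
    for line in text.splitlines():
        if line.startswith(prefix):
            key, _sep, raw = line[len(prefix):].partition(": ")
            values[key] = raw.strip()
    return values


def to_celsius(raw):
    """'94.5C' -> 94.5; '-1' -> None (treated here as 'not defined')."""
    return float(raw[:-1]) if raw.endswith("C") else None


zone = parse_zone(SYSCTL)
active_trips = [t for t in zone["_ACx"].split() if t != "-1"]
psv, crt = to_celsius(zone["_PSV"]), to_celsius(zone["_CRT"])
print("active cooling trip points defined:", bool(active_trips))
print(f"headroom between _PSV and _CRT: {crt - psv:.1f} C")
```

If that reading is right, this zone gives the OS no active cooling levels to switch on, only a passive trip point 4.5 C below the critical one, which would match the behaviour in the logs: throttling at about 95C and very little room before shutdown.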


My CPU setting in the BIOS is set to performance when plugged in, I think. I tested with adaptive and it didn't make a difference.

Any advice or RTFM pointers on how all the subsystems work together, and on the suggested way to manage this, would be greatly appreciated. I am happy to compile the information and submit a document for the documentation project.

cheers,
Beto
_________________________
{Beto|Norberto|Numard} Meijome

"Software is like sex, its better when its free"
   Linus Torvalds

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.

