Christian Walther
2009-Aug-12 07:47 UTC
Cpufreq/ACPI problem? (basically still is: "Re: Problem with IBM Thinkpad T30 shutting down due to high temperatures")
Hi,

thank you for all your feedback. I won't answer all replies in detail, but will summarise what I did to give you some sort of report.

Doug made me think about the beginning of this situation. I can't tell you for sure that I ever had the T30 working flawlessly, because I took the original install from another, older ThinkPad. But I did change some BIOS settings, mainly interrupt settings, that seemed to cause problems with my wireless NIC in the past. So I restored the BIOS defaults. This seems to make the problem disappear, but to be honest, I'm not sure whether I messed up the ACPI table at all, or whether this is some sort of performance issue, because I now have all I/O-bound devices on IRQ 11:

vgapci0: <VGA-compatible display> port 0x3000-0x30ff mem 0xe8000000-0xefffffff,0xd0100000-0xd010ffff irq 11 at device 0.0 on pci1
uhci0: <Intel 82801CA/CAM (ICH3) USB controller USB-A> port 0x1800-0x181f irq 11 at device 29.0 on pci0
uhci1: <Intel 82801CA/CAM (ICH3) USB controller USB-B> port 0x1820-0x183f irq 11 at device 29.1 on pci0
uhci2: <Intel 82801CA/CAM (ICH3) USB controller USB-C> port 0x1840-0x185f irq 11 at device 29.2 on pci0
cbb0: <TI1520 PCI-CardBus Bridge> mem 0x50000000-0x50000fff irq 11 at device 0.0 on pci2
cbb1: <TI1520 PCI-CardBus Bridge> mem 0x51000000-0x51000fff irq 11 at device 0.1 on pci2
fxp0: <Intel 82801CAM (ICH3) Pro/100 VE Ethernet> port 0x8000-0x803f mem 0xd0200000-0xd0200fff irq 11 at device 8.0 on pci2
pcm0: <Intel ICH3 (82801CA)> port 0x1c00-0x1cff,0x18c0-0x18ff irq 11 at device 31.5 on pci0

This causes screen refresh problems (e.g. urxvt isn't able to draw new lines as expected). Still, this didn't resolve the issue, so I took a look at acpi_thermal.
Right now I have the following set in /etc/sysctl.conf:

hw.acpi.thermal.user_override=1
hw.acpi.thermal.tz0._PSV=84.0C
hw.acpi.thermal.polling_rate=2

This basically gives me:

# sysctl -a|egrep "(temp|freq|acpi.therm|acpi_ibm.*fan)"
kern.acct_chkfreq: 15
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.TSC.frequency: 2000000000
net.inet.sctp.sack_freq: 2
net.inet6.ip6.use_tempaddr: 0
net.inet6.ip6.temppltime: 86400
net.inet6.ip6.tempvltime: 604800
net.inet6.ip6.prefer_tempaddr: 0
debug.cpufreq.verbose: 0
debug.cpufreq.lowest: 0
hw.acpi.thermal.min_runtime: 0
hw.acpi.thermal.polling_rate: 2
hw.acpi.thermal.user_override: 1
hw.acpi.thermal.tz0.temperature: 62.0C
hw.acpi.thermal.tz0.active: 0
hw.acpi.thermal.tz0.passive_cooling: 1
hw.acpi.thermal.tz0.thermal_flags: 0
hw.acpi.thermal.tz0._PSV: 84.0C
hw.acpi.thermal.tz0._HOT: -1
hw.acpi.thermal.tz0._CRT: 92.0C
hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
hw.acpi.thermal.tz0._TC1: 5
hw.acpi.thermal.tz0._TC2: 3
hw.acpi.thermal.tz0._TSP: 600
machdep.acpi_timer_freq: 3579545
machdep.tsc_freq: 2000000000
machdep.i8254_freq: 1193182
dev.acpi_ibm.0.fan_speed: 4465
dev.acpi_ibm.0.fan_level: 0
dev.acpi_ibm.0.fan: 1
dev.cpu.0.freq: 2000
dev.cpu.0.freq_levels: 2000/0 1750/0 1500/0 1250/0 1200/0 1050/0 900/0 750/0 600/0 450/0 300/0
dev.acpi_perf.0.freq_settings: 2000/0 1200/0
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.p4tcc.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1 5000/-1 3750/-1 2500/-1

Active cooling doesn't seem to be supported. There is a fan of course, and I can even set a fan level via dev.acpi_ibm.0.fan, but this is not related to hw.acpi.thermal.tz0._HOT and hw.acpi.thermal.tz0._ACx (which is read only anyway). According to dev.acpi_ibm.0.fan_speed the speed of the fan is something between 4450 and 4780.

The interesting bit here is cpufreq and how it behaves.
Let's have a look at the output of the following loop:

# while true ; do temp=$( sysctl hw.acpi.thermal.tz0.temperature ) ; freq=$( sysctl dev.cpu.0.freq ) ; printf "%4s %4s\n" $freq[17,$#freq] $temp[34,$#temp] ; sleep 2 ; done
2000 84.0C
2000 85.0C
2000 85.0C
2000 85.0C
2000 86.0C
 300 86.0C
 300 86.0C
 300 86.0C
 300 85.0C
 300 84.0C
 300 82.0C
 300 81.0C

It appears that cpufreq requires at least eight seconds to reduce the frequency. There are two issues I'm seeing here:

Firstly, hw.acpi.thermal.polling_rate is set to 2. Either I misunderstand this setting, or cpufreq doesn't react after every poll. I've seen this in the past, but never as clearly as now.

Secondly, cpufreq doesn't seem to use the intermediate steps in dev.cpu.0.freq_levels at all, but drops straight to the lowest frequency available. And it does the same the other way round, too.

I was able to build a new userland and kernel yesterday, so I'll do some more testing with a decent system after a clean reboot. The kernel I want to use next time will be plain GENERIC. This does not turn on support for active cooling in any way, something I was thinking about because according to acpi_ibm fan levels from 0 to 7 are supported. And setting them manually works, so I guess this should be possible with acpi.thermal, too. Or am I mistaken and acpi.thermal and acpi_ibm don't interact with each other?

The interesting bit here is cpufreq: Is the behaviour normal and to be expected, or is this possibly a bug?

Regards
Christian Walther
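The loop above depends on zsh-style subscripts such as $freq[17,$#freq] to cut the value out of the sysctl output. For anyone repeating the measurement from plain sh, a minimal sketch of the same sampling follows. It assumes that `sysctl -n`, which prints only the value, is available; the function names format_sample and sample_once are made up for illustration and are not part of the original message.

```shell
#!/bin/sh
# Portable sketch of the sampling loop: `sysctl -n` prints only the value,
# so no shell-specific string slicing is needed.

# Format one sample as "<MHz> <temperature>", right-aligning the frequency
# in a four-character column.
format_sample() {
    printf '%4s %s\n' "$1" "$2"
}

# Take one reading from the two OIDs used in the message above.
sample_once() {
    format_sample "$(sysctl -n dev.cpu.0.freq)" \
                  "$(sysctl -n hw.acpi.thermal.tz0.temperature)"
}

# To log continuously at the 2 s polling rate, run:
#   while :; do sample_once; sleep 2; done
```

Keeping the formatting in its own function means the column layout can be checked without the hardware, while sample_once is the only part that touches sysctl.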
Roland Smith
2009-Aug-12 18:21 UTC
Cpufreq/ACPI problem? (basically still is: "Re: Problem with IBM Thinkpad T30 shutting down due to high temperatures")
On Wed, Aug 12, 2009 at 09:47:18AM +0200, Christian Walther wrote:
> Hi,
>
> thank you for all your feedback.
> I won't answer all replies in detail, but will summarise what I did to
> give you some sort of report.
> Doug made me think about the beginning of this situation. I can't tell
> you for sure that I had the T30 working flawlessly, because I took the
> original install from another, older thinkpad.
> But I did change some BIOS settings, Interrupt settings, mainly, that
> seem to cause problems with my Wireless NIC in the past. So I restored
> the BIOS defaults. This seems to make the problem disappear, but to be
> honest: I'm not sure if I messed up the ACPI table at all, or if this
> is some sort of performance issue, because I now have all IO bound
> devices on IRQ 11:
>
> vgapci0: <VGA-compatible display> port 0x3000-0x30ff mem
> 0xe8000000-0xefffffff,0xd0100000-0xd010ffff irq 11 at device 0.0 on
> pci1
> uhci0: <Intel 82801CA/CAM (ICH3) USB controller USB-A> port
> 0x1800-0x181f irq 11 at device 29.0 on pci0
> uhci1: <Intel 82801CA/CAM (ICH3) USB controller USB-B> port
> 0x1820-0x183f irq 11 at device 29.1 on pci0
> uhci2: <Intel 82801CA/CAM (ICH3) USB controller USB-C> port
> 0x1840-0x185f irq 11 at device 29.2 on pci0
> cbb0: <TI1520 PCI-CardBus Bridge> mem 0x50000000-0x50000fff irq 11 at
> device 0.0 on pci2
> cbb1: <TI1520 PCI-CardBus Bridge> mem 0x51000000-0x51000fff irq 11 at
> device 0.1 on pci2
> fxp0: <Intel 82801CAM (ICH3) Pro/100 VE Ethernet> port 0x8000-0x803f
> mem 0xd0200000-0xd0200fff irq 11 at device 8.0 on pci2
> pcm0: <Intel ICH3 (82801CA)> port 0x1c00-0x1cff,0x18c0-0x18ff irq 11
> at device 31.5 on pci0
>
> This causes screen refresh problems (e.g. urxvt isn't able to draw new
> lines as expected). Still, this didn't resolve the issue, so I took a
> look at acpi_thermal.
> Right now I have the following set in /etc/sysctl.conf
>
> hw.acpi.thermal.user_override=1

According to acpi_thermal(4), you should not use decimal.
So it should be 84C instead of 84.0C.

> hw.acpi.thermal.tz0._PSV=84.0C
> hw.acpi.thermal.polling_rate=2
>
> This basically gives me:
>
> # sysctl -a|egrep "(temp|freq|acpi.therm|acpi_ibm.*fan)"
> kern.acct_chkfreq: 15
> kern.timecounter.tc.i8254.frequency: 1193182
> kern.timecounter.tc.ACPI-fast.frequency: 3579545
> kern.timecounter.tc.TSC.frequency: 2000000000
> net.inet.sctp.sack_freq: 2
> net.inet6.ip6.use_tempaddr: 0
> net.inet6.ip6.temppltime: 86400
> net.inet6.ip6.tempvltime: 604800
> net.inet6.ip6.prefer_tempaddr: 0
> debug.cpufreq.verbose: 0
> debug.cpufreq.lowest: 0

You should look at dev.cpu.N.freq_levels, where N is the number of the core. See cpufreq(4) and below.

<snip>
> hw.acpi.thermal.polling_rate: 2

The polling_rate is just the number of seconds between readings of the temperature. Nothing more.

> hw.acpi.thermal.user_override: 1
> hw.acpi.thermal.tz0.temperature: 62.0C
> hw.acpi.thermal.tz0.active: 0
> hw.acpi.thermal.tz0.passive_cooling: 1
> hw.acpi.thermal.tz0.thermal_flags: 0
> hw.acpi.thermal.tz0._PSV: 84.0C

The _PSV setting means that the system will only start throttling the CPU when temperature reaches 84°C! You might want to set that a little lower. The system shuts down at 92°C.
That seems to be a fine line to walk.

> hw.acpi.thermal.tz0._HOT: -1
> hw.acpi.thermal.tz0._CRT: 92.0C
> hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
> hw.acpi.thermal.tz0._TC1: 5
> hw.acpi.thermal.tz0._TC2: 3
> hw.acpi.thermal.tz0._TSP: 600
> machdep.acpi_timer_freq: 3579545
> machdep.tsc_freq: 2000000000
> machdep.i8254_freq: 1193182
> dev.acpi_ibm.0.fan_speed: 4465
> dev.acpi_ibm.0.fan_level: 0
> dev.acpi_ibm.0.fan: 1
> dev.cpu.0.freq: 2000
> dev.cpu.0.freq_levels: 2000/0 1750/0 1500/0 1250/0 1200/0 1050/0 900/0
> 750/0 600/0 450/0 300/0
> dev.acpi_perf.0.freq_settings: 2000/0 1200/0
> dev.cpufreq.0.%driver: cpufreq
> dev.cpufreq.0.%parent: cpu0
> dev.p4tcc.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1 5000/-1
> 3750/-1 2500/-1
>
> Active cooling doesn't seem to be supported. There is a fan of course,
> and I can even set a fan level via dev.acpi_ibm.0.fan, but this is not
> related to hw.acpi.thermal.tz0._HOT and hw.acpi.thermal.tz0._ACx
> (which is read only anyway).
> According to dev.acpi_ibm.0.fan_speed the speed of the fan is
> something between 4450 and 4780.
>
> The interesting bit here is cpufreq and how it behaves. Lets have a
> look at the output of the following loop:
> # while true ; do temp=$( sysctl hw.acpi.thermal.tz0.temperature ) ;
> freq=$( sysctl dev.cpu.0.freq ) ; printf "%4s %4s\n" $freq[17,$#freq]
> $temp[34,$#temp] ; sleep 2 ; done
> 2000 84.0C
> 2000 85.0C
> 2000 85.0C
> 2000 85.0C
> 2000 86.0C
> 300 86.0C
> 300 86.0C
> 300 86.0C
> 300 85.0C
> 300 84.0C
> 300 82.0C
> 300 81.0C
>
> It appears that cpufreq requires at least eight seconds to reduce the
> frequency. There are two issues I'm seeing here: Firstly
> hw.acpi.thermal.polling_rate: 2 Either I get this one wrong, or
> cpufreq doesn't react after every poll.

The latter, I think.

<snip>
> The interesting bit here is cpufreq: Is the behaviour normal and to be
> expected, or is this possibly a bug?

Cpufreq is just a frequency control framework.
It relies on powerd(8) to actually change frequencies. You can set variables in /etc/rc.conf to enable and control powerd. On my desktop I have the following in /etc/rc.conf:

# Enable power monitoring.
powerd_enable="YES"
powerd_flags="-i 95 -r 90"

My laptop works best without powerd_flags set. YMMV.

Roland
--
R.F.Smith http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
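The advice in this reply about the passive trip point (set _PSV lower, and write it without a decimal) could be turned into /etc/sysctl.conf lines along the following lines. This is only a sketch: the function name psv_lines is hypothetical, and the value 75 in the usage comment is an arbitrary example, not a threshold recommended anywhere in the thread. The check against the critical temperature simply reflects the point that 84°C leaves little room below the 92°C shutdown.

```shell
#!/bin/sh
# Print sysctl.conf lines for a passive-cooling trip point.
# $1 = desired _PSV in whole degrees Celsius (illustrative value chosen by the user)
# $2 = critical temperature (_CRT) in whole degrees Celsius, e.g. 92 as reported above
psv_lines() {
    psv=$1
    crt=$2
    # Refuse a trip point that leaves no headroom below the shutdown temperature.
    if [ "$psv" -ge "$crt" ]; then
        echo "passive trip point must be below the critical temperature" >&2
        return 1
    fi
    printf 'hw.acpi.thermal.user_override=1\n'
    # Whole degrees with a trailing C, no decimal part (84C rather than 84.0C).
    printf 'hw.acpi.thermal.tz0._PSV=%dC\n' "$psv"
}

# Example (75 is an assumed value for illustration only):
#   psv_lines 75 92 >> /etc/sysctl.conf
```

Printing the lines rather than calling sysctl directly keeps the sketch harmless to run on any machine; whether a given value suits a particular laptop would still have to be found by testing.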