Wrong temperature with AMD and amdtemp.ko
Don Lewis
truckman at FreeBSD.org
Tue Oct 6 04:28:48 UTC 2015
On 3 Oct, Willem Jan Withagen wrote:
> On 2-10-2015 23:32, Don Lewis wrote:
>> On 2 Oct, Willem Jan Withagen wrote:
>>>
>>> Hi
>>>
>>> 10.2-STABLE FreeBSD 10.2-STABLE #0 r287102: Mon Aug 24
>>>
>>> Processor: Opteron 6812, in Supermicro H8SGL
>>>
>>> dev.cpu.7.temperature: 11.1C
>>> dev.cpu.6.temperature: 11.1C
>>> dev.cpu.5.temperature: 11.1C
>>> dev.cpu.4.temperature: 11.1C
>>> dev.cpu.3.temperature: 11.1C
>>> dev.cpu.2.temperature: 11.1C
>>> dev.cpu.1.temperature: 11.1C
>>> dev.cpu.0.temperature: 11.1C
>>>
>>> But I'm pretty sure it is not 11.1C in the datacenter....
>>>
>>> Or should I not use amdtemp.ko for this?
>>
>> The definition of the value that can be read from the temperature
>> register is pretty strange. For AMD Family 15h processors, the BIOS and
>> Kernel Developer's Guide (BKDG) says this:
>>
>> Tctl is a processor temperature control value used for processor
>> thermal management. Tctl is accessible through D18F3xA4[CurTmp].
>> Tctl is a temperature on its own scale aligned to the processor's
>> cooling requirements. Therefore Tctl does not represent a temperature
>> which could be measured on the die or the case of the processor.
>> Instead, it specifies the processor temperature relative to the
>> maximum operating temperature, Tctl,max. Tctl,max is specified in the
>> power and thermal data sheet. Tctl is defined as follows for all
>> parts:
>>
>> A: For Tctl = Tctl_max to 255.875: the temperature of the part is
>> [Tctl - Tctl_max] over the maximum operating temperature. The
>> processor may take corrective actions that affects performance, such
>> as HTC, to support the return to Tctl range A.
>>
>> B: For Tctl = 0 to Tctl_max - 0.125: the temperature of the part is
>> [Tctl_max - Tctl] under the maximum operating temperature.
>>
>> It would be nice to report Tctl_max so that we could at least know how
>> far the temperature is from the limit, but I don't know if that is
>> available. It might be the value in the HtcTmpLmt register, but the
>> BKDG is unclear about that. If not, we would have to build a table of
>> values from the datasheet.
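
To make the two quoted ranges concrete, here is a rough sketch of how a raw
register value could be turned into a distance from the limit. The 0.125
step and the 0 to 255.875 range come from the BKDG text above. The bit
position of CurTmp (bits 31:21) is my assumption and should be checked
against the BKDG, and the Tctl_max of 70.0 in the usage note is purely a
placeholder, not a datasheet value.

```python
TCTL_STEP = 0.125         # resolution implied by the quoted 0 .. 255.875 range
TCTL_FIELD_MASK = 0x7FF   # 11 bits: 2047 * 0.125 = 255.875
TCTL_FIELD_SHIFT = 21     # ASSUMPTION: CurTmp occupies bits 31:21 of D18F3xA4
TCTL_LIMIT = TCTL_FIELD_MASK * TCTL_STEP


def decode_cur_tmp(reg_value):
    """Extract Tctl from a raw 32-bit D18F3xA4 value (layout assumed above)."""
    return ((reg_value >> TCTL_FIELD_SHIFT) & TCTL_FIELD_MASK) * TCTL_STEP


def classify_tctl(tctl, tctl_max):
    """Return (range_label, margin) following the quoted A/B definition.

    Range A: Tctl >= Tctl_max, margin is how far over the maximum it is.
    Range B: Tctl <  Tctl_max, margin is how far under the maximum it is.
    """
    if not 0.0 <= tctl <= TCTL_LIMIT:
        raise ValueError("Tctl outside the 0 .. 255.875 range")
    if tctl >= tctl_max:
        return ("A", tctl - tctl_max)
    return ("B", tctl_max - tctl)
```

With the placeholder limit, classify_tctl(decode_cur_tmp(89 << 21), 70.0)
puts a Tctl of 11.125 in range B, 58.875 under the maximum. The real limit
still has to come from the power and thermal data sheet.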
>
> And
>
> On 2-10-2015 23:06, Jung-uk Kim wrote:
>> On 10/02/2015 16:49, Willem Jan Withagen wrote:
>
>> amdtemp(4):
>>
>> For Family 10h and later processors, “(the reported temperature) is a
>> non-physical temperature measured on an arbitrary scale and it does not
>> represent an actual physical temperature like die or case temperature.
>> Instead, it specifies the processor temperature relative to the point at
>> which the system must supply the maximum cooling for the processor's
>> specified maximum case temperature and maximum thermal power dissipation”
>> according to BIOS and Kernel Developer's Guide (BKDG) for AMD Processors,
>> http://developer.amd.com/documentation/guides/Pages/default.aspx.
>
> If one boots into the BIOS, the BIOS suggests that it knows how to do
> this conversion.... Perhaps one can question the ultimate correctness of
> the outcome, but the 51.3C value suggests some accuracy.

That may be a measurement from a separate temperature sensor on the
motherboard underneath the CPU socket.

> Thus far I have not been able to locate the "Power and Thermal Datasheet"
> for the family 15h....
> Perhaps I need to disassemble the BIOS, or check other tools or OSes on
> how they do this.
>
> --WjW
>
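
Until the datasheet turns up, something like the following could at least
turn the sysctl output into a distance from the limit once a Tctl_max is
known. This is only a sketch: the function names are made up for
illustration, the line format is taken from the output quoted at the top of
the thread, and the caller has to supply Tctl_max.

```python
import re

# Matches lines such as "dev.cpu.7.temperature: 11.1C" from the quoted output.
_TEMP_LINE = re.compile(
    r"^dev\.cpu\.(\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C$"
)


def parse_cpu_temperatures(sysctl_output):
    """Map CPU index to the reported value for each matching line."""
    readings = {}
    for line in sysctl_output.splitlines():
        match = _TEMP_LINE.match(line.strip())
        if match:
            readings[int(match.group(1))] = float(match.group(2))
    return readings


def margin_below_limit(readings, tctl_max):
    """Return tctl_max minus each reading; negative means over the limit."""
    return {cpu: tctl_max - value for cpu, value in readings.items()}
```

Feeding it the eight lines from the original report gives eight identical
readings of 11.1, which only become meaningful once subtracted from the
real Tctl_max for the part.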