Leaving the Desktop Market
Allan Jude
freebsd at allanjude.com
Tue May 13 02:09:16 UTC 2014
On 2014-05-12 14:25, Adrian Chadd wrote:
> On 12 May 2014 10:35, Allan Jude <freebsd at allanjude.com> wrote:
>> I have this system:
>>
>> hw.model: Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
>> hw.ncpu: 4
>>
>> http://ark.intel.com/products/75052
>>
>> dev.cpu.0.%desc: ACPI CPU
>> dev.cpu.0.%driver: cpu
>> dev.cpu.0.%location: handle=\_PR_.CPU0
>> dev.cpu.0.%pnpinfo: _HID=none _UID=0
>> dev.cpu.0.%parent: acpi0
>> dev.cpu.0.freq: 3100
>> dev.cpu.0.freq_levels: 3101/80000 3100/80000 2900/72713 2800/69558
>> 2600/62669 2400/56794 2300/53935 2100/47673 1900/42370 1800/39795
>> 1600/34136 1500/31729 1300/26432 1137/23128 1100/21994 1000/19851
>> 875/17369 800/15113 700/13223 600/11334 500/9445 400/7556 300/5667
>> 200/3778 100/1889
>> dev.cpu.0.cx_supported: C1/1/1 C2/2/148
>> dev.cpu.0.cx_lowest: C8
>> dev.cpu.0.cx_usage: 9.01% 90.98% last 807us
>> dev.cpu.1.%desc: ACPI CPU
>> dev.cpu.1.%driver: cpu
>> dev.cpu.1.%location: handle=\_PR_.CPU1
>> dev.cpu.1.%pnpinfo: _HID=none _UID=0
>> dev.cpu.1.%parent: acpi0
>> dev.cpu.1.cx_supported: C1/1/1 C2/2/148
>> dev.cpu.1.cx_lowest: C8
>> dev.cpu.1.cx_usage: 11.70% 88.29% last 21303us
>> dev.cpu.2.%desc: ACPI CPU
>> dev.cpu.2.%driver: cpu
>> dev.cpu.2.%location: handle=\_PR_.CPU2
>> dev.cpu.2.%pnpinfo: _HID=none _UID=0
>> dev.cpu.2.%parent: acpi0
>> dev.cpu.2.cx_supported: C1/1/1 C2/2/148
>> dev.cpu.2.cx_lowest: C8
>> dev.cpu.2.cx_usage: 15.17% 84.82% last 22987us
>> dev.cpu.3.%desc: ACPI CPU
>> dev.cpu.3.%driver: cpu
>> dev.cpu.3.%location: handle=\_PR_.CPU3
>> dev.cpu.3.%pnpinfo: _HID=none _UID=0
>> dev.cpu.3.%parent: acpi0
>> dev.cpu.3.cx_supported: C1/1/1 C2/2/148
>> dev.cpu.3.cx_lowest: C8
>> dev.cpu.3.cx_usage: 11.74% 88.25% last 6073us
>>
> So ACPI is exposing C1 and C2 only.
>
>> According to the Intel specs (Page 11), this processor supports C1, C1E,
>> C3, C6 and C7
>>
>> The above sysctl dump shows only C1 and C2. I wonder if the C2 is
>> actually C3
>>
>> http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v3-vol-1-datasheet.pdf
> It'd say C2/3/xxx in that case.
>
> Chances are you'll end up seeing it fall into deeper sleep states. Try
> installing intel-pcm; kldload cpuctl; run pcm.x 1 . See if it's
> entering lower CPU states.
>
>> How is our support for the newer Cx states introduced in Haswell, which
>> can apparently go as high as C10?
> I don't know if we get those exposed via ACPI. I know there's a bunch
> of cute things we could be doing with MWAIT that we aren't, but we
> certainly should be drifting into lower sleep states.
>
> Just run intel-pcm and see.
>
> Thanks,
>
>
>
> -a
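As an aside, the cx_supported and cx_usage sysctls quoted above line up positionally: the first percentage belongs to the first listed state, and so on. Here is a minimal Python sketch of that pairing. It assumes the two strings are pasted in by hand, and that each cx_supported entry reads as name/ACPI type/latency, which is my reading of the format rather than something confirmed in this thread. The function name pair_cx_usage is made up for illustration.

```python
def pair_cx_usage(cx_supported, cx_usage):
    """Pair each entry of dev.cpu.N.cx_supported with its cx_usage percentage."""
    states = []
    for entry in cx_supported.split():
        # Assumed layout of one entry: name/type/latency, e.g. "C2/2/148"
        name, acpi_type, latency = entry.split("/")
        states.append((name, int(acpi_type), int(latency)))
    # Keep only the percentage tokens; the trailing "last 807us" is ignored
    percents = [float(tok.rstrip("%")) for tok in cx_usage.split() if tok.endswith("%")]
    return [
        {"state": name, "type": acpi_type, "latency": latency, "usage_pct": pct}
        for (name, acpi_type, latency), pct in zip(states, percents)
    ]


# Values copied from dev.cpu.0 in the quoted sysctl dump
rows = pair_cx_usage("C1/1/1 C2/2/148", "9.01% 90.98% last 807us")
for row in rows:
    print(row)
```

Note that this only reports what ACPI advertises. It says nothing about which hardware state the CPU actually enters, which is what the pcm.x runs below measure.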
Stock configuration:
# pcm.x 10
Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100
ID=db05e43)
Copyright (c) 2009-2013 Intel Corporation
Number of physical cores: 4
Number of logical cores: 4
Threads (logical cores) per physical core: 1
Num sockets: 1
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3100000000 Hz
Package thermal spec power: 80 Watt; Package minimum power: 0 Watt;
Package maximum power: 0 Watt;
Detected Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz "Intel(r)
microarchitecture codename Haswell"
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock
ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in
power-saving C state)='unhalted clock ticks'/'invariant timer ticks
while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 cache misses
L2MISS: L2 cache misses (including other core's L2 cache *hits*)
L3HIT : L3 cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in
some cases could be >1.0 due to a higher memory latency
L2CLK : ratio of CPU cycles lost due to missing L2 cache but still
hitting L3 cache (0.00-1.00)
READ : bytes read from memory controller (in GBytes)
WRITE : bytes written to memory controller (in GBytes)
TEMP : Temperature reading in 1 degree Celsius relative to the TjMax
temperature (thermal headroom): 0 corresponds to the max temperature
Core (SKT) | EXEC | IPC  | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP
0     0      0.00   0.74   0.00    0.76     39 K    120 K    0.67    0.66    0.13    0.09    N/A     N/A     74
1     0      0.00   0.72   0.00    0.75     17 K     80 K    0.79    0.71    0.07    0.10    N/A     N/A     76
2     0      0.00   0.62   0.00    0.61     8037     33 K    0.76    0.58    0.08    0.08    N/A     N/A     76
3     0      0.00   0.72   0.00    0.76     18 K     98 K    0.81    0.70    0.07    0.10    N/A     N/A     76
-------------------------------------------------------------------------------------------------------------------
SKT   0      0.00   0.72   0.00    0.74     83 K    332 K    0.75    0.68    0.09    0.09   0.38    0.01     74
-------------------------------------------------------------------------------------------------------------------
TOTAL *      0.00   0.72   0.00    0.74     83 K    332 K    0.75    0.68    0.09    0.09   0.38    0.01    N/A
Instructions retired: 118 M ; Active cycles: 165 M ; Time (TSC): 30
Gticks ; C0 (active,non-halted) core residency: 0.18 %
C1 core residency: 99.82 %; C3 core residency: 0.00 %; C6 core
residency: 0.00 %; C7 core residency: 0.00 %;
C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package
residency: 0.00 %; C7 package residency: 0.00 %;
PHYSICAL CORE IPC : 0.72 => corresponds to 17.93 %
utilization for cores in active state
Instructions per nominal CPU cycle: 0.00 => corresponds to 0.02 % core
utilization over time interval
----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
SKT 0 package consumed 68.42 Joules
----------------------------------------------------------------------------------------------
TOTAL: 68.42 Joules
Then with only the higher-numbered Cx states enabled (no powerd or anything else):
# sysctl hw.acpi.cpu.cx_lowest=c8
hw.acpi.cpu.cx_lowest: C1 -> C8
# pcm.x 10
Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100
ID=db05e43)
Copyright (c) 2009-2013 Intel Corporation
Number of physical cores: 4
Number of logical cores: 4
Threads (logical cores) per physical core: 1
Num sockets: 1
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3100000000 Hz
Package thermal spec power: 80 Watt; Package minimum power: 0 Watt;
Package maximum power: 0 Watt;
Detected Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz "Intel(r)
microarchitecture codename Haswell"
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock
ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in
power-saving C state)='unhalted clock ticks'/'invariant timer ticks
while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 cache misses
L2MISS: L2 cache misses (including other core's L2 cache *hits*)
L3HIT : L3 cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in
some cases could be >1.0 due to a higher memory latency
L2CLK : ratio of CPU cycles lost due to missing L2 cache but still
hitting L3 cache (0.00-1.00)
READ : bytes read from memory controller (in GBytes)
WRITE : bytes written to memory controller (in GBytes)
TEMP : Temperature reading in 1 degree Celsius relative to the TjMax
temperature (thermal headroom): 0 corresponds to the max temperature
Core (SKT) | EXEC | IPC  | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP
0     0      0.00   0.11   0.00    0.99    611 K    629 K    0.03    0.02    1.89    0.01    N/A     N/A     73
1     0      0.00   0.19   0.00    0.99    152 K    169 K    0.10    0.04    1.36    0.04    N/A     N/A     76
2     0      0.00   0.20   0.00    0.99    153 K    171 K    0.10    0.04    1.29    0.04    N/A     N/A     77
3     0      0.00   0.20   0.00    1.00    159 K    180 K    0.12    0.04    1.14    0.03    N/A     N/A     76
-------------------------------------------------------------------------------------------------------------------
SKT   0      0.00   0.16   0.00    0.99   1077 K   1150 K    0.06    0.03    1.55    0.03   0.14    0.01     72
-------------------------------------------------------------------------------------------------------------------
TOTAL *      0.00   0.16   0.00    0.99   1077 K   1150 K    0.06    0.03    1.55    0.03   0.14    0.01    N/A
Instructions retired: 19 M ; Active cycles: 125 M ; Time (TSC): 31
Gticks ; C0 (active,non-halted) core residency: 0.10 %
C1 core residency: 1.85 %; C3 core residency: 0.00 %; C6 core
residency: 0.00 %; C7 core residency: 98.05 %;
C2 package residency: 8.10 %; C3 package residency: 7.46 %; C6 package
residency: 79.20 %; C7 package residency: 0.00 %;
PHYSICAL CORE IPC : 0.16 => corresponds to 3.96 %
utilization for cores in active state
Instructions per nominal CPU cycle: 0.00 => corresponds to 0.00 % core
utilization over time interval
----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
SKT 0 package consumed 22.77 Joules
----------------------------------------------------------------------------------------------
TOTAL: 22.77 Joules
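To put the two energy totals side by side, here is a rough Python sketch that converts them to average package power. It assumes each pcm.x sample covered the 10 seconds requested on the command line (the 30 to 31 Gticks of TSC at 3.1 GHz is consistent with that, but the interval is not measured here). The helper name average_watts is made up for illustration.

```python
INTERVAL_S = 10.0  # assumed sample length, from the "pcm.x 10" invocation


def average_watts(joules, seconds=INTERVAL_S):
    """Average power over the sample: energy divided by elapsed time."""
    return joules / seconds


stock_j = 68.42  # package energy with cx_lowest=C1
c8_j = 22.77     # package energy with cx_lowest=C8

stock_w = average_watts(stock_j)
c8_w = average_watts(c8_j)
saving_pct = 100.0 * (stock_j - c8_j) / stock_j

print(f"stock: {stock_w:.2f} W, cx_lowest=C8: {c8_w:.2f} W, saving: {saving_pct:.1f} %")
```

On these two samples that works out to roughly 6.8 W versus 2.3 W at idle, about two thirds less package energy. It is a single sample per configuration, so treat it as indicative only.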
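This suggests the cores are spending almost all of their time in C7 (98.05 % core residency) and the package most of its time in C6 (79.20 % package residency).
I will try to grab results from a few more machines.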
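For comparing more machines, the residency figures can be pulled out of the pcm.x summary lines with a small script. This is only a sketch: it assumes the "Cn core residency: x %" and "Cn package residency: x %" wording shown in the V2.6 output above, and the function name parse_residency is made up for illustration. Other pcm versions may format these lines differently.

```python
import re

# Matches e.g. "C7 core residency: 98.05 %" and
# "C0 (active,non-halted) core residency: 0.10 %"
RESIDENCY_RE = re.compile(
    r"(C\d+)\s+(?:\([^)]*\)\s+)?(core|package)\s+residency:\s+([\d.]+)\s*%"
)


def parse_residency(text):
    """Return {'core': {state: pct}, 'package': {state: pct}} from pcm.x output."""
    # The mail client wrapped the lines, so collapse all whitespace first
    flat = " ".join(text.split())
    result = {"core": {}, "package": {}}
    for state, scope, pct in RESIDENCY_RE.findall(flat):
        result[scope][state] = float(pct)
    return result


# Summary lines copied from the cx_lowest=C8 run above
sample = """Gticks ; C0 (active,non-halted) core residency: 0.10 %
C1 core residency: 1.85 %; C3 core residency: 0.00 %; C6 core
residency: 0.00 %; C7 core residency: 98.05 %;
C2 package residency: 8.10 %; C3 package residency: 7.46 %; C6 package
residency: 79.20 %; C7 package residency: 0.00 %;"""

residency = parse_residency(sample)
top_core = max(residency["core"], key=residency["core"].get)
top_package = max(residency["package"], key=residency["package"].get)
print("dominant core state:", top_core, residency["core"][top_core])
print("dominant package state:", top_package, residency["package"][top_package])
```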