HWPMC trouble
Slawa Olhovchenkov
slw at zxy.spb.ru
Sun Feb 2 12:58:33 UTC 2020
I am try use hwpmc on this hardware:
CPU: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz (3066.83-MHz K8-class CPU)
hwpmc: SOFT/16/64/0x67<INT,USR,SYS,REA,WRI> TSC/1/64/0x20<REA> IAP/4/48/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC> IAF/3/48/0x67<INT,USR,SYS,REA,WRI> UCP/8/48/0x3f8<EDG,THR,REA,WRI,INV,QUA,PRC> UCF/1/48/0x60<REA,WRI>
I am have some troubles:
1. I am try to discover hotspots in userland code run on 4th CPU:
# pmcstat -c 4 -l 4 -S cpu_clk_unhalted.thread -O sample.out
# ls -l sample.out
-rw-r--r-- 1 root wheel 70480 Feb 2 15:46 sample.out
Too small!
# pmcstat -R sample.out -z100 -G out.txt
CONVERSION STATISTICS:
#samples/total 564
#callchain/dubious-frames 551
# head -30 out.txt
@ cpu_clk_unhalted.thread [13 samples]
23.08% [3] em_update_stats_counters @ /boot/kernel/kernel
100.0% [3] em_if_update_admin_status
100.0% [3] _task_fn_admin
100.0% [3] gtaskqueue_run_locked
100.0% [3] gtaskqueue_thread_loop
100.0% [3] fork_exit
15.38% [2] spinlock_exit @ /boot/kernel/kernel
50.00% [1] spinlock_exit
100.0% [1] wakeup_any
100.0% [1] grouptaskqueue_enqueue
100.0% [1] iflib_timer
100.0% [1] softclock_call_cc
100.0% [1] softclock
100.0% [1] ithread_loop
100.0% [1] fork_exit
50.00% [1] cpuctl_do_msr @ /boot/kernel/cpuctl.ko
100.0% [1] devfs_ioctl @ /boot/kernel/kernel
100.0% [1] VOP_IOCTL_APV
100.0% [1] vn_ioctl
100.0% [1] devfs_ioctl_f
100.0% [1] kern_ioctl
100.0% [1] sys_ioctl
100.0% [1] amd64_syscall
I am don't see any actual code path.
In reality `top` show:
CPU 4: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
Where all samples from userland?
2. I am try to discover mostly memory touch codeptah:
# pmcstat -l 4 -S mem_load_retired.llc_miss -O sample.out
# ls -l sample.out
-rw-r--r-- 1 root wheel 235896 Feb 2 15:51 sample.out
# pmcstat -R sample.out
initlog 0x9030000 "INTEL_WESTMERE"
allocate 0x711 "mem_load_retired.llc_miss" 0x20080
allocate 0x100711 "mem_load_retired.llc_miss" 0x20080
allocate 0x200711 "mem_load_retired.llc_miss" 0x20080
allocate 0x300711 "mem_load_retired.llc_miss" 0x20080
allocate 0x400711 "mem_load_retired.llc_miss" 0x20080
allocate 0x500711 "mem_load_retired.llc_miss" 0x20080
allocate 0x600711 "mem_load_retired.llc_miss" 0x20080
allocate 0x700711 "mem_load_retired.llc_miss" 0x20080
allocate 0x800711 "mem_load_retired.llc_miss" 0x20080
allocate 0x900711 "mem_load_retired.llc_miss" 0x20080
allocate 0xa00711 "mem_load_retired.llc_miss" 0x20080
allocate 0xb00711 "mem_load_retired.llc_miss" 0x20080
unknown event (type 20).
unknown event (type 18).
unknown event (type 20).
unknown event (type 18).
unknown event (type 20).
....
unknown event (type 18).
map-in -1 0xffffffff80200000 "kernel"
map-in -1 0xffffffff82471000 "xhci.ko"
...
# pmcstat -R sample.out -z100 -G out.txt
CONVERSION STATISTICS:
#samples/total 27
# less out.txt
@ mem_load_retired.llc_miss [27 samples]
81.48% [22] native_lapic_ipi_raw @ /boot/kernel/kernel
100.0% [22] sched_add
100.0% [22] setrunnable
100.0% [22] sleepq_signal
100.0% [22] wakeup_any
100.0% [22] grouptaskqueue_enqueue
100.0% [22] iflib_fast_intr
100.0% [22] intr_event_handle
100.0% [22] intr_execute_handlers
100.0% [22] lapic_handle_intr
14.81% [4] lapic_handle_intr @ /boot/kernel/kernel
03.70% [1] sched_clock @ /boot/kernel/kernel
100.0% [1] statclock
100.0% [1] handleevents
100.0% [1] timercb
100.0% [1] lapic_handle_timer
Where all userland? `pcm.x` output:
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 (read) cache misses
L2MISS: L2 (read) cache misses (including other core's L2 cache *hits*)
L3HIT : L3 (read) cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3MPI : number of L3 (read) cache misses per instruction
L2MPI : number of L2 (read) cache misses per instruction
READ : bytes read from main memory controller (in GBytes)
WRITE : bytes written to main memory controller (in GBytes)
TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
energy: Energy in Joules
Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | TEMP
0 0 0.70 0.64 1.09 1.09 14 M 20 M 0.31 0.26 0.01 0.01 24
1 0 0.89 0.81 1.09 1.09 3084 K 17 M 0.83 0.31 0.00 0.01 24
2 0 0.97 0.89 1.09 1.09 17 M 25 M 0.30 0.39 0.01 0.01 28
3 0 0.01 0.20 0.04 1.09 37 K 327 K 0.89 0.73 0.00 0.01 28
4 0 0.68 0.63 1.09 1.09 7268 K 9959 K 0.27 0.13 0.00 0.00 28
5 0 0.68 0.63 1.09 1.09 7275 K 9789 K 0.26 0.12 0.00 0.00 28
6 0 0.68 0.63 1.09 1.09 7271 K 10 M 0.28 0.11 0.00 0.00 24
7 0 0.68 0.62 1.09 1.09 7269 K 9761 K 0.26 0.11 0.00 0.00 24
8 0 0.65 0.60 1.09 1.09 7265 K 10 M 0.29 0.18 0.00 0.01 23
9 0 0.68 0.62 1.09 1.09 7270 K 10 M 0.28 0.17 0.00 0.00 23
10 0 0.65 0.60 1.09 1.09 7272 K 10 M 0.28 0.18 0.00 0.01 26
11 0 0.67 0.62 1.09 1.09 7283 K 9946 K 0.27 0.18 0.00 0.00 26
---------------------------------------------------------------------------------------------------------------
SKT 0 0.66 0.66 1.00 1.09 93 M 143 M 0.35 0.24 0.00 0.01 N/A
---------------------------------------------------------------------------------------------------------------
TOTAL * 0.66 0.66 1.00 1.09 93 M 143 M 0.35 0.24 0.00 0.01 N/A
Instructions retired: 24 G ; Active cycles: 36 G ; Time (TSC): 3069 Mticks ; C0 (active,non-halted) core residency: 92.01 %
C1 core residency: 7.99 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %;
C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %;
PHYSICAL CORE IPC : 1.32 => corresponds to 33.07 % utilization for cores in active state
Instructions per nominal CPU cycle: 1.32 => corresponds to 33.07 % core utilization over time interval
SMI count: 0
---------------------------------------------------------------------------------------------------------------
MEM (GB)->| READ | WRITE |
---------------------------------------------------------------------------------------------------------------
SKT 0 7.36 5.04
---------------------------------------------------------------------------------------------------------------
I.e I am got 7.2M LLC miss in reality and don't collect any.
What I am miss?
More information about the freebsd-hackers
mailing list