HWPMC trouble
    Slawa Olhovchenkov 
    slw at zxy.spb.ru
       
    Sun Feb  2 12:58:33 UTC 2020
    
    
  
I am try use hwpmc on this hardware:
CPU: Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz (3066.83-MHz K8-class CPU)
hwpmc: SOFT/16/64/0x67<INT,USR,SYS,REA,WRI> TSC/1/64/0x20<REA> IAP/4/48/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC> IAF/3/48/0x67<INT,USR,SYS,REA,WRI> UCP/8/48/0x3f8<EDG,THR,REA,WRI,INV,QUA,PRC> UCF/1/48/0x60<REA,WRI>
I am have some troubles:
1. I am try to discover hotspots in userland code run on 4th CPU:
# pmcstat -c 4 -l 4 -S cpu_clk_unhalted.thread -O sample.out
# ls -l sample.out
-rw-r--r--  1 root  wheel  70480 Feb  2 15:46 sample.out
Too small!
# pmcstat -R sample.out -z100 -G out.txt
CONVERSION STATISTICS:
 #samples/total                           564
 #callchain/dubious-frames                551
# head -30 out.txt 
@ cpu_clk_unhalted.thread [13 samples]
23.08%  [3]        em_update_stats_counters @ /boot/kernel/kernel
 100.0%  [3]         em_if_update_admin_status
  100.0%  [3]          _task_fn_admin
   100.0%  [3]           gtaskqueue_run_locked
    100.0%  [3]            gtaskqueue_thread_loop
     100.0%  [3]             fork_exit
15.38%  [2]        spinlock_exit @ /boot/kernel/kernel
 50.00%  [1]         spinlock_exit
  100.0%  [1]          wakeup_any
   100.0%  [1]           grouptaskqueue_enqueue
    100.0%  [1]            iflib_timer
     100.0%  [1]             softclock_call_cc
      100.0%  [1]              softclock
       100.0%  [1]               ithread_loop
        100.0%  [1]                fork_exit
 50.00%  [1]         cpuctl_do_msr @ /boot/kernel/cpuctl.ko
  100.0%  [1]          devfs_ioctl @ /boot/kernel/kernel
   100.0%  [1]           VOP_IOCTL_APV
    100.0%  [1]            vn_ioctl
     100.0%  [1]             devfs_ioctl_f
      100.0%  [1]              kern_ioctl
       100.0%  [1]               sys_ioctl
        100.0%  [1]                amd64_syscall
I am don't see any actual code path.
In reality `top` show:
CPU 4:   100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
Where all samples from userland?
2. I am try to discover mostly memory touch codeptah:
# pmcstat -l 4 -S mem_load_retired.llc_miss -O sample.out
# ls -l sample.out
-rw-r--r--  1 root  wheel  235896 Feb  2 15:51 sample.out
# pmcstat -R sample.out
initlog   0x9030000 "INTEL_WESTMERE"
allocate  0x711 "mem_load_retired.llc_miss" 0x20080
allocate  0x100711 "mem_load_retired.llc_miss" 0x20080
allocate  0x200711 "mem_load_retired.llc_miss" 0x20080
allocate  0x300711 "mem_load_retired.llc_miss" 0x20080
allocate  0x400711 "mem_load_retired.llc_miss" 0x20080
allocate  0x500711 "mem_load_retired.llc_miss" 0x20080
allocate  0x600711 "mem_load_retired.llc_miss" 0x20080
allocate  0x700711 "mem_load_retired.llc_miss" 0x20080
allocate  0x800711 "mem_load_retired.llc_miss" 0x20080
allocate  0x900711 "mem_load_retired.llc_miss" 0x20080
allocate  0xa00711 "mem_load_retired.llc_miss" 0x20080
allocate  0xb00711 "mem_load_retired.llc_miss" 0x20080
unknown event (type 20).
unknown event (type 18).
unknown event (type 20).
unknown event (type 18).
unknown event (type 20).
....
unknown event (type 18).
map-in    -1 0xffffffff80200000 "kernel"
map-in    -1 0xffffffff82471000 "xhci.ko"
...
# pmcstat -R sample.out -z100 -G out.txt
CONVERSION STATISTICS:
 #samples/total                           27
# less out.txt 
@ mem_load_retired.llc_miss [27 samples]
81.48%  [22]       native_lapic_ipi_raw @ /boot/kernel/kernel
 100.0%  [22]        sched_add
  100.0%  [22]         setrunnable
   100.0%  [22]          sleepq_signal
    100.0%  [22]           wakeup_any
     100.0%  [22]            grouptaskqueue_enqueue
      100.0%  [22]             iflib_fast_intr
       100.0%  [22]              intr_event_handle
        100.0%  [22]               intr_execute_handlers
         100.0%  [22]                lapic_handle_intr
14.81%  [4]        lapic_handle_intr @ /boot/kernel/kernel
03.70%  [1]        sched_clock @ /boot/kernel/kernel
 100.0%  [1]         statclock
  100.0%  [1]          handleevents
   100.0%  [1]           timercb
    100.0%  [1]            lapic_handle_timer
Where all userland? `pcm.x` output:
 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 (read) cache misses 
 L2MISS: L2 (read) cache misses (including other core's L2 cache *hits*) 
 L3HIT : L3 (read) cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3MPI : number of L3 (read) cache misses per instruction
 L2MPI : number of L2 (read) cache misses per instruction
 READ  : bytes read from main memory controller (in GBytes)
 WRITE : bytes written to main memory controller (in GBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
 energy: Energy in Joules
 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI |  TEMP
   0    0     0.70   0.64   1.09    1.09      14 M     20 M    0.31    0.26    0.01    0.01     24
   1    0     0.89   0.81   1.09    1.09    3084 K     17 M    0.83    0.31    0.00    0.01     24
   2    0     0.97   0.89   1.09    1.09      17 M     25 M    0.30    0.39    0.01    0.01     28
   3    0     0.01   0.20   0.04    1.09      37 K    327 K    0.89    0.73    0.00    0.01     28
   4    0     0.68   0.63   1.09    1.09    7268 K   9959 K    0.27    0.13    0.00    0.00     28
   5    0     0.68   0.63   1.09    1.09    7275 K   9789 K    0.26    0.12    0.00    0.00     28
   6    0     0.68   0.63   1.09    1.09    7271 K     10 M    0.28    0.11    0.00    0.00     24
   7    0     0.68   0.62   1.09    1.09    7269 K   9761 K    0.26    0.11    0.00    0.00     24
   8    0     0.65   0.60   1.09    1.09    7265 K     10 M    0.29    0.18    0.00    0.01     23
   9    0     0.68   0.62   1.09    1.09    7270 K     10 M    0.28    0.17    0.00    0.00     23
  10    0     0.65   0.60   1.09    1.09    7272 K     10 M    0.28    0.18    0.00    0.01     26
  11    0     0.67   0.62   1.09    1.09    7283 K   9946 K    0.27    0.18    0.00    0.00     26
---------------------------------------------------------------------------------------------------------------
 SKT    0     0.66   0.66   1.00    1.09      93 M    143 M    0.35    0.24    0.00    0.01     N/A
---------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.66   0.66   1.00    1.09      93 M    143 M    0.35    0.24    0.00    0.01     N/A
 Instructions retired:   24 G ; Active cycles:   36 G ; Time (TSC): 3069 Mticks ; C0 (active,non-halted) core residency: 92.01 %
 C1 core residency: 7.99 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %;
 C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %;
 PHYSICAL CORE IPC                 : 1.32 => corresponds to 33.07 % utilization for cores in active state
 Instructions per nominal CPU cycle: 1.32 => corresponds to 33.07 % core utilization over time interval
 SMI count: 0
---------------------------------------------------------------------------------------------------------------
MEM (GB)->|  READ |  WRITE |
---------------------------------------------------------------------------------------------------------------
 SKT   0     7.36     5.04                   
---------------------------------------------------------------------------------------------------------------
I.e I am got 7.2M LLC miss in reality and don't collect any.
What I am miss?
    
    
More information about the freebsd-hackers
mailing list