pmcstat -T -P instructions -t $pid ==> NMI

Fri Oct 28 13:02:13 UTC 2016

On 27/10/2016 16:20, Andriy Gapon wrote:
> 
> I observe a problem on a relatively recent, but not the latest, head.
> r306752 amd64 on AMD hardware.
> If I run
> 	pmcstat -T -P instructions -t $pid
> with a pid of a busy userland process, then I shortly get a (stray) NMI.
> Apparently hwpmc does not recognize that NMI.

Because the problem was readily reproducible, I managed to gather some hwpmc
debug traces.  The following are last messages collected from a crash dump that
I made after getting an NMI on CPU#0:

index  cpu timestamp        trace
------ --- ---------------- -----
 97230   1    1089795818106 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97229   1    1089795792315 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97228   1    1089795767010 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97227   1    1089795741843 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97226   1    1089795716970 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97225   1    1089795691881 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97224   1    1089795666756 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97223   1    1089795641637 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97222   1    1089795616422 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97221   1    1089795590715 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97220   0    1089795590076 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0
 97219   1    1089795560493 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
 97218   1    1089795520646 MDP:SWI:1: pc=0xfffff8002f355e00
pp=0xfffff80021f56400 enable-msr=0
 97217   1    1089795519677 MDP:STA:2: amd-start config=0x5300c0
 97216   1    1089795519420 MDP:STA:1: amd-start cpu=1 ri=0
 97215   1    1089795518154 MDP:WRI:1: amd-write cpu=1 ri=0 v=ffffffffffff00fe
 97214   1    1089795517311 CSW:SWI:1: cpu=1 ri=17 new=65282
 97213   1    1089795514866 MDP:CFG:1: cpu=1 ri=0 pm=0xfffff8001c6e0500
 97212   1    1089795510620 CSW:SWI:1: cpu=1 proc=0xfffff8013b1fe000 (1666,
burnK7) pp=0xfffff80021f56400
 97211   0    1089795494715 MDP:SWO:1: pc=0xfffff8001c6e0800
pp=0xfffff80021f56400 enable-msr=0
 97210   0    1089795494448 MDP:CFG:1: cpu=0 ri=0 pm=0x0
 97209   0    1089795494003 CSW:SWO:1: cpu=0 ri=17 tmp=-2839 (samp)
 97208   0    1089795493811 MDP:REA:2: amd-read (post-munge) id=0 -> 65282
 97207   0    1089795493644 MDP:REA:2: amd-read (pre-munge) id=0 -> 281474976645374
 97206   0    1089795493299 MDP:REA:1: amd-read id=0 class=2
 97205   0    1089795491453 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0
 97204   0    1089795490487 MDP:STO:1: amd-stop ri=0
 97203   0    1089795489399 CSW:SWO:1: cpu=0 proc=0xfffff8013b1fe000 (1666,
burnK7) pp=0xfffff80021f56400
 97202   0    1089795460401 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1
 97201   0    1089795433574 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1
 97200   0    1089795371727 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0
 97199   0    1089795328556 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1
 97198   0    1089795303542 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1
 97197   0    1089795278288 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1
 97196   0    1089795253226 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1

I think that what we see here is the target process being migrated from CPU#0 to
CPU#1 and the corresponding reconfiguration of the performance counters on both
CPUs.

I interpret event #97220 to be the unclaimed NMI.
Events #97204 and #97205 look curious.  They seem like a possible "race" between
amd_stop_pmc() and amd_intr().  As I understand, amd_stop_pmc() is called from
the context switch code when the target process gets off CPU.
I suspect that under the right conditions it's possible for wrmsr to cause a
counter overflow, such that an interrupt (if enabled) is generated after wrmsr
is executed, even if wrmsr disables the counter.

In amd_intr() we have this code:
                wrmsr(evsel, config & ~AMD_PMC_ENABLE);
                wrmsr(perfctr, AMD_RELOAD_COUNT_TO_PERFCTR_VALUE(v));

                /* Restart the counter if logging succeeded. */
                error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
                    TRAPF_USERMODE(tf));
                if (error == 0)
                        wrmsr(evsel, config | AMD_PMC_ENABLE);

I suspect that in the scenario above, if it is indeed possible, the last wrmsr
would re-enable the counter that's supposed to be stopped.

I think that writing back the original value should be more correct, that is:
			wrmsr(evsel, config);

I'll test if this change would help.

-- 
Andriy Gapon