Various panics while using HWPMC on ARM64
Wojciech Macek
wma at semihalf.com
Tue Jan 26 09:45:52 UTC 2016
I started to look intu this and there is a race condition during pmcstat
thread migration - both CPUX and CPU0 (the one that the pmcstat is migrated
on) are touching the same structures at the same time. I don't have any fix
for that yet, but one might use a simple workaround with cpuset to force
pmcstat to execute on CPU0.
Wojtek
root at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^Croot at thunder_crb4:~ # cpuset -c -l 0 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
Setting pmcstat to any other CPU causes an immediate crash when exiting the
process.
^Croot at thunder_crb4:~ # cpuset -c -l 47 pmcstat -N -S CPU_CYCLES -O
cpu_cycles.pmc
^C x0: 0
x1: ffffffc00a3c99e0
x2: ffffff800065c00d
x3: ffffff80007fabf0
x4: ffffff876b7e1cd0
x5: ffffff876b7e1480
x6: 40761000
x7: 2fe
x8: 10000284
x9: ffffff800074e550
x10: 7ff96938
x11: 2710
x12: 2af8
x13: 7ff93eec
x14: b
x15: 2a4c
x16: 7ff93fc1
x17: b
x18: ffffff876b7e14f0
x19: ffffff850dc92d00
x20: ffffff850dc92d38
x21: ffffffc004103a00
x22: 0
x23: 1
x24: 0
x25: ffffff80005eb790
x26: ffffff876b7e14f0
x27: 1
x28: ffffff850dc92d00
x29: ffffffc004116848
x30: ffffffc004116848
sp: ffffff876b7e14f0
lr: aa59bd6cae
elr: aa59bd6cae
spsr: 20000085
panic: Unknown kernel exception 22 esr_el1 8a000000
cpuid = 0
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
pc = 0xffffff80005c43dc lr = 0xffffff8000074798
sp = 0xffffff876b7e0ff0 fp = 0xffffff876b7e1200
db_trace_self_wrapper() at vpanic+0x144
pc = 0xffffff8000074798 lr = 0xffffff80002d0f68
sp = 0xffffff876b7e1210 fp = 0xffffff876b7e1280
vpanic() at panic+0x4c
pc = 0xffffff80002d0f68 lr = 0xffffff80002d0e20
sp = 0xffffff876b7e1290 fp = 0xffffff876b7e1310
panic() at do_el1h_sync+0x128
pc = 0xffffff80002d0e20 lr = 0xffffff80005d762c
sp = 0xffffff876b7e1320 fp = 0xffffff876b7e1340
do_el1h_sync() at handle_el1h_sync+0x68
pc = 0xffffff80005d762c lr = 0xffffff80005c5868
sp = 0xffffff876b7e1350 fp = 0xffffff876b7e1460
handle_el1h_sync() at 0xaa59bd6caa
pc = 0xffffff80005c5868 lr = 0x000000aa59bd6caa
sp = 0xffffff876b7e1470 fp = 0xffffffc004116848
KDB: enter: panic
[ thread pid 766 tid 100065 ]
Stopped at kdb_enter+0x40:
db>
2015-12-23 17:53 GMT+01:00 Zbigniew Bodek <zbb at semihalf.com>:
> Hi,
>
> I go into new panics although quite similar to the previously
> mentioned in both SMP and UP but this time I tried to use DTrace.
> Can anyone check this on another ARM64 platform? I used ThunderX and
> FreeBSD-CURRENT r292654.
>
> Best regards
> zbb
>
> Please check out below for panic outputs:
>
> SMP:
> --------
> root at thunder_crb4:~ # procsystime
> Tracing... Hit Ctrl-C to end...
> ^C
>
> Elapsed Times for all processes,
>
> x0: 1
> x1: 0
> x2: ffffff800052c911
> x3: deadc0d8
> x4: 1a2
> x5: ffffff854a976860
> x6: 0
> x7: 100
> x8: deadc0de
> x9: 0
> x10: c4d4ae68
> x11: 6c4d4ae68
> x12: 80000000
> x13: 0
> x14: b
> x15: 274f
> x16: ffffff80481a97f8
> x17: ffffff8048196814
> x18: ffffff854a9764b0
> x19: 989680
> x20: ffffff8000708000
> x21: ffffff8000702038
> x22: ffffffc004219000
> x23: ffffff8000702050
> x24: 3938700
> x25: 3938700
> x26: ffffff80006e2000
> x27: ffffffc0041129a0
> x28: 39386ff
> x29: ffffff854a976530
> x30: ffffff854a976530
> sp: ffffff854a9764b0
> lr: ffffff8000244760
> elr: ffffff80002448e8
> spsr: 600003c5
> far: deadc174
> esr: 96000007
> panic: data abort in critical section or under mutex
> cpuid = 0
> KDB: enter: panic
> [ thread pid 11 tid 100004 ]
> Stopped at kdb_enter+0x40:
> db>
>
> UP:
> --------
> root at thunder_crb4:~ # kldload dtraceall
> IMPLEMENT ME: dtrace_toxic_ranges
> x0: ffffff8000299740
> x1: ffffffc0040cd100
> x2: ffffff8000299740
> x3: ffffff8043655a70
> x4: ffffff8000299740
> x5: 0
> x6: ffffff87cba9c0a8
> x7: 0
> x8: ffffffc0129484d0
> x9: 1
> x10: 1
> x11: 0
> x12: ffffffc003e7ed88
> x13: ffffffc007ab0280
> x14: 400000
> x15: ffffff800068c000
> x16: ffffff80436687b8
> x17: ffffff8000299690
> x18: ffffff87cba9c010
> x19: ffffff8043655a70
> x20: ffffff8000299740
> x21: ffffff8000299740
> x22: ffffffc0040cd100
> x23: ffffffc003e7ed40
> x24: 8
> x25: 1
> x26: ffffff8043669000
> x27: ffffff8043669000
> x28: ffffff8043669000
> x29: ffffff87cba9c030
> x30: ffffff87cba9c030
> sp: ffffff87cba9c010
> lr: ffffff80002996c0
> elr: ffffffc0040cd100
> spsr: 800003c5
> panic: Unknown kernel exception 0 esr_el1 2000000
>
> KDB: enter: panic
> [ thread pid 719 tid 100062 ]
> Stopped at kdb_enter+0x40:
> db>
>
> 2015-12-14 16:24 GMT+01:00 Zbigniew Bodek <zbb at semihalf.com>:
> > Hello,
> >
> > Did you have time to look into that? Do you have any clues what could
> > be wrong here?
> > We would like to use hwpmc for profiling so your help will be very
> > much appreciated.
> >
> > Best regards
> > zbb
> >
> > 2015-12-09 13:06 GMT+01:00 Zbigniew Bodek <zbb at semihalf.com>:
> >> Hello Ed,
> >>
> >> Done. I also check what happens when SMP is disabled and the kassert
> >> is triggered:
> >>
> >> root at thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc
> >> ^Cpanic: [pmc,4256] cpu 0 didn't find a sample to collect
> >> KDB: stack backtrace:
> >> db_trace_self() at db_trace_self_wrapper+0x28
> >> pc = 0xffffff80004e9aac lr = 0xffffff800006d8b4
> >> sp = 0xffffff87cba976e0 fp = 0xffffff87cba97800
> >>
> >> db_trace_self_wrapper() at vpanic+0x9c
> >> pc = 0xffffff800006d8b4 lr = 0xffffff800027136c
> >> sp = 0xffffff87cba97810 fp = 0xffffff87cba97880
> >>
> >> vpanic() at kassert_panic+0x160
> >> pc = 0xffffff800027136c lr = 0xffffff80002712cc
> >> sp = 0xffffff87cba97890 fp = 0xffffff87cba97950
> >>
> >> kassert_panic() at pmc_capture_user_callchain+0x1a4
> >> pc = 0xffffff80002712cc lr = 0xffffff80000e1444
> >> sp = 0xffffff87cba97960 fp = 0xffffff87cba979c0
> >>
> >> pmc_capture_user_callchain() at pmc_hook_handler+0x7c0
> >> pc = 0xffffff80000e1444 lr = 0xffffff80000dfb78
> >> sp = 0xffffff87cba979d0 fp = 0xffffff87cba97a50
> >>
> >> pmc_hook_handler() at ast+0x14c
> >> pc = 0xffffff80000dfb78 lr = 0xffffff80002b976c
> >> sp = 0xffffff87cba97a60 fp = 0xffffff87cba97a90
> >>
> >> ast() at handle_el0_sync+0x90
> >> pc = 0xffffff80002b976c lr = 0xffffff80004eb224
> >> sp = 0xffffff87cba97aa0 fp = 0xffffff87cba97bb0
> >>
> >> handle_el0_sync() at 0x406d60
> >> pc = 0xffffff80004eb224 lr = 0x0000000000406d60
> >> sp = 0xffffff87cba97bc0 fp = 0x0000007ffffff540
> >>
> >> KDB: enter: panic
> >> [ thread pid 679 tid 100061 ]
> >> Stopped at kdb_enter+0x40:
> >> db>
> >>
> >>
> >> when invariants options is disabled I only get:
> >>
> >> root at thunderx_crb4:~ # pmcstat -S CPU_CYCLES -O cpu_cycles.pmc
> >> ^Cpmcstat: WARNING: sampling was paused at least 1 time.
> >> Please consider tuning the "kern.hwpmc.nsamples" tunable.
> >>
> >>
> >> Best regards
> >> zbb
> >>
> >> 2015-12-08 20:59 GMT+01:00 Ed Maste <emaste at freebsd.org>:
> >>> On 8 December 2015 at 14:34, Zbigniew Bodek <zbb at semihalf.com> wrote:
> >>>> Hello,
> >>>>
> >>>> I encountered some problems with FreeBSD on ARM64 while using hwpmc.
> >>>> Some of the errors that I found are listed below:
> >>>>
> >>>> * panic: Unknown kernel exception 0 esr_el1 2000000
> >>>> * panic: data abort in critical section or under mutex
> >>>> * panic: VFP exception in the kernel
> >>>> * panic: Unknown kernel exception 21 esr_el1 86000006
> >>>
> >>> Can you add these notes to PR 204686? I think there are SMP issues in
> >>> arm64 hwpmc that need to be resolved.
> _______________________________________________
> freebsd-arm at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arm
> To unsubscribe, send any mail to "freebsd-arm-unsubscribe at freebsd.org"
>
More information about the freebsd-arm
mailing list