Strange crash on wandboard
Ian Lepore
ian at FreeBSD.org
Wed Aug 7 03:57:22 UTC 2013
Okay, this is so strange I've just got to share it... I've been having
trouble with wandboard (solo) bringup and have tracked the problem down
to returning from the first interrupt that happens. (It's a clock
interrupt, but I don't think that's really germane.)
It's as if PULLFRAMEFROMSVCANDEXIT wasn't restoring the registers
correctly. At first the corruption hit the PC, which is damn hard to
debug. But after figuring out just where it was happening in the code
(spinlock_exit()) and inserting some extra debugging printfs, things
changed a bit and now a different register is getting blasted.
Here's what I get at runtime:
clock intr exit
returned: intr_event_handle
vm_fault(0xc0cca000, e46ab000, 1, 0) -> 1
Fatal kernel mode data abort: 'Translation Fault (S)'
trapframe: 0xdd3ffe24
FSR=00000005, FAR=e46abdc0, spsr=600de613
r0 =600001d3, r1 =60000113, r2 =000000c0, r3 =e46abdc0
r4 =c271f620, r5 =c271cbf0, r6 =00000000, r7 =dd3ffea8
r8 =c08d08f4, r9 =00000000, r10=00000000, r11=dd3ffe80
r12=dd3ffe70, ssp=dd3ffe70, slr=c0af2bb4, pc =c0af2be8
[ thread pid 12 tid 100006 ]
Stopped at spinlock_exit+0x5c: ldr r1, [r3]
db>
Here's the asm code around the fault point:
c0af2bd4: e10f0000 mrs r0, CPSR
c0af2bd8: e1c01002 bic r1, r0, r2
c0af2bdc: e0211003 eor r1, r1, r3
c0af2be0: e121f001 msr CPSR_c, r1
c0af2be4: e59f3024 ldr r3, [pc, #36] ; c0af2c10
c0af2be8: e5931000 ldr r1, [r3]
c0af2bec: e3510000 cmp r1, #0 ; 0x0
....
c0af2c10: c0bd6ae4 adcgts r6, sp, r4, ror #21
c0af2c14: c0b4e0e8 adcgts lr, r4, r8, ror #1
Okay, so the msr instruction re-enables interrupts, and the next one
loads r3 with constant value 0xc0bd6ae4, then an interrupt happens
(other instrumentation in PULLFRAMEFROMSVCANDEXIT on previous runs shows
that this is the case every time, 100% reproducible, but that
instrumentation destroys registers it shouldn't, so it's not present in
the run shown above). So the interrupt happens, then control returns to
the instruction at c0af2be8, which faults.
Now here's the strange part. Look at the fault-time r3 contents: it's
e46abdc0, the byte-reverse of the c0bd6ae4 it should hold. It's been
restored wrong-endian. Just one register out of the whole set, all of
which are restored by a single "ldmia sp, {r0-r14}^" instruction.
I don't know what to make of it. It seems like a hardware error of some
sort.
-- Ian