Arm64 stack issues (was Re: FreeBSD status for/on ODroid-C2?)
Mark Millard
markmigm at gmail.com
Wed Feb 1 02:39:08 UTC 2017
[Show .core file creation times instead.]
On 2017-Jan-31, at 6:30 PM, Mark Millard <markmi at dsl-only.net> wrote:
> [Just adding more accurate/precise times for the .core files.]
> [The original was accidentally sent from the "wrong" E-mail account
> but I've adjusted that here.]
>
> On 2017-Jan-31, at 12:35 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> [More notes on what I observe on a pine64 from head -r312982 .]
>>
>> On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:
>>
>>> Note that on the pine64 the network interface hangs from time to time and I get a core dump with very low frequency from long running processes, e.g., the shell that invokes "make world".
>>
>> I got sh crashes (multiple processes in the same time frame) from
>> just trying to build pkg:
>>
>> make[5]: stopped in /usr/obj/portswork/usr/ports/ports-mgmt/pkg/work/pkg-1.9.4/libpkg
>> *** [all-recursive] Error code 1
>>
>> # ls -lt /var/crash/
>> total 41764
>> -rw------- 1 root wheel 4702208 Jan 31 03:15 sh.13676.core
>> -rw------- 1 root wheel 4702208 Jan 31 03:15 sh.13511.core
>> -rw------- 1 root wheel 4702208 Jan 31 03:15 sh.13499.core
>> -rw------- 1 root wheel 4702208 Jan 31 03:15 sh.12095.core
>> -rw-r--r-- 1 root wheel 5 Nov 3 10:18 minfree
>>
>> In all the crashes lldb on the .core shows that the pc was no longer
>> pointing at memory with code in it. It is interesting that all
>> 4 sh instances died at about the same time.
>
> More time detail (using -T):
>
> -rw------- 1 root wheel 4702208 Jan 31 03:15:44 2017 sh.13676.core
> -rw------- 1 root wheel 4702208 Jan 31 03:15:43 2017 sh.13511.core
> -rw------- 1 root wheel 4702208 Jan 31 03:15:42 2017 sh.13499.core
> -rw------- 1 root wheel 4702208 Jan 31 03:15:32 2017 sh.12095.core
I should have used creation times:
# ls -UTlt /var/crash/
. . .
-rw------- 1 root wheel 4702208 Jan 31 03:15:42 2017 sh.13676.core
-rw------- 1 root wheel 4702208 Jan 31 03:15:41 2017 sh.13511.core
-rw------- 1 root wheel 4702208 Jan 31 03:15:41 2017 sh.13499.core
-rw------- 1 root wheel 4702208 Jan 31 03:15:30 2017 sh.12095.core
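To put a number on "about the same time", here is a minimal sketch that computes how long after the first .core each of the others was created. The timestamps are copied from the creation-time listing just above; the helper name seconds_after_first is only for illustration.

```python
from datetime import datetime

# Creation times copied from the `ls -UTlt /var/crash/` listing above.
created = {
    "sh.13676.core": "Jan 31 03:15:42 2017",
    "sh.13511.core": "Jan 31 03:15:41 2017",
    "sh.13499.core": "Jan 31 03:15:41 2017",
    "sh.12095.core": "Jan 31 03:15:30 2017",
}

def seconds_after_first(stamps):
    """Return {name: whole seconds since the earliest creation time}."""
    parsed = {name: datetime.strptime(text, "%b %d %H:%M:%S %Y")
              for name, text in stamps.items()}
    first = min(parsed.values())
    return {name: int((when - first).total_seconds())
            for name, when in parsed.items()}

offsets = seconds_after_first(created)
for name, off in sorted(offsets.items(), key=lambda kv: kv[1]):
    print(f"{name}: +{off}s")
```

By this measure three of the four cores were created within about a second of each other, roughly 11 to 12 seconds after the first one.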
>> SIGILL, SIGSEGV, SIGBUS, and SIGILL (again) from the non-code
>> consequences.
>>
>> The two SIGILL's have some interesting similarities to each other.
>> So I list them first below. x0-x3, x8-x9, x13, x17, x27, and cpsr
>> all match in these two. x1=ld-elf.so.1`_rtld_tlsdesc,
>> x17=libc.so.7`__free at jemalloc_jemalloc.c:2007,
>> x23=ld-elf.so.1`symlook_global + 124 at rtld.c:3916,
>> x27=sh..bss + 6336.
>>
>> The other two have the following in common:
>> x10-x12, x16-x17. x17=libc.so.7`close at close.c:48 .
>>
>> x18 = 0xaaaaaaaaaaaaaaab is common between one SIGILL and one not.
>>
>> Only one does not have x27=sh..bss + 6336. It instead has:
>> x28=sh..bss + 6336 .
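The register comparison between the two SIGILL cores can be redone mechanically rather than by eye. The following is a minimal sketch, assuming the `register read` text has been saved as strings; only an excerpt of each dump quoted below is copied in to keep it short, and the names parse_registers and matching are made up for illustration.

```python
import re

# Excerpts of the two SIGILL `register read` outputs quoted below.
# Only a subset of the registers is copied here.
SIGILL_A = """
x0 = 0x0000000000000000
x1 = 0x00000000404346e8 ld-elf.so.1`_rtld_tlsdesc
x4 = 0x0000000000000050
x13 = 0x0000000000000427
x16 = 0x0000000000432340
x17 = 0x000000004054cd00 libc.so.7`__free at jemalloc_jemalloc.c:2007
x18 = 0x0000000000000000
x27 = 0x0000000000434000 sh..bss + 6336
pc = 0x000000004044f800
cpsr = 0x60000000
"""
SIGILL_B = """
x0 = 0x0000000000000000
x1 = 0x00000000404346e8 ld-elf.so.1`_rtld_tlsdesc
x4 = 0x0000000000000017
x13 = 0x0000000000000427
x16 = 0x0000000000432340
x17 = 0x000000004054cd00 libc.so.7`__free at jemalloc_jemalloc.c:2007
x18 = 0xaaaaaaaaaaaaaaab
x27 = 0x0000000000434000 sh..bss + 6336
pc = 0x0000ffffffffee68
cpsr = 0x60000000
"""

# "name = 0xVALUE", ignoring any symbol annotation after the value.
REG_RE = re.compile(r"^\s*(\w+)\s*=\s*0x([0-9a-fA-F]+)", re.MULTILINE)

def parse_registers(text):
    """Map register name -> integer value from lldb-style output."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}

def matching(a, b):
    """Registers present in both dumps that hold the same value."""
    return [name for name in a if name in b and a[name] == b[name]]

regs_a = parse_registers(SIGILL_A)
regs_b = parse_registers(SIGILL_B)
same = matching(regs_a, regs_b)
print("match:", same)
print("differ:", [name for name in regs_a if name not in same])
```

On these excerpts x16 also compares equal in the two SIGILL dumps, although it is not in the list of matching registers given above; that may be worth double-checking against the full dumps.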
>>
>> (lldb) bt
>> * thread #1: tid = 100142, 0x000000004044f800, name = 'sh', stop reason = signal SIGILL
>> * frame #0: 0x000000004044f800
>> (lldb) register read
>> General Purpose Registers:
>> x0 = 0x0000000000000000
>> x1 = 0x00000000404346e8 ld-elf.so.1`_rtld_tlsdesc
>> x2 = 0x0000000040a00000
>> x3 = 0x0000000000000002
>> x4 = 0x0000000000000050
>> x5 = 0x0000000040a4c9c0
>> x6 = 0x2e2e2f2e2e2f2e2e
>> x7 = 0x6c6f6f7462696c2f
>> x8 = 0x0000000000000001
>> x9 = 0x0000000000000000
>> x10 = 0x00000000000000df
>> x11 = 0x000000000000002f
>> x12 = 0x0000000040a0e690
>> x13 = 0x0000000000000427
>> x14 = 0x0000000000000001
>> x15 = 0x0000000000000000
>> x16 = 0x0000000000432340
>> x17 = 0x000000004054cd00 libc.so.7`__free at jemalloc_jemalloc.c:2007
>> x18 = 0x0000000000000000
>> x19 = 0x000000004044e330
>> x20 = 0x000000001c93deed
>> x21 = 0x0000000007ab9b5c
>> x22 = 0x00000000404ba7b0
>> x23 = 0x000000004043c4b0 ld-elf.so.1`symlook_global + 124 at rtld.c:3916
>> x24 = 0x0000ffffffffd2d0
>> x25 = 0x0000ffffffffd370
>> x26 = 0x0000ffffffffd340
>> x27 = 0x0000000000434000 sh..bss + 6336
>> x28 = 0x0000000040a4c1b0
>> fp = 0x0000ffff00000001
>> lr = 0x000000004044f800
>> sp = 0x0000ffffffffd2a0
>> pc = 0x000000004044f800
>> cpsr = 0x60000000
>> (lldb) disass
>> -> 0x4044f800: .long 0xd550b87a ; unknown opcode
>> 0x4044f804: .long 0x00000000 ; unknown opcode
>> 0x4044f808: .long 0x00000001 ; unknown opcode
>> 0x4044f80c: .long 0x00000000 ; unknown opcode
>> 0x4044f810: .long 0x4044fc00 ; unknown opcode
>> 0x4044f814: .long 0x00000000 ; unknown opcode
>> 0x4044f818: .long 0x4044f410 ; unknown opcode
>> 0x4044f81c: .long 0x00000000 ; unknown opcode
>>
>> (lldb) thread list
>> Process 0 stopped
>> * thread #1: tid = 100161, 0x0000ffffffffee68, name = 'sh', stop reason = signal SIGILL
>> (lldb) register read
>> General Purpose Registers:
>> x0 = 0x0000000000000000
>> x1 = 0x00000000404346e8 ld-elf.so.1`_rtld_tlsdesc
>> x2 = 0x0000000040a00000
>> x3 = 0x0000000000000002
>> x4 = 0x0000000000000017
>> x5 = 0x00080002a0290a00
>> x6 = 0x0000000000434c28 sh..bss + 9448
>> x7 = 0x000000000005e1cd
>> x8 = 0x0000000000000001
>> x9 = 0x0000000000000000
>> x10 = 0x0000000000000000
>> x11 = 0x0000000040a5c000
>> x12 = 0x0000000040a0e670
>> x13 = 0x0000000000000427
>> x14 = 0x000000000000000d
>> x15 = 0x0000000000432740 sh..bss + 0
>> x16 = 0x0000000000432340
>> x17 = 0x000000004054cd00 libc.so.7`__free at jemalloc_jemalloc.c:2007
>> x18 = 0xaaaaaaaaaaaaaaab
>> x19 = 0x0000ffffffffee18
>> x20 = 0x0000ffffffffedb4
>> x21 = 0x0000ffffffffed80
>> x22 = 0x0000ffffffffed59
>> x23 = 0x0000ffffffffed47
>> x24 = 0x0000ffffffffed38
>> x25 = 0x0000ffffffffed28
>> x26 = 0x0000ffffffffed20
>> x27 = 0x0000000000434000 sh..bss + 6336
>> x28 = 0x0000000040a803a0
>> fp = 0x0000ffffffffee59
>> lr = 0x0000ffffffffee68
>> sp = 0x0000ffffffffe1a0
>> pc = 0x0000ffffffffee68
>> cpsr = 0x60000000
>> (lldb) disass
>> -> 0xffffffffee68: .long 0x44504d54 ; unknown opcode
>> 0xffffffffee6c: .long 0x2f3d5249 ; unknown opcode
>> 0xffffffffee70: .long 0x00706d74 ; unknown opcode
>> 0xffffffffee74: .long 0x4c454853 ; unknown opcode
>> 0xffffffffee78: .long 0x622f3d4c ; unknown opcode
>> 0xffffffffee7c: .long 0x732f6e69 ; unknown opcode
>> 0xffffffffee80: .long 0x4f430068 ; unknown opcode
>> 0xffffffffee84: .long 0x4749464e ; unknown opcode
>>
>> (lldb) bt
>> * thread #1: tid = 100088, 0x356c7265702f676e, name = 'sh', stop reason = signal SIGBUS
>> * frame #0: 0x356c7265702f676e
>> (lldb) register read
>> General Purpose Registers:
>> x0 = 0x0000000000000000
>> x1 = 0x0000000000000000
>> x2 = 0x0000000040a00000
>> x3 = 0x0000000000000005
>> x4 = 0x0000000000000038
>> x5 = 0x0000000040a754e5
>> x6 = 0x584946455250442d
>> x7 = 0x6c2f7273752f223d
>> x8 = 0x0000000000000000
>> x9 = 0x0000000000000000
>> x10 = 0x0000000000434000 sh..bss + 6336
>> x11 = 0x0000000000000000
>> x12 = 0x0000000000434217 sh..bss + 6871
>> x13 = 0x0000000000434000 sh..bss + 6336
>> x14 = 0x0000000000432000 sh`__frame_dummy_init_array_entry
>> x15 = 0x000000000000003d
>> x16 = 0x00000000004322b0
>> x17 = 0x000000004050d090 libc.so.7`close at close.c:48
>> x18 = 0xaaaaaaaaaaaaaaab
>> x19 = 0x766564206f666e69
>> x20 = 0x7865646e692f746e
>> x21 = 0x69727020676b702f
>> x22 = 0x746d676d2d737472
>> x23 = 0x6f7020656d69746e
>> x24 = 0x75722d7478657474
>> x25 = 0x65672f6c65766564
>> x26 = 0x206e6f7369622f6c
>> x27 = 0x0000000040a53716
>> x28 = 0x0000000000434000 sh..bss + 6336
>> fp = 0x616c20346d2f6c65
>> lr = 0x356c7265702f676e
>> sp = 0x0000ffffffffe740
>> pc = 0x356c7265702f676e
>> cpsr = 0x20000000
>>
>> (lldb) disass
>> error: core file does not contain 0x356c7265702f676e
>> error: Failed to disassemble memory at 0xffffffffffffffff.
>>
>>
>>
>> (lldb) bt
>> * thread #1: tid = 100186, 0x0000000000000000, name = 'sh', stop reason = signal SIGSEGV
>> * frame #0: 0x0000000000000000
>> (lldb) disass
>> error: core file does not contain 0x0
>> error: Failed to disassemble memory at 0xffffffffffffffff.
>> (lldb) register read
>> General Purpose Registers:
>> x0 = 0x0000000000000000
>> x1 = 0x0000000000000000
>> x2 = 0x0000000000000002
>> x3 = 0x0000000000006c6f
>> x4 = 0x0000000040a50bb3
>> x5 = 0x0000000040a499ba
>> x6 = 0x6f7462696c2f2e2e
>> x7 = 0x6c6f6f7462696c2f
>> x8 = 0x0000000000000000
>> x9 = 0x0000000000000000
>> x10 = 0x0000000000434000 sh..bss + 6336
>> x11 = 0x0000000000000000
>> x12 = 0x0000000040a499f8
>> x13 = 0x0000000000434000 sh..bss + 6336
>> x14 = 0x0000000000000001
>> x15 = 0x0000000000000000
>> x16 = 0x00000000004322b0
>> x17 = 0x000000004050d090 libc.so.7`close at close.c:48
>> x18 = 0x0000000000000000
>> x19 = 0x0000000000000065
>> x20 = 0x0000000000000065
>> x21 = 0x00000000004168f0 sh`readtoken1 + 5212 at parser.c:1602
>> x22 = 0x0000ffffffffda90
>> x23 = 0x0000000040a498c0
>> x24 = 0x000000000000000a
>> x25 = 0x0000000000000000
>> x26 = 0x0000000000000000
>> x27 = 0x0000000040a49258
>> x28 = 0x0000000000434000 sh..bss + 6336
>> fp = 0x0000ffffffffda08
>> lr = 0x0000000000000000
>> sp = 0x0000ffffffffd970
>> pc = 0x0000000000000000
>> cpsr = 0x20000000
>>
>>
>> Looks to me like something major is wrong.
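Several of the values above read as text rather than as addresses or instructions. A minimal sketch, assuming little-endian memory (as on this aarch64 system), that renders the words and registers as the bytes they would occupy in memory. The register order used for the SIGBUS core (x26 down to x19, then fp and lr) is an assumed save-area layout, picked only because it makes the text read contiguously; le_ascii is a name made up for illustration.

```python
def le_ascii(value, width):
    """Render an integer as the bytes it occupies in little-endian memory."""
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# 32-bit words shown by `disass` at 0xffffffffee68 (second SIGILL core).
words = [0x44504d54, 0x2f3d5249, 0x00706d74, 0x4c454853,
         0x622f3d4c, 0x732f6e69, 0x4f430068, 0x4749464e]
stack_text = "".join(le_ascii(w, 4) for w in words)

# 64-bit registers from the SIGBUS core, in the ASSUMED order
# x26, x25, x24, x23, x22, x21, x20, x19, fp, lr.
sigbus_regs = [0x206e6f7369622f6c, 0x65672f6c65766564,
               0x75722d7478657474, 0x6f7020656d69746e,
               0x746d676d2d737472, 0x69727020676b702f,
               0x7865646e692f746e, 0x766564206f666e69,
               0x616c20346d2f6c65, 0x356c7265702f676e]
reg_text = "".join(le_ascii(r, 8) for r in sigbus_regs)

print(stack_text)  # non-printable bytes (here NULs) are shown as "."
print(reg_text)
```

The first string looks like environment entries (TMPDIR=/tmp, SHELL=/bin/sh, then the start of another variable), so in that core the pc was pointing into string data on the stack. The second string looks like part of a list of port origins, ending in "lang/perl5" in lr and pc. That would be consistent with the saved registers having been reloaded from a stack region that held string data, but it is an interpretation of the dumps, not something the cores prove.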
===
Mark Millard
markmi at dsl-only.net
On 2017-Jan-30, at 11:57 PM, Mark Millard <markmi at dsl-only.net> wrote:
> I updated to head -r312982 on the pine64 that I have access to:
>
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r312982M arm64 aarch64 1200020 1200020
>
> after several months of not using the pine64.
> ( -mcpu=cortex-a53 used for buildworld buildkernel;
> non-debug variant of GENERIC [GENERIC included
> then overridden]; usb SSD root file system)
>
> I find that any time some of the cores are busy I get thousands
> of the gic0 spurious interrupt messages in fairly short order.
> (This is not new: it is unchanged.)
>
> For example during either of:
>
> openssl speed
>
> or:
>
> cp /dev/zero /dev/null
> (similarly for copying actual files around,
> local or nfs involved)
>
> Once the cores are no longer busy the gic0 messages stop.
>
> The "on CPU<?>" varies. The "last irq: <?>" varies.
> (But 27 is the most common by far.)
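With thousands of these messages, counting them per "last irq" and per CPU is easier than reading them. A minimal sketch, assuming the message has the shape of the armv6 console line quoted later in this thread; the sample lines are made up for illustration and the tally helper is hypothetical.

```python
import re
from collections import Counter

# Message shape taken from the console line quoted later in this thread:
#   gic0: Spurious interrupt detected: last irq: 29 on CPU1
SPURIOUS_RE = re.compile(
    r"gic0: Spurious interrupt detected: last irq: (\d+) on CPU(\d+)")

def tally(lines):
    """Count spurious-interrupt reports per 'last irq' and per CPU."""
    by_irq, by_cpu = Counter(), Counter()
    for line in lines:
        m = SPURIOUS_RE.search(line)
        if m:
            by_irq[int(m.group(1))] += 1
            by_cpu[int(m.group(2))] += 1
    return by_irq, by_cpu

# Made-up sample input for illustration only; real input would be the
# saved console log or dmesg output, one message per line.
sample = [
    "gic0: Spurious interrupt detected: last irq: 27 on CPU0",
    "gic0: Spurious interrupt detected: last irq: 27 on CPU2",
    "gic0: Spurious interrupt detected: last irq: 29 on CPU1",
    "some unrelated kernel message",
    "gic0: Spurious interrupt detected: last irq: 27 on CPU0",
]

by_irq, by_cpu = tally(sample)
print("per irq:", by_irq.most_common())
print("per cpu:", sorted(by_cpu.items()))
```

Feeding it the real log would show whether irq 27 dominates on every CPU or only on some.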
===
Mark Millard
markmi at dsl-only.net
On 2017-Jan-28, at 2:17 PM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:
Note that on the pine64 the network interface hangs from time to time and I get a core dump with very low frequency from long running processes, e.g., the shell that invokes "make world". Note that I had similar issues on the ODroid-C2.
Currently rebuilding world without MALLOC_PRODUCTION.
The arm64 port is getting close to working 100%, just a last few glitches.
On Sat 28 Jan 2017 at 22:03, Mark Millard <markmi at dsl-only.net> wrote:
[About: "gic0: Spurious interrupt detected" on armv6 as well.]
On 2017-Jan-28, at 6:43 AM, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:
> Did a build/install world/kernel with r312916 and MALLOC_PRODUCTION=YES on
> a pine64, removed /etc/malloc.conf, rebooted
>
> and I am now rebuilding the python2 port without problems so far (except
> the "gic0: Spurious interrupt detected" messages which reappeared shortly
> after my previous post)
While very rare, I have seen the gic0 notices on armv6 (e.g., a bpim3)
during large builds (with -j 4). Recently I got a:
gic0: Spurious interrupt detected: last irq: 29 on CPU1
on:
# uname -apKU
FreeBSD bpim3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r312726M: Tue Jan 24 20:57:48 PST 2017 markmi at FreeBSDx64:/usr/obj/bpim3_clang/arm.armv6/usr/src/sys/BPIM3-NODBG arm armv6 1200020 1200020
while building devel/gcc6 (via a full bootstrap) via -j 4 .
This is from a non-debug buildworld buildkernel context and has MALLOC_PRODUCTION=
in /etc/make.conf . No /etc/malloc.conf present. I do use -mcpu=cortex-a7 .
Details if you care:
# more /usr/src/sys/arm/conf/BPIM3-NODBG
#
# BPIM3 -- Custom configuration for the Banana Pi M3
#
include "GENERIC"
ident BPIM3-NODBG
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
options ALT_BREAK_TO_DEBUGGER
options KDB # Enable kernel debugger support
# For minimum debugger support (stable branch) use:
options KDB_TRACE # Print a stack trace for a panic
options DDB # Enable the kernel debugger
# Extra stuff:
#options VERBOSE_SYSINIT # Enable verbose sysinit messages
#options BOOTVERBOSE=1
#options BOOTHOWTO=RB_VERBOSE
#options KTR
#options KTR_MASK=KTR_TRAP
##options KTR_CPUMASK=0xF
#options KTR_VERBOSE
# Disable any extra checking for. . .
nooptions DEADLKRES # Enable the deadlock resolver
nooptions INVARIANTS # Enable calls of extra sanity checking
nooptions INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
nooptions WITNESS # Enable checks to detect deadlocks and cycles
nooptions WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
nooptions DIAGNOSTIC
It was from a cross build for buildworld buildkernel :
(I've not checked on lldb builds linking recently.)
# more ~/src.configs/src.conf.bpim3-clang-bootstrap.amd64-host
TO_TYPE=armv6
#
KERNCONF=BPIM3-NODBG
TARGET=arm
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_CROSS_COMPILER=
WITHOUT_SYSTEM_COMPILER=
#
#CPUTYPE=soft
WITH_LIBCPLUSPLUS=
WITH_BINUTILS_BOOTSTRAP=
WITH_CLANG_BOOTSTRAP=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
#
# Linking lldb fails for armv6(/v7)
WITHOUT_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
WITHOUT_LIBSOFT=
#
WITHOUT_ELFTOOLCHAIN_BOOTSTRAP=
WITHOUT_GCC_BOOTSTRAP=
WITHOUT_GCC=
WITHOUT_GCC_IS_CC=
WITHOUT_GNUCXX=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
XCFLAGS+= -mcpu=cortex-a7
XCXXFLAGS+= -mcpu=cortex-a7
# There is no XCPPFLAGS but XCPP gets XCFLAGS content.
Used for buildworld buildkernel :
# more ~/src.configs/make.conf
#MALLOC_PRODUCTION=
#NO_WERROR=
#WERROR=
CFLAGS.gcc+= -v
Used for port builds:
# more /etc/make.conf
WANT_QT_VERBOSE_CONFIGURE=1
#
DEFAULT_VERSIONS+=perl5=5.24
WRKDIRPREFIX=/usr/obj/portswork
WITH_DEBUG=
WITH_DEBUG_FILES=
MALLOC_PRODUCTION=
# svnlite status /usr/src/ | sort
? /usr/src/sys/amd64/conf/GENERIC-DBG
? /usr/src/sys/amd64/conf/GENERIC-NODBG
? /usr/src/sys/arm/conf/BPIM3-DBG
? /usr/src/sys/arm/conf/BPIM3-NODBG
? /usr/src/sys/arm/conf/RPI2-DBG
? /usr/src/sys/arm/conf/RPI2-NODBG
? /usr/src/sys/arm64/conf/GENERIC-DBG
? /usr/src/sys/arm64/conf/GENERIC-NODBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
M /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
M /usr/src/lib/csu/powerpc64/Makefile
M /usr/src/libexec/rtld-elf/Makefile
M /usr/src/sys/boot/ofw/Makefile.inc
M /usr/src/sys/boot/powerpc/Makefile.inc
M /usr/src/sys/boot/powerpc/kboot/Makefile
M /usr/src/sys/boot/uboot/Makefile.inc
M /usr/src/sys/conf/kern.mk
M /usr/src/sys/conf/kmod.mk
M /usr/src/sys/ddb/db_main.c
M /usr/src/sys/ddb/db_script.c
M /usr/src/sys/modules/zfs/Makefile
M /usr/src/sys/powerpc/ofw/ofw_machdep.c
The M's are generally tied to powerpc64 and powerpc
explorations. I tend to use the same source for all
the TARGET_ARCH's that I build.
===
Mark Millard
markmi at dsl-only.net
_______________________________________________
freebsd-arm at freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe at freebsd.org"