qemu-arm-static appears to have problems with signal delivery during (at least) poudriere-devel based cross builds of some ports with ALLOW_MAKE_JOBS=yes

Mark Millard markmi at dsl-only.net
Thu Jan 26 22:17:32 UTC 2017


[Top post of new information confirming SIGCHLD handling.]

lldb on an arm (bpim3) is able to interpret the qemu_gmake.core file
in a useful way when also given a copy of the gmake!

For "TCG temporary leak before 00021826" the symbol dump in address
order shows:

Dumping symbol table for 4 modules.
Symtab, file = /usr/local/bin/gmake, num_symbols = 957 (sorted by address):
               Debug symbol
               |Synthetic symbol
               ||Externally Visible
               |||
Index   UserID DSX Type            File Address/Value Load Address       Size               Flags      Name
------- ------ --- --------------- ------------------ ------------------ ------------------ ---------- ----------------------------------
. . .
[  538]   6121   X Code            0x0000000000021820 0x00029820 0x0000000000000038 0x00000012 child_handler
[  592]   6175   X Code            0x0000000000021858 0x00029858 0x0000000000000d7c 0x00000012 reap_children
. . .

The address 00021826 falls inside child_handler (0x21820
up to, but not including, 0x21858), which tends to confirm
that SIGCHLD handling is involved.
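
As a cross-check, the reported address can be resolved against the
two symbol-table rows shown above. This is only a sketch: the table
is hand-copied from the lldb dump, and resolve() is a hypothetical
helper written for illustration, not an lldb API.

```python
from bisect import bisect_right

# (file address, size, name), copied by hand from the lldb symbol dump.
SYMBOLS = sorted([
    (0x21820, 0x38, "child_handler"),
    (0x21858, 0xD7C, "reap_children"),
])

def resolve(addr):
    """Return (name, offset) for the symbol containing addr, or None."""
    starts = [start for start, _, _ in SYMBOLS]
    i = bisect_right(starts, addr) - 1   # last symbol starting at or before addr
    if i < 0:
        return None
    start, size, name = SYMBOLS[i]
    if addr >= start + size:             # past the end of that symbol
        return None
    return name, addr - start

# Address from the "TCG temporary leak before 00021826" message.
print(resolve(0x21826))
```

With only two rows the bisect is overkill, but the same lookup
works unchanged if more rows from the full 957-symbol dump are added.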

And objdump on gmake shows:

00021820 <child_handler> push   {fp, lr}
00021824 <child_handler+0x4> mov        fp, sp
00021828 <child_handler+0x8> sub        sp, sp, #8
0002182c <child_handler+0xc> mov        r1, r0
00021830 <child_handler+0x10> str       r0, [sp, #4]
00021834 <child_handler+0x14> movw      r0, #36636      ; 0x8f1c
00021838 <child_handler+0x18> movt      r0, #5
0002183c <child_handler+0x1c> ldr       r2, [r0]
00021840 <child_handler+0x20> add       r2, r2, #1
00021844 <child_handler+0x24> str       r2, [r0]
00021848 <child_handler+0x28> str       r1, [sp]
0002184c <child_handler+0x2c> bl        0002e9f0 <jobserver_signal>
00021850 <child_handler+0x30> mov       sp, fp
00021854 <child_handler+0x34> pop       {fp, pc}
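
Reading the listing: the movw/movt pair builds a 32-bit address, the
word at that address is loaded, incremented and stored back, and
then jobserver_signal is called. The small sketch below only works
out which address the pair forms. The helper name movw_movt is made
up for illustration, and the dump does not say which variable lives
at that address, so none is assumed.

```python
def movw_movt(low16, high16):
    """Combine a movw (low half) and movt (high half) immediate pair."""
    return ((high16 & 0xFFFF) << 16) | (low16 & 0xFFFF)

# Immediates taken from child_handler+0x14 and child_handler+0x18.
counter_addr = movw_movt(36636, 5)
print(hex(counter_addr))
```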

Interestingly, 00021826 does not fall on an instruction
boundary in that listing, and lldb reported for the registers:

(lldb) register read
General Purpose Registers:
        r0 = 0x9fffc0f8
        r1 = 0x9fffc138
        r2 = 0x000a18c0
        r3 = 0xf4fde858
        r4 = 0x9fffc138
        r5 = 0xf4a00000
        r6 = 0xb6db6db7
        r7 = 0x00000012
        r8 = 0xf4a0c000
        r9 = 0xf4aa18c0
       r10 = 0x9fffc260
       r11 = 0x00000004
       r12 = 0x9fffc0f8
        sp = 0x9fffc0f8
        lr = 0x9fffffcc
        pc = 0x00021822
      cpsr = 0x80000030

That is, the pc is 0x00021822, which is in the
middle of the "push   {fp, lr}" instruction and 4
bytes before the 00021826 figure.

If it really tried to fetch an instruction at
0x00021822 that likely would also explain getting a
SIGILL classification for the 4 bytes starting
there.

I have no clue how the odd-multiple-of-2 address
is getting involved. But it does appear that
sometimes signal delivery is messed up under qemu.
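
One detail in the register dump may bear on this, offered as a
reading rather than a diagnosis: cpsr = 0x80000030 has bit 5 set,
and bit 5 of the ARM CPSR is the T (Thumb state) bit. In Thumb
state instructions are halfword aligned, so a pc of 0x00021822 is
not misaligned from the emulated CPU's point of view, even though
objdump shows child_handler as 4-byte ARM instructions. The sketch
below just decodes those two values. The function and key names are
hypothetical, and the dump does not show why the T bit would be set
on entry to the handler.

```python
CPSR_T_BIT = 1 << 5      # Thumb state bit in the ARM CPSR
CPSR_MODE_MASK = 0x1F    # M[4:0]; 0b10000 is User mode

def decode(pc, cpsr):
    """Summarize execution state and pc alignment from a register dump."""
    thumb = bool(cpsr & CPSR_T_BIT)
    required = 2 if thumb else 4         # alignment required in this state
    return {
        "thumb": thumb,
        "mode": cpsr & CPSR_MODE_MASK,
        "pc_ok_for_state": pc % required == 0,
        "pc_ok_for_arm": pc % 4 == 0,
    }

# Values from the lldb "register read" output above.
state = decode(pc=0x00021822, cpsr=0x80000030)
for key, value in state.items():
    print(key, value)
```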

===
Mark Millard
markmi at dsl-only.net

On 2017-Jan-26, at 10:04 AM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Jan-26, at 5:54 AM, Michal Meloun <melounmichal at gmail.com> wrote:
> 
>> On 26.01.2017 5:26, Mark Millard wrote:
>>> On 2017-Jan-25, at 12:27 PM, Sean Bruno <sbruno at freebsd.org> wrote:
>>> 
>>>> Mark:
>>>> 
>>>> There was a recent update this week that was submitted and accepted to
>>>> qemu-user-static.
>>>> 
>>>> Want to give it a spin again and see if you are able to make progress?
>>>> 
>>>> sean "top poster for maximum effect" bruno
>>> 
>>> I updated my /usr/ports to -r432460 (from today) and rebuilt.
>>> I then tried doing some poudriere -x -a arm.armv6 port builds
>>> again, with ALLOW_MAKE_JOBS=yes and -J 1 in use.
>>> 
>>> Unfortunately the qemu-user-static update did not fix the
>>> problem I've been seeing.
>>> 
>>> An example extracted from a print/texinfo log still shows
>>> "TCG temporary leak before 00021826":
>> 
>> I just rebuilt print/texinfo without a single problem.
>> Well, with slightly different CFLAGS
>> CFLAGS+= -O2 -munaligned-access -mcpu=cortex-a15 -fno-builtin-sin
>> -fno-builtin-cos
>> 
>> Michal
> 
> I had already reported that on retries the failure point in
> the overall sequence for the port either changes or the build
> completes for whatever I was trying to build that initially
> failed. (I did not repeat that in the new report.)
> 
> When I retried, print/texinfo built okay.
> 
> I've never gotten anything large like lang/gcc6 with a full
> bootstrap to complete with ALLOW_MAKE_JOBS=yes and -J 1 in use
> --nowhere near doing so. (In my context ALLOW_MAKE_JOBS means
> what portmaster would do for -j 4. Poudriere seems to give no
> control of this [-J is a different issue].)
> 
> I've also not been able to use gdb on the .core produced:
> a qemu_gmake.core file extracted from the compressed tar
> archive of the failed work directory. file on it reports. . .
> 
> # file /root/poudriere_failure/work/.build/qemu_gmake.core
> /root/poudriere_failure/work/.build/qemu_gmake.core: ELF 32-bit LSB core file ARM, version 1 (FreeBSD), FreeBSD-style, from 'ke'
> 
> (I suspect that "version 1 (FreeBSD)" is not really
> intended to be supported as it stands.)
> 
> I submitted bugzilla 216132 as a segmentation fault report
> against devel/gdb but the patch that was tried just allowed
> gdb to get farther but show other problems and still fail
> overall on handling qemu_gmake.core. See 216132.
> 
> ===
> Mark Millard
> markmi at dsl-only.net
> 
>> ....
>> mv warn-on-use.h-t warn-on-use.h
>> /bin/mkdir -p sys
>> rm -f sys/types.h-t sys/types.h && \
>> { echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */'; \
>> sed -e 's|@''GUARD_PREFIX''@|GL|g' \
>>     -e 's|@''INCLUDE_NEXT''@|include_next|g' \
>>     -e 's|@''PRAGMA_SYSTEM_HEADER''@|#pragma GCC system_header|g' \
>>     -e 's|@''PRAGMA_COLUMNS''@||g' \
>>     -e 's|@''NEXT_SYS_TYPES_H''@|<sys/types.h>|g' \
>>     -e 's|@''WINDOWS_64_BIT_OFF_T''@|0|g' \
>>     < ./sys_types.in.h; \
>> } > sys/types.h-t && \
>> mv sys/types.h-t sys/types.h
>> rm -f unistd.h-t unistd.h && \
>> ..
>> 
>> 
>>> 
>>> . . .
>>> rm -f sys/types.h-t sys/types.h && \
>>> { echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */'; \
>>> sed -e 's|@''GUARD_PREFIX''@|GL|g' \
>>>     -e 's|@''INCLUDE_NEXT''@|include_next|g' \
>>>     -e 's|@''PRAGMA_SYSTEM_HEADER''@|#pragma GCC system_header|g' \
>>>     -e 's|@''PRAGMA_COLUMNS''@||g' \
>>>     -e 's|@''NEXT_SYS_TYPES_H''@|<sys/types.h>|g' \
>>>     -e 's|@''WINDOWS_64_BIT_OFF_T''@|0|g' \
>>>     < ./sys_types.in.h; \
>>> } > sys/types.h-t && \
>>> mv sys/types.h-t sys/types.h
>>> TCG temporary leak before 00021826
>>> qemu: uncaught target signal 4 (Illegal instruction) - core dumped
>>> Illegal instruction
>>> gmake[2]: *** [Makefile:1174: all-recursive] Error 1
>>> gmake[2]: Leaving directory '/wrkdirs/usr/ports/print/texinfo/work/texinfo-6.1'
>>> gmake[1]: *** [Makefile:1113: all] Error 2
>>> gmake[1]: Leaving directory '/wrkdirs/usr/ports/print/texinfo/work/texinfo-6.1'
>>> ===> Compilation failed unexpectedly.
>>> Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to
>>> the maintainer.
>>> *** Error code 1
>>> 
>>> Stop.
>>> make: stopped in /usr/ports/print/texinfo
>>> ====>> Cleaning up wrkdir
>>> ===>  Cleaning for texinfo-6.1.20160425,1
>>> build of print/texinfo ended at Wed Jan 25 20:08:32 PST 2017
>>> build time: 00:06:57
>>> !!! build failure encountered !!!
>>> 
>>> 
>>> ===
>>> Mark Millard
>>> markmi at dsl-only.net
>>> 
>>> On 01/15/17 07:09, Mark Millard wrote:
>>>> On 2017-Jan-14, at 10:53 PM, Mark Millard <markmi at dsl-only.net> wrote:
>>>> 
>>>>> [Context: head (12) -r312009 and ports head -r431413.]
>>>>> 
>>>>> I've been experimenting on amd64 with poudriere-devel with -x
>>>>> for -a arm.armv6 and I ran into:
>>>>> 
>>>>>> TCG temporary leak before 00021826
>>>>>> qemu: uncaught target signal 4 (Illegal instruction) - core dumped
>>>>> 
>>>>> in 3 of the 31 ports for the build, but 4 skipped so 3 of 27
>>>>> attempted. The 00021826 is the same number in all the examples
>>>>> so far (whatever its base).
>>>>> 
>>>>> These seem to be the only TCG messages and each failure starts with
>>>>> one and then reports the qemu message. (Also true for the below.)
>>>>> As far as I can tell the TCG notice is the report of an internal
>>>>> qemu problem that is then translated into an Illegal instruction.
>>>>> 
>>>>> This was with ALLOW_MAKE_JOBS=yes but -J 1 for poudriere.
>>>>> 
>>>>> For 2 of the problem ports retries worked, still using
>>>>> ALLOW_MAKE_JOBS=yes and -J 1 .
>>>>> 
>>>>> But the 3rd port failed each time tried with ALLOW_MAKE_JOBS=yes
>>>>> --but in a different step each time.
>>>>> 
>>>>> In all failure cases it was gmake that got the "illegal instruction".
>>>>> 
>>>>> But disabling ALLOW_MAKE_JOBS=yes appears (so far) to avoid the
>>>>> issue. For example, that 3rd failing port built fine. (I've
>>>>> been doing more ports since, with ALLOW_MAKE_JOBS=yes repeatedly
>>>>> failing and lack of it working.)
>>>>> 
>>>>> My guess is SIGCHLD delivery sometimes touches something (or a timing)
>>>>> that is not handled well in qemu-arm-static. I've had no problems
>>>>> on an rpi2 or bpim3 in the past.
>>>>> 
>>>>> (I have seen some analogous "sometimes" issues on powerpc under
>>>>> a version of clang that mishandled the stack part of the ABI
>>>>> FreeBSD uses, SIGCHLD sometimes getting on the stack at a bad time
>>>>> for the messed up code generation, leading to stack corruption. Code
>>>>> not getting signals had no problems.)
>>>>> 
>>>>> Note: The amd64 context is FreeBSD under VirtualBox under macOS
>>>>> and it has had no problem for native builds of world, kernel,
>>>>> or ports.
>>>> 
>>>> Avoiding ALLOW_MAKE_JOBS=yes is not sufficient to guarantee builds
>>>> will work. Here is one that got near the end before failing the
>>>> same way:
>>>> 
>>>> . . .
>>>> install -m 0644 /wrkdirs/usr/ports/devel/arm-none-eabi-gcc/work/gcc-6.3.0/gcc/cp/type-utils.h /wrkdirs/usr/ports/devel/arm-none-eabi-gcc/work/stage/usr/local/lib/gcc/arm-none-eabi/6.3.0/plugin/include/cp/type-utils.h
>>>> install: DONTSTRIP set - will not strip installed binaries
>>>> TCG temporary leak before 00021826
>>>> qemu: uncaught target signal 4 (Illegal instruction) - core dumped
>>>> gmake[1]: *** [Makefile:4176: install-gcc] Illegal instruction
>>>> gmake[1]: Leaving directory '/wrkdirs/usr/ports/devel/arm-none-eabi-gcc/work/.build'
>>>> *** Error code 2
>>>> 
>>>> Stop.
>>>> make: stopped in /usr/ports/devel/arm-none-eabi-gcc
>>>> ====>> Cleaning up wrkdir
>>>> ===>  Cleaning for arm-none-eabi-gcc-6.3.0
>>>> build of devel/arm-none-eabi-gcc ended at Sun Jan 15 00:04:02 PST 2017
>>>> build time: 02:52:28
>>>> !!! build failure encountered !!!
>>>> 
>>>> 
>>>> Going back to the earlier initial problem (that I happen to have the
>>>> material for handy): expanding the .tbz of the failed build and finding
>>>> the core showed:
>>>> 
>>>> # find . -name "*.core" -exec file {} \;
>>>> ./work/binutils-2.27/ld/qemu_gmake.core: ELF 32-bit LSB core file ARM, version 1 (FreeBSD), FreeBSD-style, from 'ke'
>>>> 
>>>> [I've not figured out what I can do with that --or how.]
>>>> 
>>>> 
>>>> One thing unusual on my part is that I use -mcpu=cortex-a7 . That
>>>> matches how I historically buildworld buildkernel for installation
>>>> on the rpi2 and bpim3. I've never had problems like this with
>>>> builds on the rpi2 or the bpim3 (buildworld, buildkernel, port
>>>> builds). It might be that qemu-arm-static has a problem with
>>>> -mcpu=cortex-a7 code that is generated --but not always.
>>>> 
>>>> Using the make.conf as an example:
>>>> 
>>>> # more /usr/local/etc/poudriere.d/head-cortex-a7-make.conf
>>>> WANT_QT_VERBOSE_CONFIGURE=1
>>>> #
>>>> DEFAULT_VERSIONS+=perl5=5.24
>>>> WITH_DEBUG=
>>>> WITH_DEBUG_FILES=
>>>> MALLOC_PRODUCTION=
>>>> #
>>>> #system clang 3.8+ (gcc6 rejects -march=armv7a):
>>>> #CFLAGS+= -march=armv7-a -mcpu=cortex-a7
>>>> #CXXFLAGS+= -march=armv7-a -mcpu=cortex-a7
>>>> #CPPFLAGS+= -march=armv7-a -mcpu=cortex-a7
>>>> #
>>>> #lang/gcc6's xgcc stage considers the above conflicting so use just:
>>>> CFLAGS+= -mcpu=cortex-a7
>>>> CXXFLAGS+= -mcpu=cortex-a7
>>>> CPPFLAGS+= -mcpu=cortex-a7
>>>> 
>>>> 
>>>> For my context poudriere with -x for -a arm.armv6 and the use of
>>>> qemu-arm-static does not look reliable enough to depend on. It is
>>>> not obvious that the -x use contributes to the problem: it may well
>>>> not.
>>>> 
>>>> ===
>>>> Mark Millard
>>>> markmi at dsl-only.net


