qemu-arm-static appears to have problems with signal delivery during (at least) poudriere-devel based cross-builds of some ports with ALLOW_MAKE_JOBS=yes

Mark Millard markmi at dsl-only.net
Thu Jan 26 18:31:36 UTC 2017


On 2017-Jan-26, at 5:54 AM, Michal Meloun <melounmichal at gmail.com> wrote:

> On 26.01.2017 5:26, Mark Millard wrote:
>> On 2017-Jan-25, at 12:27 PM, Sean Bruno <sbruno at freebsd.org> wrote:
>> 
>>> Mark:
>>> 
>>> There was a recent update this week that was submitted and accepted to
>>> qemu-user-static.
>>> 
>>> Want to give it a spin again and see if you are able to make progress?
>>> 
>>> sean "top poster for maximum effect" bruno
>> 
>> I updated my /usr/ports to -r432460 (from today) and rebuilt.
>> I then tried doing some poudriere -x -a arm.armv6 port builds
>> again, with ALLOW_MAKE_JOBS=yes and -J 1 in use.
>> 
>> Unfortunately the qemu-user-static update did not fix the
>> problem I've been seeing.
>> 
>> An example extracted from a print/texinfo log still shows
>> "TCG temporary leak before 00021826":
> 
> I just rebuilt print/texinfo without a single problem.
> Well, with slightly different CFLAGS:
> CFLAGS+= -O2 -munaligned-access -mcpu=cortex-a15 -fno-builtin-sin -fno-builtin-cos
> 
> Michal

I had already reported that, on retries, either the failure
point moves to a different step of the port's build or the
build completes for whatever had initially failed. (I did not
repeat that in the new report.)

When I retried, print/texinfo built okay.

I've never gotten anything large, like lang/gcc6 with a full
bootstrap, to complete with ALLOW_MAKE_JOBS=yes and -J 1 in use;
it gets nowhere near doing so. (In my context ALLOW_MAKE_JOBS means
what portmaster would do for -j 4. Poudriere seems to give no
control of this [-J is a different issue].)
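If the signal-delivery guess in the subject line is right, a much
smaller test than a port build might expose it. What follows is a
minimal sketch in C, assuming that what matters is several children
exiting close together while the parent reaps them from a SIGCHLD
handler (roughly what gmake does with -j 4). The function name
run_sigchld_stress and its parameters are hypothetical, made up for
illustration; nothing here has been shown to reproduce the failure.

```c
/* Hypothetical SIGCHLD stress sketch; not part of poudriere or qemu. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Children reaped so far by the SIGCHLD handler. */
static volatile sig_atomic_t reaped;

static void on_sigchld(int sig)
{
    int saved_errno = errno;   /* waitpid may clobber errno */
    (void)sig;
    /* One SIGCHLD can stand for several exits, so reap in a loop. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/*
 * Hypothetical helper: fork `parallel` children per round (roughly what
 * gmake -j N keeps in flight), let them exit at once, and wait for the
 * handler to reap them all. Returns the number reaped, or -1 on error.
 */
int run_sigchld_stress(int rounds, int parallel)
{
    struct sigaction sa, old_sa;
    sigset_t chld, old_mask, wait_mask;

    assert(rounds > 0 && parallel > 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0) {
        perror("sigaction");
        return -1;
    }

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    reaped = 0;

    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < parallel; i++) {
            pid_t pid = fork();
            if (pid < 0) {
                perror("fork");
                sigaction(SIGCHLD, &old_sa, NULL);
                return -1;
            }
            if (pid == 0)
                _exit(0);  /* child: exit immediately */
        }
        /* Block SIGCHLD while checking the counter so no wakeup is lost,
           then let sigsuspend() atomically unblock it and sleep. */
        sigprocmask(SIG_BLOCK, &chld, &old_mask);
        wait_mask = old_mask;
        sigdelset(&wait_mask, SIGCHLD);
        while (reaped < (r + 1) * parallel)
            sigsuspend(&wait_mask);
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
    }

    sigaction(SIGCHLD, &old_sa, NULL);  /* restore the caller's handler */
    return (int)reaped;
}
```

To try it against qemu-arm-static, one could add a main that calls,
say, run_sigchld_stress(100000, 4), build it for armv6, and run it
under the emulator (for example inside the arm.armv6 jail). A crash
with the same "TCG temporary leak" message would support the SIGCHLD
guess; a clean run would not rule it out, since gmake's timing and
code paths differ from this loop.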

I've also not been able to use gdb on the .core file produced:
qemu_gmake.core, extracted from the compressed tar archive of
the failed work directory. Running file on it reports:

# file /root/poudriere_failure/work/.build/qemu_gmake.core
/root/poudriere_failure/work/.build/qemu_gmake.core: ELF 32-bit LSB core file ARM, version 1 (FreeBSD), FreeBSD-style, from 'ke'

(I suspect that a "version 1 (FreeBSD)" ARM core file is not
really intended to be supported as things stand.)

I submitted Bugzilla 216132 as a segmentation fault report
against devel/gdb, but the patch that was tried only let gdb
get farther before showing other problems; it still fails
overall on handling qemu_gmake.core. See 216132.

===
Mark Millard
markmi at dsl-only.net

> ....
> mv warn-on-use.h-t warn-on-use.h
> /bin/mkdir -p sys
> rm -f sys/types.h-t sys/types.h && \
> { echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */'; \
>  sed -e 's|@''GUARD_PREFIX''@|GL|g' \
>      -e 's|@''INCLUDE_NEXT''@|include_next|g' \
>      -e 's|@''PRAGMA_SYSTEM_HEADER''@|#pragma GCC system_header|g' \
>      -e 's|@''PRAGMA_COLUMNS''@||g' \
>      -e 's|@''NEXT_SYS_TYPES_H''@|<sys/types.h>|g' \
>      -e 's|@''WINDOWS_64_BIT_OFF_T''@|0|g' \
>      < ./sys_types.in.h; \
> } > sys/types.h-t && \
> mv sys/types.h-t sys/types.h
> rm -f unistd.h-t unistd.h && \
> ..
> 
> 
>> 
>> . . .
>> rm -f sys/types.h-t sys/types.h && \
>> { echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */'; \
>>  sed -e 's|@''GUARD_PREFIX''@|GL|g' \
>>      -e 's|@''INCLUDE_NEXT''@|include_next|g' \
>>      -e 's|@''PRAGMA_SYSTEM_HEADER''@|#pragma GCC system_header|g' \
>>      -e 's|@''PRAGMA_COLUMNS''@||g' \
>>      -e 's|@''NEXT_SYS_TYPES_H''@|<sys/types.h>|g' \
>>      -e 's|@''WINDOWS_64_BIT_OFF_T''@|0|g' \
>>      < ./sys_types.in.h; \
>> } > sys/types.h-t && \
>> mv sys/types.h-t sys/types.h
>> TCG temporary leak before 00021826
>> qemu: uncaught target signal 4 (Illegal instruction) - core dumped
>> Illegal instruction
>> gmake[2]: *** [Makefile:1174: all-recursive] Error 1
>> gmake[2]: Leaving directory '/wrkdirs/usr/ports/print/texinfo/work/texinfo-6.1'
>> gmake[1]: *** [Makefile:1113: all] Error 2
>> gmake[1]: Leaving directory '/wrkdirs/usr/ports/print/texinfo/work/texinfo-6.1'
>> ===> Compilation failed unexpectedly.
>> Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to
>> the maintainer.
>> *** Error code 1
>> 
>> Stop.
>> make: stopped in /usr/ports/print/texinfo
>> ====>> Cleaning up wrkdir
>> ===>  Cleaning for texinfo-6.1.20160425,1
>> build of print/texinfo ended at Wed Jan 25 20:08:32 PST 2017
>> build time: 00:06:57
>> !!! build failure encountered !!!
>> 
>> 
>> ===
>> Mark Millard
>> markmi at dsl-only.net
>> 
>> On 01/15/17 07:09, Mark Millard wrote:
>>> On 2017-Jan-14, at 10:53 PM, Mark Millard <markmi at dsl-only.net> wrote:
>>> 
>>>> [Context: head (12) -r312009 and ports head -r431413.]
>>>> 
>>>> I've been experimenting on amd64 with poudriere-devel with -x
>>>> for -a arm.armv6 and I ran into:
>>>> 
>>>>> TCG temporary leak before 00021826
>>>>> qemu: uncaught target signal 4 (Illegal instruction) - core dumped
>>>> 
>>>> in 3 of the 31 ports for the build; 4 were skipped, so it was
>>>> 3 of the 27 attempted. The 00021826 is the same number in all
>>>> the examples so far (whatever its base).
>>>> 
>>>> These seem to be the only TCG messages and each failure starts with
>>>> one and then reports the qemu message. (Also true for the below.)
>>>> As far as I can tell the TCG notice is the report of an internal
>>>> qemu problem that is then translated into an Illegal instruction.
>>>> 
>>>> This was with ALLOW_MAKE_JOBS=yes but -J 1 for poudriere.
>>>> 
>>>> For 2 of the problem ports retries worked, still using
>>>> ALLOW_MAKE_JOBS=yes and -J 1 .
>>>> 
>>>> But the 3rd port failed every time I tried it with
>>>> ALLOW_MAKE_JOBS=yes, though at a different step each time.
>>>> 
>>>> In all failure cases it was gmake that got the "illegal instruction".
>>>> 
>>>> But disabling ALLOW_MAKE_JOBS=yes appears (so far) to avoid the
>>>> issue. For example, that 3rd failing port built fine. (I've
>>>> built more ports since: with ALLOW_MAKE_JOBS=yes they repeatedly
>>>> fail and without it they work.)
>>>> 
>>>> My guess is that SIGCHLD delivery sometimes hits something (or a
>>>> timing window) that qemu-arm-static does not handle well. I've had
>>>> no problems on an rpi2 or bpim3 in the past.
>>>> 
>>>> (I have seen some analogous "sometimes" issues on powerpc with
>>>> a version of clang that mishandled the stack part of the ABI
>>>> FreeBSD uses: SIGCHLD handling sometimes used the stack at a bad
>>>> time for the broken code generation, leading to stack corruption.
>>>> Code not receiving signals had no problems.)
>>>> 
>>>> Note: The amd64 context is FreeBSD under VirtualBox under macOS
>>>> and it has had no problem for native builds of world, kernel,
>>>> or ports.
>>> 
>>> Avoiding ALLOW_MAKE_JOBS=yes is not sufficient to guarantee builds
>>> will work. Here is one that got near the end before failing the
>>> same way:
>>> 
>>> . . .
>>> install -m 0644 /wrkdirs/usr/ports/devel/arm-none-eabi-gcc/work/gcc-6.3.0/gcc/cp/type-utils.h /wrkdirs/usr/ports/devel/arm-none-eabi-gcc/work/stage/usr/local/lib/gcc/arm-none-eabi/6.3.0/plugin/include/cp/type-utils.h
>>> install: DONTSTRIP set - will not strip installed binaries
>>> TCG temporary leak before 00021826
>>> qemu: uncaught target signal 4 (Illegal instruction) - core dumped
>>> gmake[1]: *** [Makefile:4176: install-gcc] Illegal instruction
>>> gmake[1]: Leaving directory '/wrkdirs/usr/ports/devel/arm-none-eabi-gcc/work/.build'
>>> *** Error code 2
>>> 
>>> Stop.
>>> make: stopped in /usr/ports/devel/arm-none-eabi-gcc
>>> ====>> Cleaning up wrkdir
>>> ===>  Cleaning for arm-none-eabi-gcc-6.3.0
>>> build of devel/arm-none-eabi-gcc ended at Sun Jan 15 00:04:02 PST 2017
>>> build time: 02:52:28
>>> !!! build failure encountered !!!
>>> 
>>> 
>>> Going back to the earlier initial problem (that I happen to have the
>>> material for handy): expanding the .tbz of the failed build and finding
>>> the core showed:
>>> 
>>> # find . -name "*.core" -exec file {} \;
>>> ./work/binutils-2.27/ld/qemu_gmake.core: ELF 32-bit LSB core file ARM, version 1 (FreeBSD), FreeBSD-style, from 'ke'
>>> 
>>> [I've not figured out what I can do with that --or how.]
>>> 
>>> 
>>> One unusual thing on my part is that I use -mcpu=cortex-a7. That
>>> matches how I have historically done buildworld and buildkernel
>>> for installation on the rpi2 and bpim3. I've never had problems
>>> like this with builds on the rpi2 or the bpim3 (buildworld,
>>> buildkernel, port builds). It might be that qemu-arm-static
>>> sometimes has a problem with the generated -mcpu=cortex-a7 code.
>>> 
>>> Using the make.conf as an example:
>>> 
>>> # more /usr/local/etc/poudriere.d/head-cortex-a7-make.conf
>>> WANT_QT_VERBOSE_CONFIGURE=1
>>> #
>>> DEFAULT_VERSIONS+=perl5=5.24
>>> WITH_DEBUG=
>>> WITH_DEBUG_FILES=
>>> MALLOC_PRODUCTION=
>>> #
>>> #system clang 3.8+ (gcc6 rejects -march=armv7a):
>>> #CFLAGS+= -march=armv7-a -mcpu=cortex-a7
>>> #CXXFLAGS+= -march=armv7-a -mcpu=cortex-a7
>>> #CPPFLAGS+= -march=armv7-a -mcpu=cortex-a7
>>> #
>>> #lang/gcc6's xgcc stage considers the above conflicting so use just:
>>> CFLAGS+= -mcpu=cortex-a7
>>> CXXFLAGS+= -mcpu=cortex-a7
>>> CPPFLAGS+= -mcpu=cortex-a7
>>> 
>>> 
>>> In my context, poudriere with -x for -a arm.armv6, and so the use
>>> of qemu-arm-static, does not look reliable enough to depend on. It
>>> is not obvious that using -x contributes to the problem; it may
>>> well not.
>>> 
>>> ===
>>> Mark Millard
>>> markmi at dsl-only.net



More information about the freebsd-arm mailing list