qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))

From: Guido Falsi <mad_at_madpilot.net>
Date: Sun, 28 Jan 2024 14:15:56 UTC
Hi all, again,

I have some more findings about this. I'm top-posting because the old 
message is not really that relevant anymore.

I'm now running a machine with head (commit 
b32d49cfbaa0437d08e65e7cd7c82c5951b1a852, Jan 25th), with poudriere 
installed on it. The machine is amd64, with an arm64 jail running 
14.0-RELEASE, installed from the official distribution binaries (https 
download method), with cross tools.

To make sure everything is aligned I rebuilt everything: updated head, 
rebuilt the cross tools in the jail, recompiled all ports for the host 
architecture and force-reinstalled them (especially qemu-user-static), 
and cleaned up all packages for the arm64 jail.

If I missed something important please point it out.

I have run some more tests and I'm getting python failures in poudriere 
like the one described below from time to time (I don't have hard stats, 
but it feels like a 50% chance). If I get past that, it is usually able 
to build all of the (not many) packages, but then locks up at:

Creating repository in /tmp/packages:   0%


with ncpu processes like this:

 > ps -ax | grep -i pkg
91287  1  I+J      0:00.02 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91288  1  I+J      0:00.02 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91289  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91290  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91291  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91292  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91293  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91294  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91295  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91296  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91297  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91298  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91299  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key
91300  1  I+J      0:00.00 /usr/local/bin/qemu-aarch64-static 
/.p/pkg-static repo -o /tmp/packages /packages /tmp/repo.key


And this has hit me 100% of the time up to now.

It looks like pkg is spawning ncpu processes. I'm looking at reducing 
them, just in case this can sidestep the race/lockup.
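
To check the "one process per CPU" guess against a listing like the one 
above, a small script can tally the stuck processes by state. This is 
only a rough sketch: the function name is made up for illustration, and 
it assumes the default `ps -ax` column order (PID, TT, STAT, TIME, 
COMMAND) and that long lines may have been wrapped as they are in this 
mail.

```python
import re
from collections import Counter


def count_pkg_repo_states(ps_output: str) -> Counter:
    """Tally the STAT column of 'pkg-static repo' processes in ps -ax output.

    Hypothetical helper, not part of pkg or poudriere.
    """
    records = []
    for line in ps_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if re.match(r"^\d+\s", stripped):
            # A line starting with a PID begins a new process record.
            records.append(stripped)
        elif records:
            # Anything else is treated as a wrapped continuation line.
            records[-1] += " " + stripped

    states = Counter()
    for record in records:
        # Assumed columns: PID TT STAT TIME COMMAND (command keeps its spaces).
        fields = record.split(None, 4)
        if len(fields) == 5 and "pkg-static repo" in fields[4]:
            states[fields[2]] += 1
    return states
```

Feeding it the saved output of `ps -ax` and comparing the total with 
`os.cpu_count()` on the host would show whether the number of idle 
processes really matches the CPU count.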


My suspicion is that there is some race in qemu-user-static or the APIs 
it is using, which is triggered by pkg-repo.

How can I investigate this? I'm able to reproduce it 100% of the time.

BTW these are the pkgs I'm building at present:

dns/unbound
net-mgmt/vmutils
net/kea
sysutils/htop
sysutils/node_exporter
sysutils/tmux


(vmutils and node_exporter are Go packages and are being skipped since 
Go fails, but I keep them in the list since I can grab binaries from 
the official repos. I'm going to drop htop in the near future.)



Thanks in advance, any help appreciated, especially suggestions on 
where to look and what to investigate to understand whether this is a 
local problem or some issue with base/qemu.


On 24/01/24 22:10, Guido Falsi wrote:
> Hi,
> 
> I recently started seeing a strange failure with python 3.9 in 
> poudriere; it was not happening a few weeks ago.
> 
> I'm building in poudriere on a head machine running amd64, with a 
> poudriere jail for arm64, via qemu-user-static. The jail is running 14.0.
> 
> I'm not sure what is going on.
> 
> It fails in the packaging phase with a bunch of errors like:
> 
> ===========================================================================
> =======================<phase: package        >============================
> ===== env: 'PKG_NOTES=build_timestamp ports_top_git_hash 
> ports_top_checkout_unclean port_git_hash port_checkout_unclean built_by' 
> 'PKG_NOTE_build_timestamp=2024-01-24T17:07:52+0000' 
> 'PKG_NOTE_ports_top_git_hash=0816fdcb6ce8' 
> 'PKG_NOTE_ports_top_checkout_unclean=no' 
> 'PKG_NOTE_port_git_hash=0816fdcb6ce8' 
> 'PKG_NOTE_port_checkout_unclean=no' 
> 'PKG_NOTE_built_by=poudriere-git-3.4.1' NO_DEPENDS=yes USER=root UID=0 
> GID=0
> ===>  Building packages for python39-3.9.18
> ===>   Building python39-3.9.18
> pkg-static: Unable to access file 
> /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/imaplib.cpython-39.opt-2.pyc:No such file or directory
> pkg-static: Unable to access file 
> /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/imghdr.cpython-39.opt-2.pyc:No such file or directory
> pkg-static: Unable to access file 
> /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/imp.cpython-39.opt-2.pyc:No such file or directory
> pkg-static: Unable to access file 
> /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/inspect.cpython-39.opt-2.pyc:No such file or directory
> pkg-static: Unable to access file 
> /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/io.cpython-39.opt-2.pyc:No such file or directory
> pkg-static: Unable to access file 
> /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/ipaddress.cpython-39.opt-2.pyc:No such file or directory
> pkg-static: Unable to access file 
> /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/mailbox.cpython-39.opt-2.pyc:No such file or directory
> pkg-static: Unable to access file 
> /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/mailcap.cpython-39.opt-2.pyc:No such file or directory
> pkg-static: Unable to access file 
> /wrkdirs/usr/ports/lang/python39/work/stage/usr/local/lib/python3.9/__pycache__/mimetypes.cpython-39.opt-2.pyc:No such file or directory
> 
> 
> 
> (it's all about 'opt-2.pyc' files)
> 
> 
> What could have changed? Maybe I'm doing something wrong? Maybe I'm 
> hitting some qemu-user-static issue on head?
> 
> 
> Any help appreciated.
> 
> 
> (full log available if needed)
> 

-- 
Guido Falsi <mad@madpilot.net>