Re: poudriere job && find jobs which received signal 11

From: Matthias Apitz <guru_at_unixarea.de>
Date: Wed, 18 Oct 2023 16:19:51 UTC
On Wednesday, October 18, 2023 at 12:10:27 PM +0200, Alexander Leidinger wrote:

> On 2023-10-18 09:54, Matthias Apitz wrote:
> > Hello,
> > 
> > I'm compiling "my" ports with poudriere on 14.0-CURRENT 1400094 amd64,
> > from git October 14, 2023. In the last two days 2229 packages were
> > produced fine; one job failed (p5-Gtk2-1.24993_3, which is known to be broken).
> > 
> > This morning I was looking for something in /var/log/messages and
> > noticed by accident that a few compilations crashed yesterday:
> > 
> > # grep 'signal 11' /var/log/messages | grep -v conftest
> > Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> > Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> > Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> > Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> > Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> > 
> > As I said, apart from that one, none of the 2229 jobs failed:
> > 
> > # cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
> > # ls -C1  | wc -l
> >     2229
> > # grep -l 'build failure' *
> > p5-Gtk2-1.24993_3.log
> > 
> > How is it possible that the make engines did not fail? The uid
> 
> That can be part of configure runs which try to test some features.
> 
> > 65534 is the one used by poudriere, can I use the jid 24 somehow to find
> > the job which received the signal 11? Or is the time the only way to
> 
> jid = jail ID, the first column in the output of "jls". If you have the
> ...

Thanks for the detailed explanation and hints. I did not log poudriere's
stdout; I only have the build logs of all 2229 jobs. I managed to
identify the 47 builds which were running at that time, between 10:00 and
13:00 (with some grep commands, cutting away all builds which ended
before 10:00, and then all which started after 13:00). I started the builds
for these 47 ports again, one after the other, with only one builder. The
culprit seems to be lang/gcc10, which is still building as I type this
but has already produced the signal twice more:

Oct 18 17:44:45 jet kernel: pid 21011 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
Oct 18 17:45:17 jet kernel: pid 30102 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
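
In case it helps someone, kernel messages like the two above can be tallied
per jail ID and process name. This is only a rough sketch: the function name
sig11_by_jid is made up, and it assumes each message sits on a single line in
exactly the format shown above.

```shell
#!/bin/sh
# Tally "exited on signal 11" kernel messages per jail ID and process name.
# Reads syslog-style lines on stdin, e.g.:
#   Oct 18 17:44:45 jet kernel: pid 21011 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
# Prints one line per (jid, process) pair: "<jid> <process> <count>".
sig11_by_jid() {
    awk '
        /exited on signal 11/ {
            comm = ""; jid = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "pid") { comm = $(i + 2) }  # field looks like "(cc1plus),"
                if ($i == "jid") { jid = $(i + 1) }   # field looks like "169,"
            }
            gsub(/[(),]/, "", comm)   # strip parentheses and trailing comma
            gsub(/,/, "", jid)        # strip trailing comma
            if (jid != "") count[jid " " comm]++
        }
        END {
            for (k in count) print k, count[k]
        }
    ' | sort
}
```

It would be fed with something like
grep 'signal 11' /var/log/messages | grep -v conftest | sig11_by_jid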

Will dig into its build log later ...
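
For the record, the time-window selection I described above (drop every build
which ended before the window, then every build which started after it) boils
down to an interval-overlap test. A minimal sketch follows; the function name
builds_in_window and the input format "<name> <start_epoch> <end_epoch>" are
assumptions made for illustration, and extracting the start and end times from
the poudriere build logs is not shown here.

```shell
#!/bin/sh
# Keep only the builds whose run time overlaps a given time window.
# Input on stdin (hypothetical format), one build per line:
#   <name> <start_epoch> <end_epoch>
# Arguments: $1 = window start, $2 = window end (seconds since the epoch).
builds_in_window() {
    awk -v ws="$1" -v we="$2" '
        # A build overlaps the window unless it ended before the window
        # started or started after the window ended.
        ($3 + 0) >= (ws + 0) && ($2 + 0) <= (we + 0) { print $1 }
    '
}
```

A build which started before the window and ended after it is kept as well,
which is what is wanted for a long-running port.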

Yours

	matthias

-- 
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub