hung poudriere bulk recovery

Bryan Drewery bdrewery at FreeBSD.org
Wed Oct 28 22:10:11 UTC 2015


On 10/23/2015 9:34 AM, Russell L. Carter wrote:
> 
> Greetings,
> 
> Recently my nightly cron poudriere builds have been occasionally
> hanging.  For instance, here's last night's, with apparently no
> progress for over 10 hours:
> 
> root@terpsichore> poudriere status
> SET PORTS   JAIL            BUILD                STATUS         QUEUE BUILT FAIL SKIP IGNORE REMAIN TIME     LOGS
> -   default 10-stable-amd64 2015-10-22_22h30m08s parallel_build   488    34    0    0      0    454 10:45:56 /ssd1/poudriere/data/logs/bulk/10-stable-amd64-default/2015-10-22_22h30m08s
> root@terpsichore>
> 

Also check 'poudriere status -b' to see per-builder status; a builder
may still be doing real work. Poudriere will time out builds after a
long time. I forget the default, but it may be up to 24 hours.
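
A rough sketch of checking both, in case it helps. The variable names
MAX_EXECUTION_TIME and NOHANG_TIME and the config path are what I recall
from poudriere.conf.sample, so treat them as assumptions and verify
against your installed copy. show_timeouts is a hypothetical helper, not
part of poudriere.

```shell
#!/bin/sh
# Print any timeout overrides set in a poudriere.conf-style file.
# Assumption: the knobs are named MAX_EXECUTION_TIME and NOHANG_TIME.
show_timeouts() {
    conf="${1:-/usr/local/etc/poudriere.conf}"
    grep -E '^[[:space:]]*(MAX_EXECUTION_TIME|NOHANG_TIME)=' "$conf" \
        || echo "no overrides in $conf; defaults apply"
}

# Per-builder view of the running build (the -b flag mentioned above):
#   poudriere status -b
# Timeout overrides, if any:
#   show_timeouts
```

Commented-out lines in the sample config are deliberately not matched,
so only values you have actually set show up.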

> htop now shows no significant activity for the specified 3 builders:
> 
> root@terpsichore> ps xa | grep poud
> 72482  -  Is       0:00.01 /bin/sh /root/poudriere/run-poudriere-bulk
> 73202  -  S        0:04.24 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
> 73347  -  S        1:55.38 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
> 73352  -  I        0:00.08 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
>  6119  1  S+       0:00.00 grep poud
> root@terpsichore>
> 
> If I reboot, so that the tmp zfs filesystems are unmounted, and
> manually rerun the exact same script as the previous cron'd, hung
> instance, poudriere has (so far) run to completion.

Please record 'procstat -kka' before rebooting in case this is some kind
of deadlock.
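
Something along these lines would save the output where it survives the
reboot. This is only a sketch: capture_stacks, the /var/tmp default, and
the file name are illustrative choices, and the PROCSTAT variable exists
only so the command can be swapped out when trying the function
elsewhere.

```shell
#!/bin/sh
# Save kernel stack traces for all threads to a timestamped file and
# print the file name. The output directory defaults to /var/tmp
# (an assumption; any persistent location works).
capture_stacks() {
    out="${1:-/var/tmp}/procstat-kka.$(date +%Y%m%d-%H%M%S).txt"
    ${PROCSTAT:-procstat} -kka > "$out" 2>&1
    echo "$out"
}

# Usage, as root, before rebooting:
#   capture_stacks
```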

> 
> I'm not sure how to debug this, but in the interim, I'm very curious
> how I can stop the hung bulk run, and either restart it, or clean up
> the various mounted zfs filesystems and manually restart from the
> beginning w/o rebooting.  Studying the man page, it's not clear at all
> the Right Way to do this, so any pointers here would be appreciated.

Send SIGTERM to the main poudriere process ('kill -TERM <pid>'). It will
clean up its children.
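
With several bulk.sh processes in the ps listing, one way to pick the
main one is to take the bulk.sh process whose parent is not itself a
bulk.sh process. That selection rule is my assumption about which
process counts as "main", and top_level_pids is a hypothetical helper,
so check the PID it prints before signalling anything.

```shell
#!/bin/sh
# Read "PID PPID ..." lines on stdin and print each PID whose parent
# is not itself in the list, i.e. the top of that process tree.
top_level_pids() {
    awk '{ pid[NR] = $1; ppid[$1] = $2; seen[$1] = 1 }
         END {
             for (i = 1; i <= NR; i++) {
                 p = ppid[pid[i]]
                 if (!(p in seen)) print pid[i]
             }
         }'
}

# Usage on the hung host (the [b] keeps grep from matching itself):
#   main=$(ps -ax -o pid= -o ppid= -o command= \
#          | grep 'poudriere/[b]ulk\.sh' | top_level_pids)
#   echo "$main"        # inspect first
#   kill -TERM $main
```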

Beyond that you can run 'poudriere jail -j NAME -p TREE -z SET -k' to
clean up any mounts left over from a previous build.
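
Filled in with the values from the status output in this thread, that
would look roughly like the following. I am assuming that a SET of "-"
means no set was used, so -z is left out. cleanup_cmd is a hypothetical
helper that only prints the command so it can be reviewed before
running it.

```shell
#!/bin/sh
# Build (but do not run) the jail cleanup command.
# $1 = jail name, $2 = ports tree, $3 = optional set name.
cleanup_cmd() {
    cmd="poudriere jail -j $1 -p $2"
    # Assumption: -z is only needed when the build used a set.
    if [ -n "${3:-}" ]; then
        cmd="$cmd -z $3"
    fi
    echo "$cmd -k"
}

# Jail and tree names taken from the 'poudriere status' output above.
cleanup_cmd 10-stable-amd64 default
```

Run the printed command as root once the bulk process has exited.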

Adding a 'poudriere kill' command is on the todo list.

> 
> I'm leaving the system untouched for now so that I can try out any
> suggestions for cleanup and restart.



-- 
Regards,
Bryan Drewery
