hung poudriere bulk recovery

Russell L. Carter rcarter at pinyon.org
Fri Oct 23 16:41:46 UTC 2015


Greetings,

Recently, my nightly poudriere builds run from cron have occasionally
been hanging. For instance, here is last night's run, which has
apparently made no progress for over 10 hours:

root@terpsichore> poudriere status
SET PORTS   JAIL            BUILD                STATUS         QUEUE BUILT FAIL SKIP IGNORE REMAIN TIME     LOGS
-   default 10-stable-amd64 2015-10-22_22h30m08s parallel_build   488    34    0    0      0    454 10:45:56 /ssd1/poudriere/data/logs/bulk/10-stable-amd64-default/2015-10-22_22h30m08s
root@terpsichore>

htop now shows no significant activity from the 3 builders I specified,
and ps lists these poudriere processes:

root@terpsichore> ps xa | grep poud
72482  -  Is       0:00.01 /bin/sh /root/poudriere/run-poudriere-bulk
73202  -  S        0:04.24 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
73347  -  S        1:55.38 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
73352  -  I        0:00.08 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
 6119  1  S+       0:00.00 grep poud
root@terpsichore>

If I reboot, so that the temporary ZFS filesystems are unmounted, and
then manually rerun the exact same script that cron ran for the hung
instance, poudriere has (so far) run to completion.

I'm not sure how to debug this, but in the meantime I'd like to know
how to stop the hung bulk run and then either resume it, or clean up
the various mounted ZFS filesystems and restart from the beginning
without rebooting. After studying the man page, the Right Way to do
this is not at all clear to me, so any pointers would be appreciated.

I'm leaving the system untouched for now so that I can try out any
suggestions for cleanup and restart.

Thanks,
Russell


More information about the freebsd-ports mailing list