Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, . . .] [Update to Host OSVERSION 1500018 did not help]
Date: Thu, 09 May 2024 00:28:55 UTC
On 2024-05-08 23:53:57 (+0800), Mark Millard wrote:
> On Apr 29, 2024, at 20:16, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Apr 29, 2024, at 20:11, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> On Apr 29, 2024, at 19:54, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>> On Apr 28, 2024, at 18:06, Philip Paeps <philip@freebsd.org> wrote:
>>>>
>>>>> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
>>>>>> On Apr 18, 2024, at 08:02, Mark Millard <marklmi@yahoo.com>
>>>>>> wrote:
>>>>>>> void <void_at_f-m.fm> wrote on
>>>>>>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>>>>>>
>>>>>>>> Not sure where to post this..
>>>>>>>>
>>>>>>>> The last bulk build for arm64 appears to have happened around
>>>>>>>> mid-March on ampere2. Is it broken?
>>>>>>>
>>>>>>> main-armv7 building is broken and the last completed build
>>>>>>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>>>>>>> gets stuck making no progress until manually forced to stop,
>>>>>>> which leads to huge elapsed times for the incomplete builds:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> My guess is that FreeBSD has something that broke after
>>>>>>> bd45bbe440
>>>>>>> that was broken as of f5f08e41aa and was still broken at
>>>>>>> 75464941dc .
>>>>>>>
>>>>>>
>>>>>> One thing of possible note:
>>>>>>
>>>>>> Failing . . .
>>>>>>
>>>>>> Host OSVERSION: 1500006
>>>>>> Jail OSVERSION: 1500014
>>>>>
>>>>> I have finished a package builder refresh this morning. All our
>>>>> builder hosts (except PowerPC - I don't touch those) are now on
>>>>> main-n269671-feabaf8d5389 (OSVERSION 1500018).
>>>>>
>>>>> ampere1 successfully finished its 140releng-armv7-quarterly build,
>>>>> so it looks like the problem with stuck builds was limited to
>>>>> ampere2 building main-armv7. I'll keep a close eye on this one
>>>>> when it starts its next build.
>>>>>
>>>>
>>>> I see that main-armv7 started.
>>>>
>>>> It queued only 31935 instead of the prior 34528 (or more): it is
>>>> doing an incremental build instead of a full build. For example,
>>>> pkg was not built but instead the prior build is in use. Thus bad
>>>> results from the prior build might be involved in this new build.
>>>>
>>>> I'd recommend forcing a full "poudriere bulk -c -a" that does a
>>>> from-scratch build for the purposes of the main-armv7 test.
>>>
>>> Actually the test is not going to provide the information we are
>>> after as things are.
>>>
>>> giflib-5.2.2 failed to build, which leads to devel/doxygen being
>>> skipped. devel/doxygen was the first one to hang up in the prior
>>> 2 failing attempts, if I remember right.
>>>
>>> giflib-5.2.2 also causes graphics/graphviz to be skipped.
>>> graphics/graphviz was installed just before the hangup in all of
>>> the example hangups. So the context will not be replicated.
>>>
>>> We need graphics/giflib to build to actually do the test.
>>
>> Looks like:
>>
>> https://cgit.freebsd.org/ports/commit/graphics/giflib?id=5007109903fc271e3ef0ba01d78781c1fed99f3f
>>
>> is the fix for the graphics/giflib build failure.
>
> Well, main-armv7 is building again and things are still
> getting stuck. So much for my idea. For reference I
> list the over 10-hr-so-far ones:
>
> doxygen-1.9.6_1,2 build-depends 13:03:54
> py39-pydot-2.0.0 run-depends 12:24:04
> py39-pygraphviz-1.6 lib-depends 12:10:38
>
> "ps -alxdww" would likely be appropriate to get a copy
> of the output of.
>
> "procstat -k -k" usage and the like on stuck processes
> would probably be appropriate.
>
> Does anyone with appropriate investigative background
> have login access to ampere2 to take a look at what
> is getting stuck?

This is unfortunate. I'm sure I have the appropriate background, but
I'm spread very thin!

I'll get as much information as I can about this machine while it's
stuck, before I bounce it again.
I think it may be worth a try building those ports in isolation on
ref14-aarch64, and see what they're trying to do. I'll also set up a
set of refX-armv7 jails on that machine.

Hopefully we can get to the bottom of this soon. This is a very
tedious failure mode.

We could also try to put an older armv7 image on the builder jail on
ampere2. Depending on whether we have a sufficiently old image, that
will either be very straightforward, or a very deep rabbit hole.

Thanks again for keeping an eye on this. We really should have better
monitoring for stuck builds than "Mark will tell us". :-)

Philip
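
As a rough illustration of what such monitoring might check, here is a
minimal Python sketch. It assumes the builder status can be reduced to
lines of the form "package phase HH:MM:SS", as in the list quoted above.
The function names and the 10-hour threshold are illustrative
assumptions, not part of poudriere or any existing tooling.

```python
from datetime import timedelta


def parse_elapsed(text):
    """Convert an HH:MM:SS elapsed string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def find_stuck(status_lines, threshold=timedelta(hours=10)):
    """Return (package, phase, elapsed) for builds running past threshold.

    Hypothetical helper: the three-field line layout is an assumption
    based on the list quoted in this thread.
    """
    stuck = []
    for line in status_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # not a "package phase elapsed" line
        name, phase, elapsed = fields
        try:
            duration = parse_elapsed(elapsed)
        except ValueError:
            continue  # elapsed field is not in HH:MM:SS form
        if duration > threshold:
            stuck.append((name, phase, elapsed))
    return stuck


# The three entries reported in the quoted message.
SAMPLE = [
    "doxygen-1.9.6_1,2 build-depends 13:03:54",
    "py39-pydot-2.0.0 run-depends 12:24:04",
    "py39-pygraphviz-1.6 lib-depends 12:10:38",
]

if __name__ == "__main__":
    for name, phase, elapsed in find_stuck(SAMPLE):
        print(f"possibly stuck: {name} in {phase} for {elapsed}")
```

A long elapsed time alone does not prove a hang, so a check like this
would only point at candidates for the "ps" and "procstat -k -k"
inspection suggested above.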