Re: Too many pythons in poudriere

From: Mark Millard via freebsd-ports <freebsd-ports_at_freebsd.org>
Date: Fri, 09 Jul 2021 19:46:47 UTC
On 2021-Jul-8, at 14:11, Mark Millard <marklmi at yahoo.com> wrote:

> On 2021-Jul-8, at 10:54, bob prohaska <fbsd at www.zefox.net> wrote:
> 
>> On Thu, Jul 08, 2021 at 09:41:13AM -0700, Mark Millard wrote:
>>> 
>>> 
>>> On 2021-Jul-8, at 08:45, bob prohaska <fbsd at www.zefox.net> wrote:
>>> 
>>>> Even with -J1 and no ALLOW_MAKE_JOBS I'm still
>>>> seeing five python processes on the loose,
>>>> occupying at least 3 GB.
>>> 
>>> Actually I just looked and saw:
>>> 
>>> Swapinfo 7.36%
>>> 
>>> (Unlike the 83% or so I saw somewhat around 3 hours ago.)
>>> 
>>> Load Averages (220%) 2.20 2.18 1.76
>>> 
>>> Elapsed 12:54:56
>>> 
>>> I do not see a swaplog in http://www.zefox.org/~bob/swaplogs/
>>> to look at, so I cannot tell (even approximately) what the
>>> peak swap space usage has been so far.
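>>> 
>>> (For reference, a minimal logging loop for keeping such a
>>> record, assuming the stock swapinfo(8) and a hypothetical
>>> output path, would be something like:
>>> 
>>> # sample swap usage once a minute, with timestamps
>>> while true; do date; swapinfo -h; sleep 60; done >> /path/to/swaplogs/swap.log
>>> 
>>> Any similar periodic sampling would do.)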
>> 
>> Started a new swaplog:
>> http://www.zefox.org/~bob/swaplogs/202107182930.log
>> 
>> It came within a whisker of running out of swap, then abruptly
>> the python threads vanished and the build seems to be proceeding.
> 
> Did you update anything, such as /usr/ports/, between the 50+ hr
> run and the new one? The new one spent over 6 hours at:
> 
> [05:50:59] [  8% 3914/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py gen/third_party/blink/renderer/modules/event_interface_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
> [12:22:32] [  8% 3915/47953] python ../../third_party/blink/renderer/bindings/scripts/generate_bindings.py --web_idl_database gen/third_party/blink/renderer/bindings/web_idl_database.pickle --root_src_dir ../../ --root_gen_dir gen --output_core_reldir third_party/blink/renderer/bindings/core/v8/ --output_modules_reldir third_party/blink/renderer/bindings/modules/v8/ enumeration callback_function callback_interface interface namespace typedef union
> [12:22:36] [  8% 3916/47953] touch obj/third_party/blink/renderer/bindings/generate_bindings_all.stamp
> [12:22:39] [  8% 3917/47953] touch obj/third_party/blink/renderer/bindings/modules/event_modules_names.stamp
> [12:22:42] [  8% 3918/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py ../../third_party/blink/renderer/modules/event_target_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
> [12:22:42] [  8% 3919/47953] touch obj/third_party/blink/renderer/bindings/modules/event_target_modules_names.stamp
> 
> 
> The 50+ hour one did not:
> 
> [03:56:05] [  8% 3848/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py gen/third_party/blink/renderer/modules/event_interface_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
> [03:56:05] [  8% 3849/47953] touch obj/third_party/blink/renderer/bindings/modules/event_modules_names.stamp
> [03:56:06] [  8% 3850/47953] python ../../third_party/blink/renderer/bindings/scripts/collect_idl_files.py --idl_list_file __third_party_blink_renderer_bindings_web_idl_in_core_for_testing___build_toolchain_linux_clang_arm64__rule.rsp --component core --output gen/third_party/blink/renderer/bindings/web_idl_in_core_for_testing.pickle --for_testing
> [03:56:06] [  8% 3851/47953] touch obj/third_party/blink/renderer/bindings/web_idl_in_core_for_testing.stamp
> [03:56:06] [  8% 3852/47953] touch obj/third_party/blink/renderer/bindings/scripts/cached_jinja_templates.stamp
> [03:56:06] [  8% 3853/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py ../../third_party/blink/renderer/modules/event_target_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
> [03:56:09] [  8% 3854/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/core/style/make_computed_style_initial_values.py ../../third_party/blink/renderer/core/css/css_properties.json5 ../../third_party/blink/renderer/core/css/computed_style_field_aliases.json5 ../../third_party/blink/renderer/platform/runtime_enabled_features.json5 ../../third_party/blink/renderer/core/style/computed_style_extra_fields.json5 --output_dir gen/third_party/blink/renderer/core/style --gperf gperf
> [03:56:09] [  8% 3855/47953] touch obj/third_party/blink/renderer/bindings/modules/event_target_modules_names.stamp
> 
> 
> The build step numbers are different for the same
> command:
> 
> 3914/47953
> vs.
> 3848/47953
> 
> (But I do not know if the build technique tries to
> keep the partial ordering of build steps stable
> across build attempts from similar starting
> conditions.)
> 
> It almost seems like a system-level check of which
> process(es) have large amounts of swap space in use
> would be appropriate when this sort of thing happens,
> not that I've done such a check before. My understanding
> is that top's per-process reporting of swap space usage
> is problematic (only approximate) when the display is
> set to show such information.
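> 
> (If I were checking, a reasonably current FreeBSD stock
> top(1) can show an approximate per-process SWAP column,
> via something like:
> 
> top -w -o swap
> 
> where -w enables the swap column and -o swap sorts by it,
> if your top supports that sort key; otherwise just use -w
> and scan the SWAP column by eye. Again: the figures are
> approximations.)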

Going in a different direction, when you updated:

/usr/local/poudriere/poudriere-system

to have Jail OSVERSION: 1400024 (probably matching the
host OS main-n247590-5dd84e315a9, and matching /usr/src/
as well), did you rebuild all your ports under that
coherent combination?

Or are you still using port builds in poudriere that
were built with the jail OSVERSION being older than
the /usr/src/ the builds were based on?

Just as devel/llvm10 had a command that only sometimes
had problems when built via the incoherent combination,
other ports could have odd behavior, but only sometimes.

My guess is that you have not done such a rebuild,
because it would have taken far more time than I've
seen reflected in your activities.

I recommended before that you do such a rebuild of the
ports and then reinstall them. Given the odd behaviors
being observed, ones that do not repeat each time tried,
I still make that recommendation.
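
(As a sketch, assuming the jail is named poudriere-system
and using a ports list file path of your choosing, the
sequence would be something like:

poudriere jail -u -j poudriere-system
poudriere bulk -c -j poudriere-system -f /path/to/port-list
pkg upgrade -f

where jail -u updates the jail from its recorded source
method, bulk -c cleans all previously built packages so
everything is rebuilt, and pkg upgrade -f on the host
forcibly reinstalls from the rebuilt repository. Adjust
names and paths to your setup.)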

>> I'm curious if this was blind luck, or some adaptive behavior
>> by poudriere.
> 
> Poudriere does not control chromium's internal build steps.
> The notation like [  8% 3849/47953] is not from poudriere.
> 
>> One other oddity: occasionally one sees in top a
>> PID using more than 100% WCPU. Is one thread occupying two cores?
> 
> A process can have multiple threads instead of just one,
> but each thread runs on at most one cpu (core, here) at a
> time. A multithreaded process can still show more than
> 100% WCPU because that figure sums across its threads.
> 
> Top can do either of:
> 
> A) show a count of threads in each process shown
> vs.
> B) show a line for each thread instead of for each
>   process
> 
> Top can also display the unique thread number instead
> of the process number (shared by all threads in the
> process). (But not both at the same time.) Showing
> thread numbers is probably only commonly selected when
> (B) is in use.
> 
> In (B) mode, if the process number is being shown,
> then that number appears once per displayed thread
> belonging to the process.
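> 
> (Concretely, with the stock FreeBSD top(1), something like
> 
> top -H -o time
> 
> gives one line per thread, sorted by accumulated CPU time;
> pressing H inside top toggles the same per-thread display.
> In that mode no single line should exceed 100% WCPU, while
> the per-process view can, for the summing reason above.)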
> 
> But I'll note that, if I remember right, CPython is
> effectively single threaded for pure python code: its
> global interpreter lock lets only one thread execute
> python bytecode at a time. It would probably take use
> of language bindings to other languages (extensions
> that release that lock) for multiple threads to be
> doing work at once (indirectly) in a python process.
> 
>>> 
>>>> I'm fairly sure this didn't happen
>>>> when using make by itself (IIRC it was -j2).
> 
> It did not happen at that point in the 50+ hr run
> either.
> 
>>>> I also got rid of the mistaken directive in
>>>> poudriere.d/make.conf.
>>> 
>>> When I look at http://www.zefox.org/~bob/poudriere.d/make.conf
>>> now I see:
>>> 
>>> ALLOW_MAKE_JOBS=yes
>>> #MAKE_JOBS_NUMBER=2
>>> #.if ${.CURDIR:M*www/chromium}
>>> #MAKE_JOBS_NUMBER_LIMIT=2
>>> #.endif
>>> #.if ${.CURDIR:M*databases/sqlite3}
>>> #MAKE_JOBS_NUMBER_LIMIT=2
>>> #.endif
>>> #.if ${.CURDIR:M*www/firefox}
>>> #MAKE_JOBS_NUMBER_LIMIT=2
>>> #.endif
>>> 
>>> which does not match your wording.
>>> 
>> 
>> Thank you for catching my error. _now_ it's fixed.
>> 
>> [snip]
>>> Seeing what is getting the CPU time that leads to
>>> the load averages being around 2 might take
>>> using something like top sorted by CPU time
>>> and watching for a while.
>>> 
>>>> There is a 
>>>> #MAX_MEMORY=8
>>>> in poudriere.conf, presumably GB.
>>> 
>>> Documented as GiB:
>>> 
>>> # How much memory to limit jail processes to for *each builder*
>>> # in GiB (default: none)
>>> #MAX_MEMORY=8
>>> 
>>> Per builder, not per make-process.
>>> Within a builder, each make process shares
>>> that amount of space with the others.
>>> 
>>>> That
>>>> looks like a good knob to play with. Would
>>>> setting it to something like 3 or 4 help?
>>> 
>>> If the memory use exceeds what you set, the builder
>>> process is likely killed. 
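>>> 
>>> (If you want to experiment anyway, it is just a matter
>>> of uncommenting and lowering the value in poudriere.conf,
>>> e.g.:
>>> 
>>> MAX_MEMORY=4
>>> 
>>> but, again, a builder that then runs past 4 GiB is likely
>>> to have processes killed rather than throttled.)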
>> [snip] 
>> 
>> I was hopeful it might inhibit starting new PIDs
>> when free memory/swap is below some threshold. Guess not.
> 
> poudriere does not control the internals of the
> builder's attempted operations beyond what the
> port supports. (And there are no port-specific
> interfaces for such control. poudriere treats
> ports the same way in general.)
> 



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)