Re: Too many pythons in poudriere

From: Mark Millard via freebsd-ports <freebsd-ports_at_freebsd.org>
Date: Thu, 8 Jul 2021 14:11:43 -0700
On 2021-Jul-8, at 10:54, bob prohaska <fbsd at www.zefox.net> wrote:

> On Thu, Jul 08, 2021 at 09:41:13AM -0700, Mark Millard wrote:
>> 
>> 
>> On 2021-Jul-8, at 08:45, bob prohaska <fbsd at www.zefox.net> wrote:
>> 
>>> Even with -J1 and no ALLOW_MAKE_JOBS I'm still
>>> seeing five pythons occupying at least 3 GB on
>>> the loose.
>> 
>> Actually I just looked and saw:
>> 
>> Swapinfo 7.36%
>> 
>> (Unlike the 83% or so I saw somewhat around 3 hours ago.)
>> 
>> Load Averages (220%) 2.20 2.18 1.76
>> 
>> Elapsed 12:54:56
>> 
>> I do not see a swaplog in http://www.zefox.org/~bob/swaplogs/
>> to look at. So I can not see how much the peak swap space
>> usage was so far (approximately).
> 
> Started a new swaplog:
> http://www.zefox.org/~bob/swaplogs/202107182930.log
> 
> It came within a whisker of running out of swap, then abruptly
> the python threads vanished and the build seems to be proceeding.

Did you update any thing, such as /usr/ports/ , between the 50+ hr
run and the new one? The new one spent over 6 hours at:

[05:50:59] [  8% 3914/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py gen/third_party/blink/renderer/modules/event_interface_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
[12:22:32] [  8% 3915/47953] python ../../third_party/blink/renderer/bindings/scripts/generate_bindings.py --web_idl_database gen/third_party/blink/renderer/bindings/web_idl_database.pickle --root_src_dir ../../ --root_gen_dir gen --output_core_reldir third_party/blink/renderer/bindings/core/v8/ --output_modules_reldir third_party/blink/renderer/bindings/modules/v8/ enumeration callback_function callback_interface interface namespace typedef union
[12:22:36] [  8% 3916/47953] touch obj/third_party/blink/renderer/bindings/generate_bindings_all.stamp
[12:22:39] [  8% 3917/47953] touch obj/third_party/blink/renderer/bindings/modules/event_modules_names.stamp
[12:22:42] [  8% 3918/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py ../../third_party/blink/renderer/modules/event_target_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
[12:22:42] [  8% 3919/47953] touch obj/third_party/blink/renderer/bindings/modules/event_target_modules_names.stamp


The 50+ hour one did not:

[03:56:05] [  8% 3848/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py gen/third_party/blink/renderer/modules/event_interface_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
[03:56:05] [  8% 3849/47953] touch obj/third_party/blink/renderer/bindings/modules/event_modules_names.stamp
[03:56:06] [  8% 3850/47953] python ../../third_party/blink/renderer/bindings/scripts/collect_idl_files.py --idl_list_file __third_party_blink_renderer_bindings_web_idl_in_core_for_testing___build_toolchain_linux_clang_arm64__rule.rsp --component core --output gen/third_party/blink/renderer/bindings/web_idl_in_core_for_testing.pickle --for_testing
[03:56:06] [  8% 3851/47953] touch obj/third_party/blink/renderer/bindings/web_idl_in_core_for_testing.stamp
[03:56:06] [  8% 3852/47953] touch obj/third_party/blink/renderer/bindings/scripts/cached_jinja_templates.stamp
[03:56:06] [  8% 3853/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/make_names.py ../../third_party/blink/renderer/modules/event_target_modules_names.json5 --output_dir gen/third_party/blink/renderer/modules
[03:56:09] [  8% 3854/47953] python ../../third_party/blink/renderer/build/scripts/run_with_pythonpath.py -I ../../third_party/blink/renderer/build/scripts -I ../../third_party -I ../../tools ../../third_party/blink/renderer/build/scripts/core/style/make_computed_style_initial_values.py ../../third_party/blink/renderer/core/css/css_properties.json5 ../../third_party/blink/renderer/core/css/computed_style_field_aliases.json5 ../../third_party/blink/renderer/platform/runtime_enabled_features.json5 ../../third_party/blink/renderer/core/style/computed_style_extra_fields.json5 --output_dir gen/third_party/blink/renderer/core/style --gperf gperf
[03:56:09] [  8% 3855/47953] touch obj/third_party/blink/renderer/bindings/modules/event_target_modules_names.stamp


The build step numbers are different for the same
command:

3914/47953
vs.
3848/47953

(But I do not know if the build technique tries to
keep the partial ordering for build steps stable
across build attempts from the similar starting
conditions.)

It almost seems like like a system level check in what
process(s) have large amounts swap space in use would
be appropriate when this sort of thing happens, not
that I've done such before. My understanding is that
top's per-process reporting of swap space usage is
problematical when the display is set to display such
information.

> I'm curious if this was blind luck, or some adaptive behavior
> by poudrirere.

Poudriere does not control chrome's internel build steps.
The notation like [  8% 3849/47953] is not from poudriere.

> One other oddity: occasionally one see in top a
> PID using more than 100% WCPU. Is one thread occupying two cores?

A process can have multiple threads instead of just one
but each thread runs on at most one cpu (core here) at a
time.

Top can do either of:

A) show a count of threads in each process shown
vs.
B) show a line for each thread instead of for each
   process

Top can also display the unique thread number instead
of the process number (shared by all threads in the
process). (But not both at the same time.) Showing
thread numbers is probably only commonly selected when
(B) is in use.

In (B) mode, if the process number is being shown,
then there will be one example of the number per
thread shown that is from the process.

But I'll note that, if I remember right, python is
a single threaded language. It would probably take
use of language bindings to other langauges for
multiple threads to be involved (indirectly) in a
python process.

>> 
>>> I'm fairly sure this didn't happen
>>> when using make by itself (IIRC it was -j2).

It did not happen at that point in the 50+ hr run
either.

>>> I also got rid of the mistaken directive in
>>> poudriere.d/make.conf.
>> 
>> When I look at http://www.zefox.org/~bob/poudriere.d/make.conf
>> now I see:
>> 
>> ALLOW_MAKE_JOBS=yes
>> #MAKE_JOBS_NUMBER=2
>> #.if ${.CURDIR:M*www/chromium}
>> #MAKE_JOBS_NUMBER_LIMIT=2
>> #.endif
>> #.if ${.CURDIR:M*databases/sqlite3}
>> #MAKE_JOBS_NUMBER_LIMIT=2
>> #.endif
>> #.if ${.CURDIR:M*www/firefox}
>> #MAKE_JOBS_NUMBER_LIMIT=2
>> #.endif
>> 
>> which does not match your wording.
>> 
> 
> Thank you for catching my error. _now_ it's fixed.
> 
> [snip]
>> To see what is getting CPU time that leads to
>> the load averages being around 2 might take
>> using something like top sorted by cpu time
>> and watching for a while.
>> 
>>> There is a 
>>> #MAX_MEMORY=8
>>> in poudriere.conf, presumably GB.
>> 
>> Documented as GiB:
>> 
>> # How much memory to limit jail processes to for *each builder*
>> # in GiB (default: none)
>> #MAX_MEMORY=8
>> 
>> Per builder, not per-make-process.
>> Within a builder each make-process shares
>> that size space with the others.
>> 
>>> That
>>> looks like a good knob to play with. Would
>>> setting it to something like 3 or 4 help?
>> 
>> If the memory use exceeds what you set, the builder
>> process is likely killed. 
> [snip] 
> 
> I was hopeful it might inhibit starting new PIDs
> when memory/swap is below some threshold. Guess not. 

poudriere does not control the internals of the
builder's attempted operations beyond what the
port supports. (And there are not port-specific
interfaces for such control. poudriere treats
ports the same way in general.)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Received on Thu Jul 08 2021 - 21:11:43 UTC

Original text of this message