Date: Fri, 31 Mar 2023 12:43:10 -0700 (PDT)
From: Jeff Roberson <jroberson@jroberson.net>
To: freebsd-hackers@freebsd.org
Subject: ULE process to resolution
Message-ID: <a6066590-0b4d-b332-102a-9c2432cdfec6@jroberson.net>
List-Id: Technical discussions relating to FreeBSD <freebsd-hackers.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

Hi Folks,

For those who don't know, I am the original author of ULE.  I have not had 
much time for FreeBSD in recent years, but this thread was forwarded to me
and I am disheartened at the state of things.  I will give my perspective
and propose a path to resolve this systematically.

The fundamental benefit of ULE is also the fundamental challenge.  That is:
N CPU-local decisions need to add up to a reasonable approximation of a
correct global decision.  This is necessary to scale to large core counts, 
large thread counts, and preserve some affinity.  You could permute 4BSD 
further towards these goals but I posit that you would simply have to work 
through the same bugs.
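
To make the local-versus-global point concrete, here is a toy sketch in
Python.  It is not ULE code: the queue layout, the function names and the
"agreement" metric are all assumptions made for illustration.  Each CPU
picks the best thread from its own queue, and the result is compared with
the best N threads an all-knowing global scheduler would have picked.

```python
import heapq

def local_picks(run_queues):
    """Each CPU looks only at its own queue and picks its best thread.
    Lower numbers mean better priority, as in FreeBSD."""
    return [min(q) for q in run_queues if q]

def global_picks(run_queues):
    """The ideal answer with full knowledge: the best N threads overall,
    where N is the number of CPUs."""
    everything = [prio for q in run_queues for prio in q]
    return heapq.nsmallest(len(run_queues), everything)

def agreement(run_queues):
    """Fraction of the ideal global picks that the local picks also chose."""
    ideal = global_picks(run_queues)
    if not ideal:
        return 1.0
    chosen = local_picks(run_queues)
    hits = 0
    for prio in ideal:
        if prio in chosen:
            chosen.remove(prio)  # count each local pick at most once
            hits += 1
    return hits / len(ideal)

# Hypothetical queues: one list of thread priorities per CPU.
unbalanced = [[10, 20, 90], [100, 120], [110]]
balanced = [[10, 100], [20, 110], [90, 120]]

if __name__ == "__main__":
    print("unbalanced:", agreement(unbalanced))
    print("balanced:  ", agreement(balanced))
```

In the unbalanced layout the three best threads all sit on one CPU, so
only one of them runs; once the same threads are spread out, the local
picks match the global ideal.  Placement and balancing are what make the
local decisions add up.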

As I read these threads I can state with a high degree of confidence that
many of these tests once produced superior results with ULE.  It may be
that tradeoffs have changed or exposed weaknesses; it may also be that it
has simply been broken over time.  I see a large number of commits
intended to address point issues and wonder whether we adequately
explored the consequences.  Indeed, I see solutions involving tunables
proposed here that will definitely break other cases.

I know that CPU tradeoffs have changed.  ULE was written so that the
topology could be annotated and the cost of migration specified.  It is
adaptable to this, but someone has to put in the effort.  The cost function
was written in ticks, which does not scale down properly; accurate CPU
tick counters could now be used for more precise timekeeping and more
specific affinity.  Over time people have also added searches to pickcpu
that don't scale well to very high core count systems.  NUMA and
heterogeneous CPUs are also possible in the graph framework but need
further investment.
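
The granularity problem can be sketched as follows.  This is a model
only, not ULE's affinity code: the level names, the window values and the
helper names are invented for illustration, and hz=1000 is just an
assumed tick rate.  Each topology level gets a window during which a
thread is assumed to still be cache-warm there, and the same decision is
made once in nanoseconds and once quantized to ticks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Level:
    name: str
    window_ns: int  # assumed time a thread stays cache-warm at this level

# Hypothetical annotated topology, narrowest level first.
TOPOLOGY = [
    Level("core/L2", 200_000),
    Level("package/L3", 2_000_000),
    Level("numa-node", 20_000_000),
]

def to_ticks(ns, hz=1000):
    """Quantize a duration to scheduler ticks (1 tick = 1/hz seconds)."""
    return ns // (1_000_000_000 // hz)

def warmest_level_ns(idle_ns, topology=TOPOLOGY):
    """Narrowest level at which the thread still counts as warm,
    compared at nanosecond resolution.  None means fully cold."""
    for level in topology:
        if idle_ns <= level.window_ns:
            return level.name
    return None

def warmest_level_ticks(idle_ns, topology=TOPOLOGY, hz=1000):
    """Same decision, but with both sides rounded down to whole ticks."""
    idle = to_ticks(idle_ns, hz)
    for level in topology:
        if idle <= to_ticks(level.window_ns, hz):
            return level.name
    return None

if __name__ == "__main__":
    for idle_ns in (150_000, 500_000, 1_500_000):
        print(idle_ns, warmest_level_ns(idle_ns), warmest_level_ticks(idle_ns))
```

With a 1 ms tick, a thread that has been off-CPU for 500 us is
indistinguishable from one that has been off for 150 us, so the tick
version keeps treating it as warm on its old core.  Sub-tick windows
cannot be expressed at all, which is the sense in which a tick-based cost
does not scale down.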

The other thing that has changed over time is the ability of the
interactivity score to correctly detect truly interactive applications.
When I wrote it you could do a buildworld on a single core or small
multi-core system and play mp3s and browse the web without a hiccup.
However, web browsers have evolved to be significantly more resource
intensive.  I'm not sure a heuristic can or should catch this case.
We're probably long overdue to add X window focus hints, as most other
operating systems do.  I don't think tossing the interactivity score is
really going to produce the desired results.  Linux CFS disagrees with me
but I have always been able to achieve superior responsiveness with ULE.
My intuition is that with an X window focus hint we could dial back the
interactive threshold and have better tradeoffs with the soft real-time
score.
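
A rough model of how a focus hint could interact with the threshold is
sketched below.  It is a simplification in the spirit of a
run-time/sleep-time score; the exact formula, the constants, the
threshold values and the `focused` parameter are assumptions for
illustration and are not taken from the kernel source.  No such focus
hint exists in the tree today.

```python
INTERACT_MAX = 100                 # assumed score range 0..100
INTERACT_HALF = INTERACT_MAX // 2

def interact_score(run_time, sleep_time):
    """Lower is more interactive.  Mostly-sleeping threads land in the
    lower half of the range, mostly-running threads in the upper half."""
    if run_time > sleep_time:
        div = max(1, run_time // INTERACT_HALF)
        return INTERACT_HALF + (INTERACT_HALF - sleep_time // div)
    if sleep_time > run_time:
        div = max(1, sleep_time // INTERACT_HALF)
        return run_time // div
    return INTERACT_HALF if run_time else 0

def is_interactive(run_time, sleep_time, focused=False,
                   base_thresh=30, focus_thresh=None):
    """Hypothetical policy: a thread belonging to the focused window may
    be judged against a more lenient threshold than everything else."""
    use_focus = focused and focus_thresh is not None
    thresh = focus_thresh if use_focus else base_thresh
    return interact_score(run_time, sleep_time) < thresh

if __name__ == "__main__":
    # A browser-like thread: runs 40% of the time, sleeps 60%.
    print(interact_score(400, 600))
    print(is_interactive(400, 600))
    print(is_interactive(400, 600, focused=True,
                         base_thresh=15, focus_thresh=40))
```

In this model the busy browser thread misses the default threshold, but
with a hint the base threshold can be lowered for everything unfocused
while the focused application is still treated as interactive.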

schedgraph is also no longer adequate for modern systems.  In my
professional life I have taken the same types of data sources and built
text-based processes on top, because graphical representations just can't
scale to the number of events and cores involved in full system
scheduling.  For complex scheduling issues you need detailed
introspection.  You're not going to tweak variables and run buildworlds
and arrive at success by supposition with any kind of reasonable velocity.
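
As an illustration of what a text-based pass over scheduling events
might look like, here is a small sketch.  The four-column input format
(timestamp in ns, CPU, event name, thread id) and the event names are
invented for this example; real trace output would need its own parser.

```python
from collections import defaultdict

def summarize(lines):
    """Per-thread worst wakeup-to-run latency, run count and migrations."""
    woke_at = {}    # tid -> timestamp of the pending wakeup
    last_cpu = {}   # tid -> CPU the thread last ran on
    stats = defaultdict(
        lambda: {"runs": 0, "migrations": 0, "max_latency_ns": 0})
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ts, cpu, event, tid = line.split()
        ts, cpu = int(ts), int(cpu)
        if event == "wakeup":
            woke_at[tid] = ts
        elif event == "run":
            entry = stats[tid]
            entry["runs"] += 1
            if tid in woke_at:
                latency = ts - woke_at.pop(tid)
                entry["max_latency_ns"] = max(entry["max_latency_ns"], latency)
            if tid in last_cpu and last_cpu[tid] != cpu:
                entry["migrations"] += 1
            last_cpu[tid] = cpu
    return dict(stats)

def worst_latency(stats, count=10):
    """Thread ids ordered by worst wakeup latency, largest first."""
    ranked = sorted(stats, key=lambda t: stats[t]["max_latency_ns"],
                    reverse=True)
    return ranked[:count]

SAMPLE = [
    "# ts_ns cpu event tid",
    "1000 0 wakeup t1",
    "1500 0 run t1",
    "2000 1 wakeup t2",
    "9000 1 run t2",
    "9500 0 wakeup t1",
    "9900 1 run t1",
]

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    for tid in worst_latency(summary):
        print(tid, summary[tid])
```

A report like this stays readable at any core count and can be diffed
between two kernels, which a graphical timeline cannot.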

The first step to resolving this is to come up with a list of regression
tests and catalog how they behave compared to 4BSD.  When I wrote the
scheduler I also wrote a simple fixed duty cycle program that could be run
with different scheduling parameters and report on its CPU usage and
latency.  By combining many copies of this program you can simulate
various kinds of interactions.  It is available at
people.freebsd.org/~jeff/late.tgz.  I know there is also a Linux scheduler
benchmark that may be worth porting.
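
For readers who have not seen such a tool, the idea of a fixed duty
cycle load can be sketched in a few lines of Python.  This is not the
late program and makes no claim about how it works internally; the
function name, parameters and reported fields are assumptions.

```python
import time

def duty_cycle(period_s, busy_s, cycles):
    """Spin for busy_s, sleep for the rest of period_s, repeat.

    Reports the fraction of wall time actually spent on CPU and how late
    each wake-up was relative to the requested sleep."""
    if not (0 < busy_s <= period_s) or cycles < 1:
        raise ValueError("need 0 < busy_s <= period_s and cycles >= 1")
    sleep_s = period_s - busy_s
    latencies = []
    cpu_start = time.process_time()
    wall_start = time.monotonic()
    for _ in range(cycles):
        spin_until = time.monotonic() + busy_s
        while time.monotonic() < spin_until:
            pass  # burn CPU for the busy part of the cycle
        before = time.monotonic()
        time.sleep(sleep_s)
        # Anything beyond the requested sleep is scheduling latency.
        latencies.append(max(0.0, time.monotonic() - before - sleep_s))
    cpu = time.process_time() - cpu_start
    wall = time.monotonic() - wall_start
    return {
        "cpu_fraction": cpu / wall,
        "mean_latency_s": sum(latencies) / len(latencies),
        "max_latency_s": max(latencies),
        "cycles": len(latencies),
    }

if __name__ == "__main__":
    # 20% duty cycle at 100 Hz for roughly one second.
    print(duty_cycle(period_s=0.010, busy_s=0.002, cycles=100))
```

Running several copies with different duty cycles and nice values, and
comparing the reported CPU fraction and latency between schedulers,
gives the kind of repeatable interaction test described above.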

If someone would start making regression tests I am happy to fix bugs or
review bug fixes.  Personally I would start from fairness given different
nice values on a single CPU, and then move to multi-CPU.  Evaluate
allocation while varying the ratio of load to core count.  It should not
take more than a few hours to iterate through the interesting cases here
before going on to more complex questions about buildworld, Firefox, etc.
This would need to be something we carry forward in the source tree and
ask people to re-run as part of scheduler CRs, or we're just going to find
ourselves back in this spot again.
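
The checks such a regression suite might apply to measured CPU times
could look like the sketch below.  The function names, the 5% slack and
the particular CPU counts and load ratios are assumptions, and Jain's
fairness index is just one common way to score an allocation, not
something ULE defines.

```python
from itertools import product

def jain_index(values):
    """Jain's fairness index: 1.0 when all values are equal, falling
    towards 1/n as one consumer takes everything."""
    total = sum(values)
    squares = sum(v * v for v in values)
    if squares == 0:
        return 1.0
    return (total * total) / (len(values) * squares)

def nice_ordering_ok(cpu_by_nice, slack=0.05):
    """True if no thread got noticeably more CPU than a thread with a
    lower (less nice) nice value.  cpu_by_nice maps nice -> CPU seconds."""
    ordered = [cpu for _, cpu in sorted(cpu_by_nice.items())]
    return all(later <= earlier * (1 + slack)
               for earlier, later in zip(ordered, ordered[1:]))

def test_matrix(ncpus=(1, 2, 4), load_ratios=(0.5, 1, 2, 4)):
    """(cpu count, runnable thread count) pairs covering under-load,
    exact load and over-load for each CPU count."""
    return [(n, max(1, round(n * r)))
            for n, r in product(ncpus, load_ratios)]

if __name__ == "__main__":
    # Hypothetical measurements from four equal-nice CPU hogs on one CPU.
    print(jain_index([2.4, 2.5, 2.6, 2.5]))
    # Hypothetical measurements keyed by nice value.
    print(nice_ordering_ok({0: 10.0, 5: 6.0, 10: 3.0}))
    print(test_matrix())
```

The single-CPU cases can be arranged by pinning the workers to one CPU,
for example with cpuset(1), and the same checks can then be repeated for
each entry of the matrix under both ULE and 4BSD.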

I also have a backlog of improvements for large multi-core systems from 
work I did years ago that have not made it into the tree.  And I have an 
old review for patches to improve the reliability of priority in causing 
scheduling events that may be germane.  If we can collaborate on a testing 
framework I could trickle these in.

Thanks,
Jeff