From: Kevin Bowling
Date: Thu, 30 Mar 2023 11:39:56 -0700
Subject: Re: Periodic rant about SCHED_ULE
To: Mateusz Guzik
Cc: freebsd-hackers@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
On Thu, Mar 30, 2023 at 11:29 AM Kevin Bowling wrote:
>
> On Thu, Mar 30, 2023 at 8:37 AM Mateusz Guzik wrote:
> >
> > I looked into it a little more; below you can find a summary and steps forward.
> >
> > First, a general statement: while ULE does have performance bugs, it
> > has a better basis than 4BSD for making scheduling decisions. Most notably,
> > it understands CPU topology, at least for cases which don't involve
> > big.LITTLE. For any non-freak case where 4BSD performs better, it is a
> > bug in ULE if this is for any reason other than a tradeoff which can
> > be tweaked to line them up. Or more to the point, there should not be
> > any legitimate reason to use 4BSD these days, and modulo the bugs
> > below, you are probably losing on performance by doing so.
>
> An elided simple algorithm for big.LITTLE, from Larry McVoy: if you
> run for an entire quantum, flag a preference for a big core. If you run
> for less or get punted off, flag a preference for a little core.
>
> > Bugs reported in this thread by others and confirmed by me:
> > 1. Failure to load-balance when having n CPUs and n + 1 workers -- the
> > excess one stays on the same CPU thread, continuously penalizing
> > the same victim. As a result, the total real time to execute a finite
> > computation is longer than in the case of 4BSD.
> > 2. Unfairness of nice -n 20 threads vs threads going frequently off
> > CPU (e.g., due to I/O) -- after using only a fraction of its slice, the
> > victim has to wait for the CPU hog to use up its entire slice; rinse
> > and repeat.
> > This extends a 7+ minute buildkernel to over 67 minutes, which is
> > not an issue on 4BSD.
> >
> > I put almost no effort into investigating no. 1. There is code
> > which is supposed to rebalance load across CPUs; someone(tm) will have
> > to sit through it -- for all I know the fix is trivial.
> >
> > Fixing number 2 makes *another* bug more acute, and it complicates the
> > whole ordeal.
> >
> > Thus, a bug reported by me:
> > 3. Interactivity scoring is bogus -- it was originally introduced to detect
> > "interactive" behavior by equating being off CPU with waiting for user
> > input. One part of the problem is that it puts *all* non-preempted off
> > CPU time into one bag: a voluntary sleep. This includes suffering from
> > lock contention in the kernel, lock contention in the program itself,
> > file I/O and so on, none of which has any bearing on how interactive or
> > not the program might happen to be. A bigger part of the problem is
> > that, at least today, graphical programs don't even act this way to
> > begin with -- they stay on CPU *a lot*.
> >
> > I asked people to provide me with the output of:
> >
> > dtrace -n 'sched:::on-cpu { @[execname] = lquantize(curthread->td_priority, 0, 224, 1); }'
> >
> > from their laptops/desktops.
> >
> > One finding is that most people (at least those who reported) use firefox.
> >
> > Another finding is that the browser is above the threshold which would
> > be considered "interactive" for the vast majority of the time in all
> > reported cases.
> >
> > I booted a 2-thread VM with xfce and decided to click around. I spawned
> > firefox, opened a file manager (Thunar) and from there opened a
> > movie to play with mpv. As root I spawned make -j 2 buildkernel. It
> > was not particularly good :)
> >
> > I found that mpv spawns a bunch of threads, most notably 2 distinct
> > threads for audio and video output. The one for video got a priority
The one for video got a priority > > of 175, while the rest had either 88 or 89 -- the lowest for > > timesharing not considered interactive [note lower is considered > > better]. > > > > At the same time the file manager who was left in the background kept > > doing evil syscall usage, which as a result bouncing between a regular > > timesharing priority and one which made it "interactive", even though > > the program was not touched for minutes. > > > > Or to put it differently, the scheduler failed to recognize that mpv > > is the program to prioritize, all while thinking the background time > > waster is the thing to look after (so to speak). > > > > This brings us to fixing problem 2: currently, due to the existence of > > said problem, the interactivity scoring woes are less acute -- the > > venerable make -j example is struggling to get CPU time, as a result > > messing with real interactive programs to a lesser extent. If that > > gets fixed, we are in a different boat altogether. > > > > I don't see a clean solution. One other random anecdote. Windows 11 uses window focus to highly boost scheduling priority in an obviously effective way. I have no idea how difficult something like that would be to fit into the unix world. > > Right now I'm toying with the idea of either: > > 1. having programs explicitly tell the kernel they are interactive > > 2. adding a scheduler hook to /dev/dsp -- the observation is that if a > > program is producing sound it probably should get some cpu time in a > > timely manner. this would cover audio/video players and web browsers, > > but would not cover other programs (say a pdf reader). it may be it is > > good enough though > > > > -- > > Mateusz Guzik > >