Date: Fri, 31 Mar 2023 12:43:10 -0700 (PDT)
From: Jeff Roberson
To: freebsd-hackers@freebsd.org
Subject: ULE process to resolution
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

Hi Folks,

For those who don't know, I am the original author of ULE. I have not had much time for FreeBSD in recent years but this thread was forwarded to me and I am disheartened at the state of things. I will give my perspective and propose a path to resolve this systematically.

The fundamental benefit of ULE is also the fundamental challenge. That is: N CPU-local decisions need to add up to a reasonable approximation of a correct global decision. This is necessary to scale to large core counts and large thread counts, and to preserve some affinity. You could permute 4BSD further towards these goals but I posit that you would simply have to work through the same bugs.

As I read these threads I can state with a high degree of confidence that many of these tests worked with superior results with ULE at one time. It may be that tradeoffs have changed or exposed weaknesses; it may also be that it has simply been broken over time. I see a large number of commits intended to address point issues and wonder whether we adequately explored the consequences. Indeed, I see solutions involving tunables proposed here that will definitely break other cases.

I know that CPU tradeoffs have changed. ULE was written in a way that the topology could be annotated and the cost of migration could be specified. It is adaptable to this but someone has to put in the effort.
The cost function was written in ticks, which does not scale down properly, and accurate CPU tick counters could now be used for more precise timekeeping and more specific affinity. Over time people have also added additional searches to pickcpu which don't scale well to very high core count systems. NUMA and heterogeneous CPUs are also possible in the graph framework but need further investment.

The other thing that has changed over time is the ability of the interactivity score to correctly detect truly interactive applications. When I wrote it you could do a buildworld on a single core or small multi-core system and play MP3s and browse the web without a hiccup. However, web browsers have evolved to be significantly more resource intensive. I'm not sure a heuristic can or should catch this case. We're probably long overdue to add X window focus hints as most other operating systems do. I don't think tossing the interactivity score is really going to produce the desired results. Linux CFS disagrees with me but I have always been able to achieve superior responsiveness with ULE. My intuition is that with an X window focus hint we could dial back the interactive threshold and have better tradeoffs with the soft real-time score.

schedgraph is also no longer adequate for modern systems. In my professional life I have taken the same types of data sources and built text-based processes on top, because graphical representations just can't scale to the number of events and cores for full system scheduling. For complex scheduling issues you need detailed introspection. You're not going to tweak variables and run buildworlds to arrive at success by supposition with any kind of reasonable velocity.

The first step to resolving this is to come up with a list of regression tests and catalog how they behave compared to 4BSD.
When I wrote the scheduler I also wrote a simple fixed duty cycle program that could be run with different scheduling parameters and report on its CPU usage and latency. By combining many copies of this program you can simulate various kinds of interactions. It is available at people.freebsd.org/~jeff/late.tgz. I know there is also a Linux scheduler benchmark that may be worth porting.

If someone would start making regression tests I am happy to fix bugs or review bug fixes. Personally I would start from fairness given different nice values on a single CPU, and then multi-CPU. Evaluate allocation with variation in load to core count ratios. It should not take more than a few hours to iterate through the interesting cases here before going on to more complex questions about buildworld or firefox etc. This would need to be something we carried forward in the source tree and asked people to re-run as part of scheduler CRs, or we're just going to find ourselves back in this spot again.

I also have a backlog of improvements for large multi-core systems from work I did years ago that have not made it into the tree. And I have an old review for patches to improve the reliability of priority in causing scheduling events that may be germane. If we can collaborate on a testing framework I could trickle these in.

Thanks,
Jeff
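
To illustrate the fixed duty cycle idea described above, here is a minimal sketch in Python. It is not the code in late.tgz, and nothing in it is taken from that program: the function names duty_cycle and summarize, their parameters, and the choice to measure the busy phase against wall-clock time are all assumptions made for illustration. Each cycle spins for a fixed fraction of the period, sleeps for the rest, and records how late it woke up relative to the scheduled start of the next period.

```python
import time


def duty_cycle(period_s, duty, cycles):
    """Run a fixed duty cycle load (hypothetical sketch, not late.tgz).

    Spins for duty * period_s of every period, sleeps for the remainder,
    and returns (cpu_seconds, latencies): the process CPU time consumed
    and, per cycle, how late the wakeup was relative to its schedule.
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must be strictly between 0 and 1")
    latencies = []
    cpu_start = time.process_time()
    next_wake = time.monotonic()
    for _ in range(cycles):
        # Wakeup latency: how far past the scheduled start we actually ran.
        latencies.append(max(0.0, time.monotonic() - next_wake))
        # The busy phase is bounded by wall-clock time, so under contention
        # the process gets less CPU than requested and cpu_seconds shows it.
        busy_until = next_wake + duty * period_s
        while time.monotonic() < busy_until:
            pass
        next_wake += period_s
        remaining = next_wake - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
    return time.process_time() - cpu_start, latencies


def summarize(latencies):
    """Return (mean, max) of a non-empty list of latencies in seconds."""
    return sum(latencies) / len(latencies), max(latencies)


if __name__ == "__main__":
    cpu, lat = duty_cycle(period_s=0.010, duty=0.25, cycles=100)
    mean, worst = summarize(lat)
    print(f"cpu={cpu:.3f}s mean_latency={mean * 1e6:.0f}us "
          f"max_latency={worst * 1e6:.0f}us")
```

Several copies could be started under different nice values (for example with nice(1)) and their reported CPU time and latency compared between ULE and 4BSD. A real regression test would likely want finer-grained timers than this sketch uses.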