From nobody Mon May 01 19:33:56 2023 X-Original-To: dev-commits-src-branches@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Q9D0k5tSmz48w8p; Mon, 1 May 2023 19:33:58 +0000 (UTC) (envelope-from jah@freebsd.org) Received: from smtp.freebsd.org (smtp.freebsd.org [IPv6:2610:1c1:1:606c::24b:4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Q9D0k5L9yz4NQ1; Mon, 1 May 2023 19:33:58 +0000 (UTC) (envelope-from jah@freebsd.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1682969638; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=DzInxjNQtfRHVEkkiG1V4W/0Aq6ZooQr32FG+ULdV2E=; b=ZuqicZ/iJyEBWoE+bf3Ero7zzjveeGef/UtYuvCNgH9OXs0kFZkqzJoUjqg8emI4IMrMOi 0NAC1hIWXSmE7NSi/8YgoRAgw+0QRctDIRSG2DwumNkFaTkogJ2nqcglyRYa6Zk9S9vwi+ 2kTZk0wm8uySwNxfE2QC2K3xG0U9uicdzxxLgGS8s9wyepx4nosh5tRyDBfr7e2m+z5/PQ RvGPYuEbH+CTGvdJHqWSXZhVrZPBBsxrlvsrX1CBP3yKPpIugwkVnlsggYvX5HaCWK+ZZ2 3UFaq4mkm4JmERbcHt65Kv2537rhbCAe3+lzZAwJgpt7tbpc4gNY5ZETNxJIsw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1682969638; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=DzInxjNQtfRHVEkkiG1V4W/0Aq6ZooQr32FG+ULdV2E=; b=d7wLt4f8k8Gi+tX+a26zyEaXDXMXf0JzBW3Q37hPuZtxsZv4hwt2QHSx6kMPMqBvgKWEBj L5S47Jo63zQq13inp34unRLJql+almEHW+mF1jsivehajOPnlgPny+Nf4To+FbDbCo2dQz dkk/Rpddc6XVpKap2aDg4eEm8wleJH+nQvQ08Fh4YvJw1MQ08dceVNmLrpO0ozlEM2A4L6 4clwLO4lGxD/enxx0E9fk2DFue8gKj+gJGY9QhmHDANmNfXrc7Tk4t/ZlHF9zolvLH1lyU +Bmkxyx7o4LhsIxQ3AoGG/S28aIGzLLvGS6FUZ38WEGeKzSQgJ9v6ByFPQJ3NA== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1682969638; a=rsa-sha256; cv=none; b=CyRetMIjwE7G7XyP6hwoZj+K+TE5V7451XRbs541wfH202XPE7jEHI5+dxZQfIBPNKtF7n BPg4OFnWN4zb05V4hYv2RDUm24NzCBkx1uCV6RrVOxx4JzilDFg72k7rwh0luRJ5QKvALo kOWycWD9wQb58d9sVyIitty/srB+Zvb9ughJi3gJsgIDOvUXy5oW9wa9PKkomv8DfM/Bmo 4/bLE5/jVXuwc9uLbbc4gagCH1qmPbzV5KMqMSfy6yLMANnhUKYImrM0Yp6yrBAXMAqODl z4HsNeEtsv5pABXvzDrXaT0KrRO2prrcOpZke8dZIJP9DS3LDwwualyPmcca9Q== Received: from corona (047-232-115-243.res.spectrum.com [47.232.115.243]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) (Authenticated sender: jah) by smtp.freebsd.org (Postfix) with ESMTPSA id 4Q9D0k1S4SztrW; Mon, 1 May 2023 19:33:58 +0000 (UTC) (envelope-from jah@freebsd.org) Date: Mon, 1 May 2023 14:33:56 -0500 From: "Jason A. Harmening" To: Dimitry Andric Cc: John Baldwin , Konstantin Belousov , src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-branches@freebsd.org Subject: Re: git: 060699e91369 - stable/13 - Merge llvm-project release/15.x llvmorg-15.0.7-0-g8dfdcc7b7bf6 Message-ID: References: <202304092135.339LZMeJ081640@gitrepo.freebsd.org> <76DD2CB9-986B-4349-8F46-3B7BF63EB315@FreeBSD.org> List-Id: Commits to the stable branches of the FreeBSD src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-branches List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-branches@freebsd.org X-BeenThere: dev-commits-src-branches@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-ThisMailContainsUnwantedMimeParts: N On Mon, May 01, 2023 at 08:41:32PM +0200, Dimitry Andric wrote: > On 1 May 2023, at 18:14, John Baldwin wrote: > > > > On 4/30/23 8:31 PM, Jason A. Harmening wrote: > >> On Sun, Apr 30, 2023 at 07:34:45PM -0500, Jason A. Harmening wrote: > >>> On Sun, Apr 30, 2023 at 06:47:13PM -0500, Jason A. Harmening wrote: > >>>> On Sun, Apr 30, 2023 at 08:09:16AM +0300, Konstantin Belousov wrote: > >>>>> On Sat, Apr 29, 2023 at 02:27:50PM -0500, Jason A. Harmening wrote: > >>>>>> On Sat, Apr 29, 2023 at 08:49:28PM +0200, Dimitry Andric wrote: > >>>>>>> On 29 Apr 2023, at 20:33, Jason A. Harmening wrote: > >>>>>>>> > >>>>>>>> On Sun, Apr 09, 2023 at 09:35:22PM +0000, Dimitry Andric wrote: > >>>>>>>>> The branch stable/13 has been updated by dim: > >>>>>>>>> > >>>>>>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=060699e9136975d51d3f726b9785bdbac9a62ba6 > >>>>>>>>> > >>>>>>>>> commit 060699e9136975d51d3f726b9785bdbac9a62ba6 > >>>>>>>>> Author: Dimitry Andric > >>>>>>>>> AuthorDate: 2023-01-14 16:33:24 +0000 > >>>>>>>>> Commit: Dimitry Andric > >>>>>>>>> CommitDate: 2023-04-09 14:54:52 +0000 > >>>>>>>>> > >>>>>>>>> Merge llvm-project release/15.x llvmorg-15.0.7-0-g8dfdcc7b7bf6 > >>>>>>>>> > >>>>>>>>> This updates llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and > >>>>>>>>> openmp to llvmorg-15.0.7-0-g8dfdcc7b7bf6. > >>>>>>>>> > >>>>>>>>> PR: 265425 > >>>>>>>>> MFC after: 2 weeks > >>>>>>>> > >>>>>>>> This MFC of llvm15 appears to have completely broken the Intel IOMMU > >>>>>>>> driver on my stable/13 machine. After this series of commits, any > >>>>>>>> downstream DMA seems to produce an IOMMU translation fault, which > >>>>>>>> renders the machine completely unusable: no nvme boot disk, no usb > >>>>>>>> keyboard, etc. > >>>>>>>> > >>>>>>>> The faults I see look something like this: > >>>>>>>> > >>>>>>>> DMAR4: ahci0: pci0:17:5 sid 8d fault acc 0 adt 0x0 reason 0x3 addr 26000 > >>>>>>>> > >>>>>>>> It's a bit surprising to see a toolchain upgrade produce breakage like > >>>>>>>> this, but that's what git bisect clearly tells me. I wonder if some of > >>>>>>>> the IOMMU control structures might be defined as C bitfields and the new > >>>>>>>> compiler is emitting them differently? Also, was any breakage like this > >>>>>>>> observed when -current was upgraded to llvm15 several months ago? > >>>>>>> > >>>>>>> I haven't heard anything about such breakage, no. > >>>>>>> > >>>>>>> > >>>>>>>> More generally, this is the second time in as many months I've had to > >>>>>>>> deal with IOMMU breakage on -stable. I can't imagine I'm the only > >>>>>>>> person who sees value in running with DMA remapping enabled; do we need > >>>>>>>> a dedicated DMAR-enabled machine in the cluster to smoke-test changes > >>>>>>>> like this? More generally, should we avoid MFCing high-risk changes > >>>>>>>> like this? > >>>>>>> > >>>>>>> Since there were very few bug reports, it was not deemed high risk. > >>>>>>> > >>>>>>> In any case, it would be good to get the bottom of what is causing the > >>>>>>> problem, so is there any way you can isolate which code seems to be > >>>>>>> going "bad"? > >>>>>>> > >>>>>>> For example, if this problem affects code in sys/dev/iommu, is there > >>>>>>> some way you can compile that part with -O1, or with an older version > >>>>>>> of clang (from ports), to see if the problem goes away? > >>>>>> > >>>>>> I did try removing all custom make.conf settings (previously I just had > >>>>>> CPUTYPE?=icelake-server), but that didn't change the behavior. > >>>>>> > >>>>>> Before I try further build tweaks, I'd like to ask if the IOMMU fault > >>>>>> report can provide guidance here? AFAICT all the faults I'm getting > >>>>>> show "reason 0x3". If I'm reading the VT-d spec correctly, FR=0x3 > >>>>>> indicates an invalid context entry, in other words there's something the > >>>>>> hardware doesn't like in the way the address width or pagetable base is > >>>>>> configured for the PCIe requestor. > >>>>> > >>>>> I would start looking at the other direction: might be, there are still some > >>>>> left shifts for int32 values with the shift count > 30, or uint32 with the > >>>>> count > 31. > >>>>> > >>>>> Also might be useful to dump each context entry on creation, it is kept > >>>>> constant after. > >>>> > >>>> I did look over the constants in intel_reg.h, and didn't see anything > >>>> that looked as though it would be susceptible to sign-extension or > >>>> truncation bugs. In the failing case it's much easier for me to catch > >>>> the fault messages than any initialization message, so I instrumented > >>>> the fault handler to get the context entry from the dmar_ctx object > >>>> using the same logic as dmar_map_ctx_entry(), and then dump out the ctx1 > >>>> and ctx2 fields. What I see are messages like: > >>>> > >>>> ... ctx1 0x10013b001 ctx2 0x103 > >>>> > >>>> At first glance these "look right": the P bit is set in ctx1, and the > >>>> rest of the field looks like a valid physical address. ctx2 also > >>>> doesn't have any of the reserved bits set, but in all cases it does have > >>>> AW=3, which would indicate 57-bit AGAW. But when I boot the last > >>>> working kernel, from the revision prior to the llvm15 MFC, I see this in > >>>> dmesg: > >>>> > >>>> ahci0: dmar4 pci0:0:17:5 rid 8d domain 1 mgaw 48 agaw 48 re-mapped > >>>> > >>>> ...all reported devices show 48-bit MGAW/AGAW, so I would expect ctx2 to > >>>> have AW=2. I suspect this may be the source of the fault, but I'm not > >>>> sure how it's getting configured that way, whether it's an issue with > >>>> reading the capability register or something else. > >>>> > >>> > >>> I can confirm that hacking domain_set_agaw() to always use the settings > >>> from sagaw_bits[2] eliminates the faults and at least allows the machine > >>> to boot to single-user mode. > >> I see what's happening now. When I added the hack to always set > >> sagaw_bits[2], I noted that the passed-in MGAW was still 57, while > >> unit->hw_cap had the correct value of 0x4 (=> 4-level paging, 48-bit AW) > >> in bits 12:8. The problem is that sagaw_bits has agaw=64 in its last > >> entry. This results in dmar_maxaddr2mgaw() attempting a comparison > >> against 1ULL << 64 in the final iteration of its first loop. I suspect > >> the new compiler probably determines that last iteration is meaningless > >> and simply omits it from the (probably unrolled) loop. Since the "loop" > >> terminates with i < nitems(sagaw_bits), the subsequent "allow_less ..." > >> case doesn't execute and we end up erroneously selecting a 57-bit > >> address width. Just commenting out that last entry in sagaw_bits fixes > >> the problem. > >> So, two questions: > >> 1) Does any VT-d hardware actually support 6-level paging? The ca. 2021 > >> VT-d spec I'm looking at indicates 5-level is the greatest depth > >> supported, with everything above that being reserved. > >> 2) I'd expect clang to try very hard to error out in a situation like > >> this, but I see that sys/conf/kern.mk sets -Wno-shift-count-overflow > >> among other things, and more of them were added for clang 15. This > >> seems like a really bad idea, regardless of how much of a PITA it may be > >> to fix these warnings. > > > > FWIW, I've been working on trying to re-enable some of the warnings that > > were disabled for clang 15 in main. I'll move that one higher up on my > > todo list. > > In this particular case, it doesn't warn about it though. I think KASAN > might be a better 'catcher' for this kind of error, or a KUBSAN, if we > had one... If you've tried turning all the relevant warnings/errors back on for this code, and clang truly doesn't warn about it, then wouldn't this warrant a bug against upstream clang? This is a situation in which the compiler detects a left-shift overflow, by itself at least a warning condition, and uses it to materially alter program flow. > > -Dimitry >