Re: git: 060699e91369 - stable/13 - Merge llvm-project release/15.x llvmorg-15.0.7-0-g8dfdcc7b7bf6

From: Jason A. Harmening <jah_at_freebsd.org>
Date: Mon, 01 May 2023 19:33:56 UTC
On Mon, May 01, 2023 at 08:41:32PM +0200, Dimitry Andric wrote:
> On 1 May 2023, at 18:14, John Baldwin <jhb@freebsd.org> wrote:
> > 
> > On 4/30/23 8:31 PM, Jason A. Harmening wrote:
> >> On Sun, Apr 30, 2023 at 07:34:45PM -0500, Jason A. Harmening wrote:
> >>> On Sun, Apr 30, 2023 at 06:47:13PM -0500, Jason A. Harmening wrote:
> >>>> On Sun, Apr 30, 2023 at 08:09:16AM +0300, Konstantin Belousov wrote:
> >>>>> On Sat, Apr 29, 2023 at 02:27:50PM -0500, Jason A. Harmening wrote:
> >>>>>> On Sat, Apr 29, 2023 at 08:49:28PM +0200, Dimitry Andric wrote:
> >>>>>>> On 29 Apr 2023, at 20:33, Jason A. Harmening <jah@FreeBSD.org> wrote:
> >>>>>>>> 
> >>>>>>>> On Sun, Apr 09, 2023 at 09:35:22PM +0000, Dimitry Andric wrote:
> >>>>>>>>> The branch stable/13 has been updated by dim:
> >>>>>>>>> 
> >>>>>>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=060699e9136975d51d3f726b9785bdbac9a62ba6
> >>>>>>>>> 
> >>>>>>>>> commit 060699e9136975d51d3f726b9785bdbac9a62ba6
> >>>>>>>>> Author:     Dimitry Andric <dim@FreeBSD.org>
> >>>>>>>>> AuthorDate: 2023-01-14 16:33:24 +0000
> >>>>>>>>> Commit:     Dimitry Andric <dim@FreeBSD.org>
> >>>>>>>>> CommitDate: 2023-04-09 14:54:52 +0000
> >>>>>>>>> 
> >>>>>>>>>    Merge llvm-project release/15.x llvmorg-15.0.7-0-g8dfdcc7b7bf6
> >>>>>>>>> 
> >>>>>>>>>    This updates llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and
> >>>>>>>>>    openmp to llvmorg-15.0.7-0-g8dfdcc7b7bf6.
> >>>>>>>>> 
> >>>>>>>>>    PR:             265425
> >>>>>>>>>    MFC after:      2 weeks
> >>>>>>>> 
> >>>>>>>> This MFC of llvm15 appears to have completely broken the Intel IOMMU
> >>>>>>>> driver on my stable/13 machine.  After this series of commits, any
> >>>>>>>> downstream DMA seems to produce an IOMMU translation fault, which
> >>>>>>>> renders the machine completely unusable: no nvme boot disk, no usb
> >>>>>>>> keyboard, etc.
> >>>>>>>> 
> >>>>>>>> The faults I see look something like this:
> >>>>>>>> 
> >>>>>>>> DMAR4: ahci0: pci0:17:5 sid 8d fault acc 0 adt 0x0 reason 0x3 addr 26000
> >>>>>>>> 
> >>>>>>>> It's a bit surprising to see a toolchain upgrade produce breakage like
> >>>>>>>> this, but that's what git bisect clearly tells me.  I wonder if some of
> >>>>>>>> the IOMMU control structures might be defined as C bitfields and the new
> >>>>>>>> compiler is emitting them differently?  Also, was any breakage like this
> >>>>>>>> observed when -current was upgraded to llvm15 several months ago?
> >>>>>>> 
> >>>>>>> I haven't heard anything about such breakage, no.
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> More generally, this is the second time in as many months I've had to
> >>>>>>>> deal with IOMMU breakage on -stable.  I can't imagine I'm the only
> >>>>>>>> person who sees value in running with DMA remapping enabled; do we need
> >>>>>>>> a dedicated DMAR-enabled machine in the cluster to smoke-test changes
> >>>>>>>> like this?  More generally, should we avoid MFCing high-risk changes
> >>>>>>>> like this?
> >>>>>>> 
> >>>>>>> Since there were very few bug reports, it was not deemed high risk.
> >>>>>>> 
> >>>>>>> In any case, it would be good to get the bottom of what is causing the
> >>>>>>> problem, so is there any way you can isolate which code seems to be
> >>>>>>> going "bad"?
> >>>>>>> 
> >>>>>>> For example, if this problem affects code in sys/dev/iommu, is there
> >>>>>>> some way you can compile that part with -O1, or with an older version
> >>>>>>> of clang (from ports), to see if the problem goes away?
> >>>>>> 
> >>>>>> I did try removing all custom make.conf settings (previously I just had
> >>>>>> CPUTYPE?=icelake-server), but that didn't change the behavior.
> >>>>>> 
> >>>>>> Before I try further build tweaks, I'd like to ask if the IOMMU fault
> >>>>>> report can provide guidance here?  AFAICT all the faults I'm getting
> >>>>>> show "reason 0x3".  If I'm reading the VT-d spec correctly, FR=0x3
> >>>>>> indicates an invalid context entry, in other words there's something the
> >>>>>> hardware doesn't like in the way the address width or pagetable base is
> >>>>>> configured for the PCIe requestor.
> >>>>> 
> >>>>> I would start looking at the other direction: might be, there are still some
> >>>>> left shifts for int32 values with the shift count > 30, or uint32 with the
> >>>>> count > 31.
> >>>>> 
> >>>>> Also might be useful to dump each context entry on creation, it is kept
> >>>>> constant after.
> >>>> 
> >>>> I did look over the constants in intel_reg.h, and didn't see anything
> >>>> that looked as though it would be susceptible to sign-extension or
> >>>> truncation bugs.  In the failing case it's much easier for me to catch
> >>>> the fault messages than any initialization message, so I instrumented
> >>>> the fault handler to get the context entry from the dmar_ctx object
> >>>> using the same logic as dmar_map_ctx_entry(), and then dump out the ctx1
> >>>> and ctx2 fields.  What I see are messages like:
> >>>> 
> >>>> ... ctx1 0x10013b001 ctx2 0x103
> >>>> 
> >>>> At first glance these "look right": the P bit is set in ctx1, and the
> >>>> rest of the field looks like a valid physical address.  ctx2 also
> >>>> doesn't have any of the reserved bits set, but in all cases it does have
> >>>> AW=3, which would indicate 57-bit AGAW.  But when I boot the last
> >>>> working kernel, from the revision prior to the llvm15 MFC, I see this in
> >>>> dmesg:
> >>>> 
> >>>> ahci0: dmar4 pci0:0:17:5 rid 8d domain 1 mgaw 48 agaw 48 re-mapped
> >>>> 
> >>>> ...all reported devices show 48-bit MGAW/AGAW, so I would expect ctx2 to
> >>>> have AW=2.  I suspect this may be the source of the fault, but I'm not
> >>>> sure how it's getting configured that way, whether it's an issue with
> >>>> reading the capability register or something else.
> >>>> 
> >>> 
> >>> I can confirm that hacking domain_set_agaw() to always use the settings
> >>> from sagaw_bits[2] eliminates the faults and at least allows the machine
> >>> to boot to single-user mode.
> >> I see what's happening now.  When I added the hack to always set
> >> sagaw_bits[2], I noted that the passed-in MGAW was still 57, while
> >> unit->hw_cap had the correct value of 0x4 (=> 4-level paging, 48-bit AW)
> >> in bits 12:8.  The problem is that sagaw_bits has agaw=64 in its last
> >> entry.  This results in dmar_maxaddr2mgaw() attempting a comparison
> >> against 1ULL << 64 in the final iteration of its first loop.  I suspect
> >> the new compiler probably determines that last iteration is meaningless
> >> and simply omits it from the (probably unrolled) loop.  Since the "loop"
> >> terminates with i < nitems(sagaw_bits), the subsequent "allow_less ..."
> >> case doesn't execute and we end up erroneously selecting a 57-bit
> >> address width.  Just commenting out that last entry in sagaw_bits fixes
> >> the problem.
> >> So, two questions:
> >> 1) Does any VT-d hardware actually support 6-level paging?  The ca. 2021
> >> VT-d spec I'm looking at indicates 5-level is the greatest depth
> >> supported, with everything above that being reserved.
> >> 2) I'd expect clang to try very hard to error out in a situation like
> >> this, but I see that sys/conf/kern.mk sets -Wno-shift-count-overflow
> >> among other things, and more of them were added for clang 15.  This
> >> seems like a really bad idea, regardless of how much of a PITA it may be
> >> to fix these warnings.
> > 
> > FWIW, I've been working on trying to re-enable some of the warnings that
> > were disabled for clang 15 in main.  I'll move that one higher up on my
> > todo list.
> 
> In this particular case, it doesn't warn about it though. I think KASAN
> might be a better 'catcher' for this kind of error, or a KUBSAN, if we
> had one...

If you've tried turning all the relevant warnings/errors back on for
this code, and clang truly doesn't warn about it, then wouldn't this
warrant a bug against upstream clang?  This is a situation in which the
compiler detects a left-shift overflow, by itself at least a warning
condition, and uses it to materially alter program flow.

> 
> -Dimitry
>