Re: Pinebook Pro IOMMU enabled crashes

From: Jesper Schmitz Mouridsen <jsm_at_FreeBSD.org>
Date: Thu, 30 Sep 2021 16:39:43 UTC
On 30.09.2021 18.28, Jesper Schmitz Mouridsen wrote:
> On 23.09.2021 19.58, Jesper Schmitz Mouridsen wrote:
>> Hi
>>
>> I just rebuilt a GENERIC arm64 kernel with only this change:
>>
>> diff --git a/sys/arm64/conf/GENERIC b/sys/arm64/conf/GENERIC
>> index c716183aae61..7a609db412ca 100644
>> --- a/sys/arm64/conf/GENERIC
>> +++ b/sys/arm64/conf/GENERIC
>> @@ -19,7 +19,7 @@
>>
>>   cpu            ARM64
>>   ident          GENERIC
>> -
>> +options                IOMMU
>>   include                "std.arm64"
>>   include                "std.dev"
>>
>> FreeBSD 14.0-CURRENT #6 main-n249584-fd69939e79a6-dirty
>>
>> The crash does not happen without the NVMe attached.
>>
>> pcib0: <Rockchip PCIe controller> mem 
>> 0xf8000000-0xf9ffffff,0xfd000000-0xfdffffff irq 6,7,8 on ofwbus0
>> pci0: <OFW PCI bus> on pcib0
>> pcib1: <PCI-PCI bridge> at device 0.0 on pci0
>> pcib0: failed to reserve resource for pcib1
>> pcib1: failed to allocate initial memory window: 0-0xfffff
>> pci1: <PCI bus> on pcib1
>> nvme0: <Generic NVMe Device> at device 0.0 on pci1
>> Fatal data abort:
>>    x0:                0
>>    x1:             1000
>>    x2:            10040
>>    x3:             2000
>>    x4:                1
>>    x5: ffff00009a7e0168
>>    x6: 1400000000000000
>>    x7:   10000000000000
>>    x8:             1168
>>    x9:                1
>>   x10:                0
>>   x11: ffff000000e8c8c0
>>   x12: ffff000000e8c840
>>   x13:                1
>>   x14:            10000
>>   x15:                1
>>   x16:            10000
>>   x17: ffff000000e8c85c
>>   x18: ffff000001064180
>>   x19: ffff000001064248
>>   x20:                0
>>   x21: ffff00009a7df000
>>   x22: ffffa0000102ea00
>>   x23: ffffa00000bb6b80
>>   x24: ffffa00001086200
>>   x25: ffff000000aa8478
>>   x26: ffffa00001086300
>>   x27: ffff000000dda000
>>   x28:                7
>>   x29: ffff000001064190
>>    sp: ffff000001064180
>>    lr: ffff00000075f20c
>>   elr: ffff00000078a654
>> spsr:         200000c5
>>   far:                0
>>   esr:         96000004
>> panic: vm_fault failed: ffff00000078a654 error 1
>> cpuid = 0
>> time = 1
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> vpanic() at vpanic+0x184
>> panic() at panic+0x44
>> data_abort() at data_abort+0x23c
>> handle_el1h_sync() at handle_el1h_sync+0x78
>> --- exception, esr 0x96000004
>> iommu_map_msi() at iommu_map_msi+0x20
>> gicv3_iommu_init() at gicv3_iommu_init+0x4c
>> intr_alloc_msix() at intr_alloc_msix+0x13c
>> rk_pcie_alloc_msix() at rk_pcie_alloc_msix+0xfc
>> pci_alloc_msix_method() at pci_alloc_msix_method+0x1a8
>> nvme_pci_attach() at nvme_pci_attach+0x378
>> device_attach() at device_attach+0x400
>> device_probe_and_attach() at device_probe_and_attach+0x7c
>> bus_generic_attach() at bus_generic_attach+0x18
>> pci_attach() at pci_attach+0xe8
>> device_attach() at device_attach+0x400
>> device_probe_and_attach() at device_probe_and_attach+0x7c
>> bus_generic_attach() at bus_generic_attach+0x18
>> device_attach() at device_attach+0x400
>> device_probe_and_attach() at device_probe_and_attach+0x7c
>> bus_generic_attach() at bus_generic_attach+0x18
>> pci_attach() at pci_attach+0xe8
>> device_attach() at device_attach+0x400
>> device_probe_and_attach() at device_probe_and_attach+0x7c
>> bus_generic_attach() at bus_generic_attach+0x18
>> rk_pcie_attach() at rk_pcie_attach+0x14cc
>> device_attach() at device_attach+0x400
>> device_probe_and_attach() at device_probe_and_attach+0x7c
>> bus_generic_new_pass() at bus_generic_new_pass+0xf8
>> bus_generic_new_pass() at bus_generic_new_pass+0xa8
>> bus_generic_new_pass() at bus_generic_new_pass+0xa8
>> bus_set_pass() at bus_set_pass+0x4c
>> mi_startup() at mi_startup+0x12c
>> virtdone() at virtdone+0x6c
>>
>> /jsm
>>
>>
>> On 23.09.2021 09.19, Emmanuel Vadot wrote:
>>> On Sat, 18 Sep 2021 13:15:45 +0200
>>> Jesper Schmitz Mouridsen <jsm@FreeBSD.org> wrote:
>>>
>>>> Hi
>>>>
>>>> Perhaps this one
>>>> https://www.mail-archive.com/svn-src-head@freebsd.org/msg126068.html is
>>>> giving troubles?
>>>>
>>>> main-n249225-f673cc5edac3-dirty
>>>> nvme0: <Generic NVMe Device> at device 0.0 on pci1
>>>> Fatal data abort:
>>>>     x0:                0
>>>>     x1:             1000
>>>>     x2:            10040
>>>>     x3:             2000
>>>>     x4:                1
>>>>     x5: ffff00009a7a0168
>>>>     x6: 1d00000000000000
>>>>     x7:   10000000000000
>>>>     x8:             1168
>>>>     x9:                1
>>>>    x10:                0
>>>>    x11: ffff000000f35140
>>>>    x12: ffff000000f350c0
>>>>    x13:                1
>>>>    x14:            10000
>>>>    x15:                1
>>>>    x16:            10000
>>>>    x17: ffff000000f350dc
>>>>    x18: ffff00000110d180
>>>>    x19: ffff00000110d248
>>>>    x20:                0
>>>>    x21: ffff00009a79f000
>>>>    x22: ffffa000010b0a00
>>>>    x23: ffffa000010a2880
>>>>    x24: ffffa0000116da00
>>>>    x25: ffff000000b4fd78
>>>>    x26: ffffa0000116db00
>>>>    x27: ffff000000e83000
>>>>    x28:                7
>>>>    x29: ffff00000110d190
>>>>     sp: ffff00000110d180
>>>>     lr: ffff00000077520c
>>>>    elr: ffff0000007a03ac
>>>> spsr:         200000c5
>>>>    far:                0
>>>>    esr:         96000004
>>>> panic: vm_fault failed: ffff0000007a03ac error 1
>>>> cpuid = 0
>>>> time = 1
>>>> KDB: stack backtrace:
>>>> db_trace_self() at db_trace_self
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>>> vpanic() at vpanic+0x184
>>>> panic() at panic+0x44
>>>> data_abort() at data_abort+0x23c
>>>> handle_el1h_sync() at handle_el1h_sync+0x78
>>>> --- exception, esr 0x96000004
>>>> iommu_map_msi() at iommu_map_msi+0x20
>>>> gicv3_iommu_init() at gicv3_iommu_init+0x4c
>>>> intr_alloc_msix() at intr_alloc_msix+0x13c
>>>> rk_pcie_alloc_msix() at rk_pcie_alloc_msix+0xfc
>>>> pci_alloc_msix_method() at pci_alloc_msix_method+0x1a8
>>>> nvme_pci_attach() at nvme_pci_attach+0x378
>>>> device_attach() at device_attach+0x400
>>>> device_probe_and_attach() at device_probe_and_attach+0x7c
>>>> bus_generic_attach() at bus_generic_attach+0x18
>>>> pci_attach() at pci_attach+0xe8
>>>> device_attach() at device_attach+0x400
>>>> device_probe_and_attach() at device_probe_and_attach+0x7c
>>>> bus_generic_attach() at bus_generic_attach+0x18
>>>> device_attach() at device_attach+0x400
>>>> device_probe_and_attach() at device_probe_and_attach+0x7c
>>>> bus_generic_attach() at bus_generic_attach+0x18
>>>> pci_attach() at pci_attach+0xe8
>>>> device_attach() at device_attach+0x400
>>>> device_probe_and_attach() at device_probe_and_attach+0x7c
>>>> bus_generic_attach() at bus_generic_attach+0x18
>>>> rk_pcie_attach() at rk_pcie_attach+0x14cc
>>>> device_attach() at device_attach+0x400
>>>> device_probe_and_attach() at device_probe_and_attach+0x7c
>>>> bus_generic_new_pass() at bus_generic_new_pass+0xf8
>>>> bus_generic_new_pass() at bus_generic_new_pass+0xa8
>>>> bus_generic_new_pass() at bus_generic_new_pass+0xa8
>>>> bus_set_pass() at bus_set_pass+0x4c
>>>> mi_startup() at mi_startup+0x12c
>>>> virtdone() at virtdone+0x6c
>>>>
>>>   That's an old commit. Did you see this panic only recently, or has it been there for a while?
>>>
>>
> 
> 
> Even on stable/13-n247374-9faebc1e664d-dirty I get the same backtrace
> when IOMMU is enabled and the NVMe is attached.
> 
> pcib1: <PCI-PCI bridge> at device 0.0 on pci0
> pcib0: failed to reserve resource for pcib1
> pcib1: failed to allocate initial memory window: 0-0xfffff
> pci1: <PCI bus> on pcib1
> nvme0: <Generic NVMe Device> at device 0.0 on pci1
> Fatal data abort:
>    x0:                0
>    x1:             1000
>    x2:            10040
>    x3:             2000
>    x4:                1
>    x5: ffff00009a99e160
>    x6: 1400000000000000
>    x7:   10000000000000
>    x8:             1160
>    x9: ffff000000cd7cc0
>   x10:                0
>   x11: ffff000000d89540
>   x12: ffff000000d894c0
>   x13:                1
>   x14:            10000
>   x15:                1
>   x16:            10000
>   x17:                0
>   x18: ffff000000f5c250
>   x19: ffff000000f5c318
>   x20:                0
>   x21: ffff00009a99d000
>   x22: ffffa00000f06200
>   x23: ffffa00000f49700
>   x24: ffffa00000f8f500
>   x25: ffff0000009b85f8
>   x26: ffffa00000f8f600
>   x27: ffff000000cd7000
>   x28:                7
>   x29: ffff000000f5c260
>    sp: ffff000000f5c250
>    lr: ffff0000006bf3dc
>   elr: ffff0000006e15d0
> spsr:         600001c5
>   far:                0
>   esr:         96000004
> panic: vm_fault failed: ffff0000006e15d0
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> #0 0xffff00000047c304 at kdb_backtrace+0x60
> #1 0xffff000000437fd8 at vpanic+0x184
> #2 0xffff000000437e50 at panic+0x44
> #3 0xffff0000006d692c at data_abort+0x204
> #4 0xffff0000006bb874 at handle_el1h_sync+0x74
> #5 0xffff0000006bf3d8 at gicv3_iommu_init+0x4c
> #6 0xffff0000006bf3d8 at gicv3_iommu_init+0x4c
> #7 0xffff0000006b1940 at intr_alloc_msix+0x110
> #8 0xffff0000007860c0 at rk_pcie_alloc_msix+0xfc
> #9 0xffff000000219bbc at pci_alloc_msix_method+0x1a8
> #10 0xffff00000020ba64 at nvme_pci_attach+0x378
> #11 0xffff00000046bd80 at device_attach+0x400
> #12 0xffff00000046d14c at bus_generic_attach+0x4c
> #13 0xffff000000221f30 at pci_attach+0xe0
> #14 0xffff00000046bd80 at device_attach+0x400
> #15 0xffff00000046d14c at bus_generic_attach+0x4c
> #16 0xffff00000046bd80 at device_attach+0x400
> #17 0xffff00000046d14c at bus_generic_attach+0x4c
> Uptime: 1s
> 
> 
Checking out sys/arm64/arm64/gicv3_its.c at commit
50cedfede3d21824ec6023324b3ad41a435e1815 makes the problem go away. That
commit is the one immediately before "Add IOMMU support to GICv3
Interrupt Translation Service (ITS) driver"
(ba196aec7dad1b73a9a3b86a06259d5e81f16fad).
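
For reference, roughly the steps for the partial revert and rebuild
(assuming the source tree is in /usr/src and a native GENERIC build;
adjust paths and build options to taste):

  cd /usr/src
  # confirm that 50cedfede3d2 is the direct parent of the ITS IOMMU commit
  git log --oneline -2 ba196aec7dad1b73a9a3b86a06259d5e81f16fad
  # revert only gicv3_its.c to the state before that commit
  git checkout 50cedfede3d21824ec6023324b3ad41a435e1815 -- sys/arm64/arm64/gicv3_its.c
  make -j4 buildkernel KERNCONF=GENERIC
  make installkernel KERNCONF=GENERIC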