DMA and PCIe on the Raspberry Pi Compute Module 4

From: HP van Braam <hp_at_tmm.cx>
Date: Fri, 12 Apr 2024 14:28:40 UTC

Hello,

I've been working on getting the PCIe bus of the Raspberry Pi Compute
Module 4 to work with external devices, and I've made some progress.
The following patch enables L1SS and limits the PCIe generation to 2,
which in turn seems to make most things work. Without it, booting with
almost any PCIe card inserted results in an SError during pcib
initialization. Note that the patch needs some clean-up; I'm not trying
to get it included as-is :)

diff --git a/sys/arm/broadcom/bcm2835/bcm2838_pci.c b/sys/arm/broadcom/bcm2835/bcm2838_pci.c
index 2dfd6744127a..e2ecfb861697 100644
--- a/sys/arm/broadcom/bcm2835/bcm2838_pci.c
+++ b/sys/arm/broadcom/bcm2835/bcm2838_pci.c
@@ -74,6 +74,7 @@
 #define REG_CPU_WINDOW_LOW			0x4070
 #define REG_CPU_WINDOW_START_HIGH		0x4080
 #define REG_CPU_WINDOW_END_HIGH			0x4084
+#define REG_PCIE_CAP                            0x00ac
 
 #define REG_MSI_ADDR_LOW			0x4044
 #define REG_MSI_ADDR_HIGH			0x4048
@@ -730,6 +731,26 @@ bcm_pcib_attach(device_t dev)
 	if (error != 0)
 		return (error);
 
+	DELAY(100);
+
+        uint32_t tmp = bcm_pcib_read_reg(sc, 0x4204);
+        tmp |= 0x2;
+        tmp |= 0x00200000;
+        bcm_pcib_set_reg(sc, 0x4204, tmp);
+
+	DELAY(100);
+
+	// Set PCIe generation to 2, any higher and the controller fails
+	uint16_t lnkctl2 = bcm_pcib_read_reg(sc, REG_PCIE_CAP + PCIER_LINK_CTL2);
+	uint32_t lnkcap = bcm_pcib_read_reg(sc, REG_PCIE_CAP + PCIER_LINK_CAP);
+
+	lnkcap = (lnkcap & ~0x0000000f) | 2;
+	bcm_pcib_set_reg(sc, REG_PCIE_CAP + PCIER_LINK_CAP, lnkcap);
+	lnkctl2 = (lnkctl2 & ~0xf) | 2;
+	bcm_pcib_set_reg(sc, REG_PCIE_CAP + PCIER_LINK_CTL2, lnkctl2);
+
+	DELAY(100);
+
 	/* Done. */
 	device_add_child(dev, "pci", -1);
 	return (bus_generic_attach(dev));
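
For readers who want the speed-limiting part in isolation, here is a
minimal stand-alone sketch of the bit manipulation the patch performs on
the Link Capabilities and Link Control 2 values. It assumes the low four
bits of both registers carry the link-speed code (1 = 2.5 GT/s,
2 = 5 GT/s), which is what the masks in the patch imply. The helper name
set_link_speed() is made up for this illustration and is not part of
the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets inside the PCIe capability, as named in FreeBSD's pcireg.h. */
#define PCIER_LINK_CAP	0x0c
#define PCIER_LINK_CTL2	0x30

/* Assumption: bits 3:0 of both registers hold the link-speed code. */
#define LINK_SPEED_MASK	0x0000000fu

/*
 * Hypothetical helper: return 'reg' with its link-speed field replaced
 * by 'gen', leaving every other bit untouched.
 */
static uint32_t
set_link_speed(uint32_t reg, unsigned int gen)
{
	assert(gen >= 1 && gen <= LINK_SPEED_MASK);
	return ((reg & ~LINK_SPEED_MASK) | gen);
}
```

In the driver this would correspond to reading the register, passing the
value through such a helper with gen = 2, and writing the result back.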

With this patch applied, PCIe devices seem to work, including a USB3
add-in card, various SATA controllers, and an AMD GPU.

However, there is one problem I can't solve. When I attach a
PCIe-to-PCI bridge to the controller and a PCI SCSI card behind that,
things work well in initiator mode: I can run benchmarks on attached
SCSI devices for hours without failure, and performance is exactly what
I'd expect.

When using ahc(4) in target mode, however, I appear to be getting
random garbage over the SCSI bus.

I have also tested this setup with a QEMU aarch64 virtual machine, with
the same card passed through via vfio, and on an x86_64 machine in the
same setup. After fixing a problem in ahc(4) (see f1e4c09), both of
these configurations work, so I don't think the problem is in either
ctl(4) or ahc(4) at this point.

I believe the problem is related to the DMA setup of the Broadcom PCIe
controller, but I'm really not sure where to even begin debugging this.
The reason I think so is that I can sometimes see strings of ASCII text
in the garbage on the SCSI bus, for instance "libc.so.7", which
suggests I got some part of an inode, perhaps?
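
One way to make the "ASCII in the garbage" observation less anecdotal
is to scan each captured buffer for runs of printable characters,
similar to what strings(1) does. The following is only a sketch: the
function name longest_printable_run() and the idea of feeding it a
captured transfer are illustrative assumptions, not something taken
from ctl(4) or ahc(4).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length of the longest run of
 * printable 7-bit ASCII bytes in buf[0..len).
 */
static size_t
longest_printable_run(const unsigned char *buf, size_t len)
{
	size_t best = 0, cur = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		if (buf[i] < 0x80 && isprint(buf[i])) {
			cur++;
			if (cur > best)
				best = cur;
		} else {
			cur = 0;
		}
	}
	return (best);
}
```

A long run in data that should be binary would hint that the transfer
picked up a page holding text, such as part of a cached file.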

I have also tried filling all userspace-accessible memory with the
"@" symbol to see whether I would then get a bunch of "@" symbols over
the SCSI bus, but this never happened.
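
For reference, the sentinel experiment amounts to something like the
sketch below: fill a buffer with a known byte, then count how many of
those bytes show up in whatever arrives on the other side. The names
fill_sentinel() and count_sentinel() are made up for this example. A
fill like this only touches memory the process itself owns, so pages
held by the kernel would not be affected by it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SENTINEL	'@'

/* Hypothetical helper: overwrite buf[0..len) with the sentinel byte. */
static void
fill_sentinel(unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	memset(buf, SENTINEL, len);
}

/* Hypothetical helper: count sentinel bytes in a captured buffer. */
static size_t
count_sentinel(const unsigned char *buf, size_t len)
{
	size_t n = 0;

	for (size_t i = 0; i < len; i++) {
		if (buf[i] == SENTINEL)
			n++;
	}
	return (n);
}
```

A capture consisting mostly of sentinel bytes would point at the
transfer reading from process memory; seeing none, as here, is
consistent with the data coming from somewhere else.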

Based on reading the Linux driver and various forum posts about the
RPi 4 and PCIe, I have also tried limiting the amount of RAM on the CM4
to 1, 2, and 3 GB, but to no avail.

I'm kind of at the end of my knowledge here and I'd love some help.

Thank you!

- HP