Re: downgraded pcie link width
- In reply to: Zhenlei Huang : "downgraded pcie link width"
Date: Wed, 27 Aug 2025 12:39:05 UTC
From my understanding, PCIe link training does not only occur during POST. After boot, the OS can instruct a PCIe device to retrain, or the hardware itself can retrain autonomously (the up-configuration you mentioned).

To clarify, the final bandwidth of a PCIe link is determined by two parameters: speed (Gen1 2.5 GT/s, Gen2 5 GT/s, Gen3 8 GT/s) and width (x1, x2, x4, or x8).

Speed: the target speed can be configured via the TLS[1] field in the PCIe capability (Link Control 2). After overwriting the value, you have to trigger retraining by writing the Retrain Link[2] bit.

Width: the width is negotiated automatically by the LTSSM during link training (TS1/TS2 ordered sets) and cannot be configured directly. However, hardware-autonomous width changes (including up-configuration) are disabled while the HAWD[3] bit in LCTL is set; make sure it is clear, then issue a retrain[2] so the link can re-evaluate its width.

There is a tool called setpci that can read and write the PCI configuration space from userspace. On FreeBSD it is available in the sysutils/pciutils port.

[1]: https://edc.intel.com/content/www/it/it/design/publications/12th-generation-core-processor-datasheet-volume-2-of-2/link-control-2-lctl2-offset-70_2/
[2]: https://edc.intel.com/content/www/it/it/design/publications/12th-generation-core-processor-datasheet-volume-2-of-2/link-control-lctl-offset-50_2/
[3]: https://edc.intel.com/content/www/it/it/design/publications/12th-generation-core-processor-datasheet-volume-2-of-2/link-control-lctl-offset-50_2/

Zhenlei Huang <zlei@FreeBSD.org> writes:
> Hi,
>
> I'm recently hacking on the QLogic FastLinQ QL41212HLCU 25GbE adapter, and found something weird.
>
> It is a two SFP28 port card with PCIe 3.0 x8 link [1]. I connected the two ports with DAC cable directly to do benchmark.
> The weirdness is that no matter how much load I try to put into the card, it can only reach to about 13Gbps.
> I used iperf3 to do the benchmark. Also tried disabling TSO and LRO, enabling Jumbo MTU, but no luck.
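To make the register pokes above concrete, here is a hedged sketch of the sequence using setpci. The bus address 08:00.0 is taken from your lspci output below; the 0x0200 starting value of LCTL is purely an assumption for illustration, and you should of course read your own value first.

```shell
# Sketch only: assumes setpci from sysutils/pciutils and a device at 08:00.0.
# 1. Read Link Control (LCTL), at offset 0x10 into the PCIe capability:
#      setpci -s 08:00.0 CAP_EXP+0x10.w
# 2. Suppose it returns 0x0200, i.e. HAWD (bit 9) is set. Compute a new
#    value with HAWD cleared and Retrain Link (bit 5) set:
lctl=0x0200
new=$(( (lctl & ~(1 << 9)) | (1 << 5) ))
printf '0x%04x\n' "$new"        # value to write back: 0x0020
# 3. Write it back, then re-read Link Status (offset 0x12) to see the
#    re-negotiated width:
#      setpci -s 08:00.0 CAP_EXP+0x10.w=0x0020
#      setpci -s 08:00.0 CAP_EXP+0x12.w
```

Note that per the spec the Retrain Link bit always reads as 0, so it is safe to set it in the same write that clears HAWD.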
>
> I checked the SFP module ( SFP28 DAC cable ) and ifconfig shows the link is 25000G,
>
> ```
> # ifconfig -j1 -mv ql0
> ql0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
>         options=8d00bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,HWSTATS>
>         capabilities=8d07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,HWSTATS>
>         ether xx:xx:xx:xx:xx:xx
>         inet 172.16.1.1 netmask 0xffffff00 broadcast 172.16.1.255
>         media: Ethernet autoselect (25GBase-CR <full-duplex>)
>         status: active
>         supported media:
>                 media autoselect
>                 media autoselect mediaopt full-duplex
>                 media 25GBase-CR
>                 media 25GBase-SR
>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>         drivername: ql0
>         plugged: SFP/SFP+/SFP28 25GBASE-CR CA-25G-S (Copper pigtail)
>         vendor: OEM PN: CAB-ZSP/ZSP-P2M SN: XXXXXXXXXXXXX DATE: 2025-07-04
> ```
>
> and finally I observed something unusual from pciconf,
>
> ```
> # pciconf -lcv ql0
> ...
>     cap 10[70] = PCI-Express 2 endpoint max data 256(512) FLR NS
>                  max read 4096
>                  link x2(x8) speed 8.0(8.0) ClockPM disabled
> ```
>
> That can also be verified by lspci from pciutils ports.
> ```
> # lspci -s 08:00.0 -vv
> ...
>         LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
>                 ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>         LnkSta: Speed 8GT/s, Width x2 (downgraded)
>                 TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> ```
>
> What I have tried,
>
> 1. Plugged the card into different mother board ( 3 different vendors, Dell, HP, and Gigabyte ), and different PCIe slot ( x16 and x4 ).
> 2. Upgraded the BIOS of mother board.
> 3. Disabled ASPM in BIOS.
> 4. Upgraded the firmware of card.
> 5. Booted with Debian 13 live CD.
>
> Nothing has changed. The PCIe link width can only be negotiated to maximum of x2, with or without driver loaded, with / without load on the card.
> It is also interesting that it can only be negotiated to x1 on Gigabyte motherboard, which has only one PCIe 2.0 x16 slot.
>
> After Googling I found some articles say that the PCIe link width is negotiated at the training stage, which is at POST before the driver loads.
> They hint that downgraded link width is mostly caused by wrong BIOS configure, or hardware issues such as scratched gold fingers.
> I would almost give up and found the product brief [2], in which it declares `Supports PCIe upconfigure to reduce link width to conserve power`.
> So interesting, maybe it is the firmware's fault that the firmware does not **upconfigure** ( retraining ) on sufficient load ?
>
> Are your FastLinQ 41000 ethernet cards been rightly negotiated to x8 ?
>
> What can I do next ?
>
> CC John, I guess he is familiar with PCIe spec :)
>
>
> [1] https://www.marvell.com/products/ethernet-adapters-and-controllers/41000-ethernet-adapters.html
> [2] https://www.marvell.com/content/dam/marvell/en/public-collateral/ethernet-adaptersandcontrollers/marvell-ethernet-adapters-fastlinq-41000-series-product-brief.pdf
>
> Best regards,
> Zhenlei

--
Best Regards.
ShengYi Hung.