Re: downgraded pcie link width

From: ShengYi Hung <aokblast_at_FreeBSD.org>
Date: Wed, 27 Aug 2025 12:39:05 UTC
From my understanding, PCIe link training does not only occur during POST.
After boot, the OS can instruct a PCIe device to retrain, or the
hardware itself can retrain autonomously (the up-configuration
you mentioned).

To clarify, the final bandwidth of a PCIe link is determined by two
parameters: speed (Gen1 2.5 GT/s, Gen2 5 GT/s, Gen3 8 GT/s) and
width (x1, x2, x4, x8, or x16).
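To put numbers on it: raw link bandwidth is speed x width x line-code
efficiency (Gen3 uses 128b/130b encoding). A quick illustrative sketch
in Python:

```python
def pcie_raw_gbps(gts_per_lane, lanes, encoding):
    """Raw link bandwidth in Gb/s: per-lane rate x width x encoding efficiency."""
    return gts_per_lane * lanes * encoding

GEN3_RATE = 8.0        # 8 GT/s per lane
GEN3_ENC = 128 / 130   # 128b/130b line code (Gen1/Gen2 use 8b/10b, i.e. 0.8)

full = pcie_raw_gbps(GEN3_RATE, 8, GEN3_ENC)   # x8: ~63 Gb/s
down = pcie_raw_gbps(GEN3_RATE, 2, GEN3_ENC)   # x2: ~15.75 Gb/s
print(f"Gen3 x8: {full:.2f} Gb/s, Gen3 x2: {down:.2f} Gb/s")
```

A Gen3 x2 link tops out around 15.75 Gb/s before TLP/DLLP framing
overhead, so an iperf3 plateau near 13 Gbps is consistent with a link
stuck at x2 rather than with a NIC limitation.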

Speed: This can be configured via the Target Link Speed (TLS)[1] field
in the Link Control 2 register of the PCIe capability. After you
overwrite the value, you trigger retraining by writing the Retrain
Link[2] bit.
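For illustration, the read-modify-write amounts to the following bit
arithmetic (bit positions per the PCIe spec: Target Link Speed in bits
3:0 of Link Control 2, Retrain Link in bit 5 of Link Control); the
actual register access would go through setpci or pciconf:

```python
TLS_MASK = 0x000F       # Link Control 2 bits 3:0: Target Link Speed
RETRAIN_LINK = 1 << 5   # Link Control bit 5: Retrain Link

def with_target_speed(lctl2, speed):
    """Return LCTL2 with TLS replaced: 1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s."""
    return (lctl2 & ~TLS_MASK) | (speed & TLS_MASK)

def with_retrain(lctl):
    """Return LCTL with the Retrain Link bit set to kick off retraining."""
    return lctl | RETRAIN_LINK
```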

Width: This is negotiated automatically by the LTSSM (via the TS1/TS2
ordered-set exchange during the Configuration state) and cannot be
configured directly. However, hardware-autonomous width changes,
including up-configuration, can be permitted by clearing the HAWD[3]
(Hardware Autonomous Width Disable) bit in LCTL. After doing so,
issuing a retrain[2] allows the device to re-evaluate and set its width.
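HAWD sits at bit 9 of the Link Control register in the PCIe spec, so
the corresponding sketch for width is:

```python
HAWD = 1 << 9   # Link Control bit 9: Hardware Autonomous Width Disable

def allow_width_changes(lctl):
    """Clear HAWD so hardware may autonomously re-negotiate (up-configure) width."""
    return lctl & ~HAWD
```

As with speed, the new LCTL value would then be written back with
setpci before triggering a retrain.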

There is a tool called setpci that can write to the PCI configuration
space from userspace.
On FreeBSD, this tool is available in sysutils/pciutils.

[1]:
https://edc.intel.com/content/www/it/it/design/publications/12th-generation-core-processor-datasheet-volume-2-of-2/link-control-2-lctl2-offset-70_2/
[2]:
https://edc.intel.com/content/www/it/it/design/publications/12th-generation-core-processor-datasheet-volume-2-of-2/link-control-lctl-offset-50_2/
[3]:
https://edc.intel.com/content/www/it/it/design/publications/12th-generation-core-processor-datasheet-volume-2-of-2/link-control-lctl-offset-50_2/

Zhenlei Huang <zlei@FreeBSD.org> writes:

> Hi,
>
> I've recently been hacking on the QLogic FastLinQ QL41212HLCU 25GbE adapter and found something weird.
>
> It is a two-port SFP28 card with a PCIe 3.0 x8 link [1]. I connected the two ports directly with a DAC cable for benchmarking.
> The weirdness is that no matter how much load I put on the card, it only reaches about 13 Gbps.
> I used iperf3 for the benchmark. I also tried disabling TSO and LRO and enabling jumbo MTU, but no luck.
>
> I checked the SFP module (SFP28 DAC cable), and ifconfig shows the link is 25G,
>
> ```
> # ifconfig -j1 -mv ql0
> ql0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
> 	options=8d00bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,HWSTATS>
> 	capabilities=8d07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,HWSTATS>
> 	ether xx:xx:xx:xx:xx:xx
> 	inet 172.16.1.1 netmask 0xffffff00 broadcast 172.16.1.255
> 	media: Ethernet autoselect (25GBase-CR <full-duplex>)
> 	status: active
> 	supported media:
> 		media autoselect
> 		media autoselect mediaopt full-duplex
> 		media 25GBase-CR
> 		media 25GBase-SR
> 	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> 	drivername: ql0
> 	plugged: SFP/SFP+/SFP28 25GBASE-CR CA-25G-S (Copper pigtail)
> 	vendor: OEM PN: CAB-ZSP/ZSP-P2M SN: XXXXXXXXXXXXX DATE: 2025-07-04
> ```
>
>  and finally I observed something unusual from pciconf,
>
> ```
> # pciconf -lcv ql0
> ...
>     cap 10[70] = PCI-Express 2 endpoint max data 256(512) FLR NS
>                  max read 4096
>                  link x2(x8) speed 8.0(8.0) ClockPM disabled
> ```
>
> That can also be verified by lspci from pciutils ports.
> ```
> # lspci -s 08:00.0 -vv
> ...
> 		LnkCap:	Port #0, Speed 8GT/s, Width x8, ASPM not supported
> 			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkSta:	Speed 8GT/s, Width x2 (downgraded)
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> ```
>
> What I have tried,
>
>  1. Plugged the card into different motherboards (3 different vendors: Dell, HP, and Gigabyte) and different PCIe slots (x16 and x4).
>  2. Upgraded the motherboard BIOS.
>  3. Disabled ASPM in BIOS.
>  4. Upgraded the firmware of card.
>  5. Booted with Debian 13 live CD.
>
> Nothing changed. The PCIe link width only negotiates to a maximum of x2, with or without the driver loaded, and with or without load on the card.
> It is also interesting that it only negotiates to x1 on the Gigabyte motherboard, which has a single PCIe 2.0 x16 slot.
>
> After Googling I found some articles saying that the PCIe link width is negotiated at the training stage, during POST before the driver loads.
> They hint that a downgraded link width is mostly caused by a wrong BIOS configuration, or by hardware issues such as scratched gold fingers.
> I had almost given up when I found the product brief [2], which declares `Supports PCIe upconfigure to reduce link width to conserve power`.
> Interesting. Maybe it is the firmware's fault that it does not **upconfigure** (retrain) under sufficient load?
>
> Have your FastLinQ 41000 Ethernet cards been correctly negotiated to x8?
>
> What can I do next ?
>
> CC John, I guess he is familiar with PCIe spec :)
>
>
> [1] https://www.marvell.com/products/ethernet-adapters-and-controllers/41000-ethernet-adapters.html
> [2] https://www.marvell.com/content/dam/marvell/en/public-collateral/ethernet-adaptersandcontrollers/marvell-ethernet-adapters-fastlinq-41000-series-product-brief.pdf
>
> Best regards,
> Zhenlei

-- 
Best Regards.
ShengYi Hung.