Re: downgraded pcie link width

From: Zhenlei Huang <zlei_at_FreeBSD.org>
Date: Wed, 27 Aug 2025 15:08:00 UTC

> On Aug 27, 2025, at 9:12 PM, John Baldwin <jhb@freebsd.org> wrote:
> 
> On 8/27/25 08:47, ShengYi Hung wrote:
>> Sorry, a correction regarding the HADW bit: you have to set this bit, not clear it, to
>> disable the feature, and then issue a retrain. That disables upconfiguration in TS2 and lets the device run at
>> the maximum width.
>> This can be done using setpci.
> 
> pciconf in the base system can also write to configuration registers.  That said,
> if this should ideally have cooperation from the device driver, then it may make
> sense to instead add a new devctl operation to trigger a retrain so that we can
> suspend the device driver while retraining.

Hi John and ShengYi,

I'll explore tomorrow.

Thanks for your suggestion.
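
For reference, a minimal sketch of the register arithmetic involved. It assumes that "HADW" above refers to the Hardware Autonomous Width Disable bit (bit 9) of the PCIe Link Control register, and that a retrain is requested through the Retrain Link bit (bit 5) of the same register. The capability base 0x70 is taken from the `cap 10[70]` line in the pciconf output quoted below; the starting register value is invented for illustration. Python is used only to show the bit math:

```python
# Link Control register lives at offset 0x10 inside the PCI Express capability.
PCIE_LINK_CTL_OFFSET = 0x10
RETRAIN_LINK = 1 << 5            # Retrain Link (meaningful on downstream ports)
HW_AUTO_WIDTH_DISABLE = 1 << 9   # Hardware Autonomous Width Disable


def link_ctl_address(cap_base: int) -> int:
    """Config-space address of Link Control for a given capability base."""
    return cap_base + PCIE_LINK_CTL_OFFSET


def with_hawd(link_ctl: int) -> int:
    """Return the 16-bit register value with the width-disable bit set."""
    return (link_ctl | HW_AUTO_WIDTH_DISABLE) & 0xFFFF


def with_retrain(link_ctl: int) -> int:
    """Return the 16-bit register value with Retrain Link set."""
    return (link_ctl | RETRAIN_LINK) & 0xFFFF


if __name__ == "__main__":
    cap_base = 0x70     # from "cap 10[70]" in the quoted pciconf output
    current = 0x0040    # hypothetical current Link Control value
    print(hex(link_ctl_address(cap_base)))   # 0x80
    print(hex(with_hawd(current)))           # 0x240
    print(hex(with_retrain(with_hawd(current))))  # 0x260
```

The computed value could then presumably be written with `pciconf -w -h <selector> <addr> <value>` (or setpci). Since Retrain Link is defined for downstream ports, the retrain would most likely have to be requested on the upstream bridge rather than on the NIC itself, and that bridge's capability base may well differ from 0x70.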

> 
>> ShengYi Hung <aokblast@FreeBSD.org> writes:
>>> 
>>> In my understanding, the
>>> 
>>> Zhenlei Huang <zlei@FreeBSD.org> writes:
>>> 
>>>> Hi,
>>>> 
>>>> I've recently been hacking on the QLogic FastLinQ QL41212HLCU 25GbE adapter and found something weird.
>>>> 
>>>> It is a dual-port SFP28 card with a PCIe 3.0 x8 link [1]. I connected the two ports directly with a DAC cable to run benchmarks.
>>>> The weird part is that no matter how much load I put on the card, it only reaches about 13 Gbps.
>>>> I used iperf3 for the benchmark. I also tried disabling TSO and LRO and enabling jumbo frames, but no luck.
>>>> 
>>>> I checked the SFP module (SFP28 DAC cable) and ifconfig shows the link is 25G,
>>>> 
>>>> ```
>>>> # ifconfig -j1 -mv ql0
>>>> ql0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
>>>> 	options=8d00bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,HWSTATS>
>>>> 	capabilities=8d07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,HWSTATS>
>>>> 	ether xx:xx:xx:xx:xx:xx
>>>> 	inet 172.16.1.1 netmask 0xffffff00 broadcast 172.16.1.255
>>>> 	media: Ethernet autoselect (25GBase-CR <full-duplex>)
>>>> 	status: active
>>>> 	supported media:
>>>> 		media autoselect
>>>> 		media autoselect mediaopt full-duplex
>>>> 		media 25GBase-CR
>>>> 		media 25GBase-SR
>>>> 	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>> 	drivername: ql0
>>>> 	plugged: SFP/SFP+/SFP28 25GBASE-CR CA-25G-S (Copper pigtail)
>>>> 	vendor: OEM PN: CAB-ZSP/ZSP-P2M SN: XXXXXXXXXXXXX DATE: 2025-07-04
>>>> ```
>>>> 
>>>>  and finally I observed something unusual from pciconf,
>>>> 
>>>> ```
>>>> # pciconf -lcv ql0
>>>> ...
>>>>     cap 10[70] = PCI-Express 2 endpoint max data 256(512) FLR NS
>>>>                  max read 4096
>>>>                  link x2(x8) speed 8.0(8.0) ClockPM disabled
>>>> ```
>>>> 
>>>> That can also be verified with lspci from the pciutils port.
>>>> ```
>>>> # lspci -s 08:00.0 -vv
>>>> ...
>>>> 		LnkCap:	Port #0, Speed 8GT/s, Width x8, ASPM not supported
>>>> 			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>> 		LnkSta:	Speed 8GT/s, Width x2 (downgraded)
>>>> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>> ```
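
As a cross-check on the LnkSta line above, here is a rough sketch of how the raw 16-bit Link Status register (capability offset 0x12, so 0x82 for this card if the capability base is 0x70) decodes into speed and width. The field layout is the standard one; the sample value 0x1023 is constructed to match the lspci output and is not a value actually read from the card. The speed table uses the conventional encoding, which is an assumption for anything beyond the generations listed:

```python
# Conventional Current Link Speed encodings (bits 3:0 of Link Status).
SPEEDS = {1: "2.5GT/s", 2: "5GT/s", 3: "8GT/s", 4: "16GT/s", 5: "32GT/s"}


def decode_link_status(lnksta: int) -> dict:
    """Split a raw Link Status value into the fields lspci prints."""
    return {
        "speed": SPEEDS.get(lnksta & 0xF, "unknown"),  # bits 3:0
        "width": (lnksta >> 4) & 0x3F,                 # bits 9:4, lane count
        "training": bool(lnksta & (1 << 11)),          # Link Training
        "slot_clock": bool(lnksta & (1 << 12)),        # Slot Clock Configuration
        "dl_active": bool(lnksta & (1 << 13)),         # Data Link Layer Link Active
    }


if __name__ == "__main__":
    # Hypothetical raw value matching "Speed 8GT/s, Width x2, SlotClk+".
    print(decode_link_status(0x1023))
```

On FreeBSD the raw value could presumably be read with `pciconf -r -h ql0 0x82`, which would make it easy to poll the negotiated width before and after a retrain.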
>>>> 
>>>> What I have tried:
>>>> 
>>>>  1. Plugged the card into different motherboards (three vendors: Dell, HP, and Gigabyte) and different PCIe slots (x16 and x4).
>>>>  2. Upgraded the motherboard BIOS.
>>>>  3. Disabled ASPM in the BIOS.
>>>>  4. Upgraded the card firmware.
>>>>  5. Booted a Debian 13 live CD.
>>>> 
>>>> Nothing changed. The PCIe link width never negotiates above x2, with or without the driver loaded and with or without load on the card.
>>>> Interestingly, it only negotiates x1 on the Gigabyte motherboard, which has a single PCIe 2.0 x16 slot.
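
The numbers above line up with simple link arithmetic. The sketch below estimates the raw per-direction bandwidth from the transfer rate, the line encoding (8b/10b for PCIe 1.x/2.0, 128b/130b for PCIe 3.0) and the lane count. It ignores TLP/DLLP protocol overhead, so real throughput will be lower; the function and table names are just illustrative:

```python
# Encoding efficiency per transfer rate (GT/s): payload bits / line bits.
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def link_gbps(gt_per_s: float, lanes: int) -> float:
    """Raw one-direction bandwidth in Gbit/s, before packet overhead."""
    return gt_per_s * ENCODING[gt_per_s] * lanes


if __name__ == "__main__":
    print(f"PCIe 3.0 x8: {link_gbps(8.0, 8):.2f} Gbps")  # expected link
    print(f"PCIe 3.0 x2: {link_gbps(8.0, 2):.2f} Gbps")  # negotiated link
    print(f"PCIe 2.0 x1: {link_gbps(5.0, 1):.2f} Gbps")  # Gigabyte board
```

An x2 link at 8 GT/s tops out at roughly 15.75 Gbps before overhead, so the observed ceiling of about 13 Gbps is plausibly the PCIe link itself rather than the network stack, while a full x8 link would offer about 63 Gbps.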
>>>> 
>>>> After some Googling I found articles saying that the PCIe link width is negotiated during link training, which happens at POST before the driver loads.
>>>> They suggest that a downgraded link width is mostly caused by incorrect BIOS configuration or by hardware issues such as scratched gold fingers.
>>>> I had almost given up when I found the product brief [2], which states `Supports PCIe upconfigure to reduce link width to conserve power`.
>>>> Interesting. Maybe it is the firmware's fault, in that it does not **upconfigure** (retrain) under sufficient load?
>>>> 
>>>> Have your FastLinQ 41000 Ethernet cards correctly negotiated x8?
>>>> 
>>>> What can I do next?
>>>> 
>>>> CC John, I guess he is familiar with the PCIe spec :)
>>>> 
>>>> 
>>>> [1] https://www.marvell.com/products/ethernet-adapters-and-controllers/41000-ethernet-adapters.html
>>>> [2] https://www.marvell.com/content/dam/marvell/en/public-collateral/ethernet-adaptersandcontrollers/marvell-ethernet-adapters-fastlinq-41000-series-product-brief.pdf
>>>> 
>>>> Best regards,
>>>> Zhenlei
> 
> -- 
> John Baldwin