git: 5a898b2b78ce - main - Set PCIe device's Max_Payload_Size to match PCIe root's.

Alexander Motin mav at FreeBSD.org
Thu Apr 8 19:45:08 UTC 2021



On 08.04.2021 15:26, John Baldwin wrote:
> On 4/8/21 11:02 AM, Alexander Motin wrote:
>> On 08.04.2021 13:36, John Baldwin wrote:
>>> On 4/5/21 7:40 AM, Alexander Motin wrote:
>>>> +    rmps = pcie_read_config(root, PCIER_DEVICE_CTL, 2) &
>>>> +        PCIEM_CTL_MAX_PAYLOAD;
>>>> +    mps = pcie_read_config(dev, PCIER_DEVICE_CTL, 2) &
>>>> +        PCIEM_CTL_MAX_PAYLOAD;
>>>> +    if (mps == rmps)
>>>> +        return;
>>>> +    /* Check whether the device is capable of the root's MPS. */
>>>> +    mmps = (pcie_read_config(dev, PCIER_DEVICE_CAP, 2) &
>>>> +        PCIEM_CAP_MAX_PAYLOAD) << 5;
>>>> +    if (rmps > mmps) {
>>>> +        /*
>>>> +         * The device is unable to handle root's MPS.  Limit root.
>>>> +         * XXX: We should traverse through all the tree, applying
>>>> +         * it to all the devices.
>>>> +         */
>>>
>>> Hmmm, limiting the root seems very dubious here.  Do you really need
>>> this?
>>
>> All devices under the same root (at least ones that talk to each other)
>> must have the same MPS value, otherwise some may consider larger
>> transfer as an error.  In case of direct PCIe connection the root is the
>> only other device, so this code should be sufficient.
> 
> I mean, had you seen any cases where you needed to adjust the root? 

No.  I don't think I have NVMe device incapable of 256 byte transfers to
which my Supermicro boards BIOS default.  But I can easily imagine
either such device or BIOS with bigger default.

> I
> do worry about this breaking other systems due to it not doing the walk
> of the full sub-tree.  Maybe only do it if the root port is the grandparent
> of the device in question since that is the safe case here and punt if
> there are switches in the middle?

I have no problem limiting it to grandparent, if you prefer, just not
sure it is much better.  If there are more devices under that root, then
without full tree traversal either some old or the new device get MPS
mismatch.  The only question whether it will cause a receive of too
large transfer by the root capable of it but limited by the MPS setting
(formally a protocol violation, but at least Intel Optane 905p seem to
ignore it), or the new device that is not even capable of it.

>>> If not, I'd put it behind a tunable sysctl that defaults to off. 
>>> Ideally
>>> what you'd do here is use an and of the two masks and select one of
>>> those
>>> bits to choose the value rather than assuming the root can do the
>>> device's
>>> value.
>>
>> It is not a bitmask, it is a power-of-2 between 128 bytes and the
>> maximum device capability.  So if the root is already configured for
>> higher value, then it must support the lover one too.
> 
> I thought the DEVICE_CAP is a bitmask, and that is what I was thinking
> of using
> the and with.  Nominally I think we should be doing the AND of the
> DEVICE_CAP
> field for all the devices under a port and then programming them to some
> value
> remaining in the mask.  Most devices just DMA to/from memory though
> rather than
> to each other, so generally you just care about the path from a device
> to a root
> port.

"Max_Payload_Size Supported – This field indicates the maximum payload
size that the Function can support for TLPs." -- this field in
DEVICE_CAP is not a bitmask.  So nominally we should take minimum for
all the devices.

-- 
Alexander Motin


More information about the dev-commits-src-all mailing list