Re: git: a58536b91ae3 - main - pci: Disable Electromechanical Interlock.
- In reply to: John Baldwin : "Re: git: a58536b91ae3 - main - pci: Disable Electromechanical Interlock."
Date: Tue, 04 Oct 2022 23:32:39 UTC
On 04.10.2022 12:47, John Baldwin wrote:
> On 10/4/22 7:44 AM, Alexander Motin wrote:
>> The branch main has been updated by mav:
>>
>> URL: https://cgit.FreeBSD.org/src/commit/?id=a58536b91ae3931d222c3e4f1a949ff4a4927fb2
>>
>> commit a58536b91ae3931d222c3e4f1a949ff4a4927fb2
>> Author:     Alexander Motin <mav@FreeBSD.org>
>> AuthorDate: 2022-10-04 14:34:15 +0000
>> Commit:     Alexander Motin <mav@FreeBSD.org>
>> CommitDate: 2022-10-04 14:34:15 +0000
>>
>>     pci: Disable Electromechanical Interlock.
>>
>>     Add sysctl/tunable to control Electromechanical Interlock support.
>>     Disable it by default since Linux does not do it either and it seems
>>     the number of systems having it broken is higher than having working.
>>
>>     This fixes NVMe backplane operation on ASUS RS500A-E11-RS12U server
>>     with AMD EPYC 7402 CPU, where attempts to control reported interlock
>>     for some reason end up in PCIe link loss, while interlock status does
>>     not change (it is not really there).
>>
>>     MFC after: 2 weeks
>
> See also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256264, though
> that is more for the case where slots aren't really hotplug at all.
>
> The root issue seems to be that there are generic HotPlug-capable
> bridges but that manufacturers fail to correctly wire up the various
> input pins such that the bridges can actually determine that there is no
> MRL or EI, etc. The above PR (which I still can't get the reporter to
> test the patch for, but perhaps should just merge?) disables PCI-e hotplug
> if the link is up, but the other status bits claim that the device is
> partially inserted when attaching the bridge.

In my case the slots really are expected to be hot-pluggable; ASUS just
can't do things right. In the case of the PR, your patch seems to make
sense. I'd be more worried about the already present check for a broken
MRL: if we see the MRL open but the device is still powered, we may wish
to quickly shut the device down. But I agree that the probability of a
false negative here is much higher than of a positive. I still haven't
had my hands on any hardware implementing all those cool bells and
whistles.

--
Alexander Motin
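
For readers following the thread, the two checks discussed above can be sketched in C. This is only an illustration, not the code from the commit or from the PR: the macro and function names are made up for this sketch, and the bit positions are taken from the Slot Capabilities, Slot Control and Slot Status register layouts in the PCIe Base Specification. The `ei_enabled` parameter stands in for whatever sysctl/tunable value the driver consults.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the PCIe Base Specification. */
#define SLOT_CAP_POWER_CTRL  (1u << 1)   /* Slot Capabilities: Power Controller Present */
#define SLOT_CAP_MRL_SENSOR  (1u << 2)   /* Slot Capabilities: MRL Sensor Present */
#define SLOT_CAP_EI          (1u << 17)  /* Slot Capabilities: Electromechanical Interlock Present */
#define SLOT_CTL_POWER_OFF   (1u << 10)  /* Slot Control: Power Controller Control, 1 = power off */
#define SLOT_STA_MRL_OPEN    (1u << 5)   /* Slot Status: MRL Sensor State, 1 = open */

/*
 * Use the interlock only if the administrator enabled it AND the slot
 * advertises one. With the knob off, an advertised (possibly bogus)
 * interlock is simply ignored.
 */
static bool
ei_usable(uint32_t slot_cap, bool ei_enabled)
{
	return (ei_enabled && (slot_cap & SLOT_CAP_EI) != 0);
}

/*
 * Report the suspicious state from the thread: the MRL sensor says the
 * latch is open while the slot is still powered. A slot without a power
 * controller is treated as always powered.
 */
static bool
mrl_open_while_powered(uint32_t slot_cap, uint16_t slot_ctl, uint16_t slot_sta)
{
	bool powered;

	if ((slot_cap & SLOT_CAP_MRL_SENSOR) == 0)
		return (false);		/* no sensor, nothing to check */
	powered = (slot_cap & SLOT_CAP_POWER_CTRL) == 0 ||
	    (slot_ctl & SLOT_CTL_POWER_OFF) == 0;
	return (powered && (slot_sta & SLOT_STA_MRL_OPEN) != 0);
}
```

On a board where the sensor inputs are miswired, `mrl_open_while_powered()` could return true for a perfectly healthy slot, which is the false reading the thread is concerned about.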