NVME aborting outstanding i/o and controller resets

Warner Losh imp at bsdimp.com
Sat Apr 13 00:38:01 UTC 2019

On Fri, Apr 12, 2019, 1:22 PM Patrick M. Hausen <hausen at punkt.de> wrote:

> Hi Warner,
> thanks for taking the time again …
> > OK. This means that whatever I/O workload we've done has caused the NVME
> > card to stop responding for 30s, so we reset it.
> I figured as much ;-)
> > So it's an intel card.
> Yes - I already added this info several times. 6 of them, 2.5“ NVME „disk
> drives“.

Yea, it was more of a knowing sigh...

> > OK. That suggests Intel has a problem with their firmware.
> I came across this one:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713
> Is it more probable that Intel has got buggy firmware here than that
> „we“ are missing interrupts?

Bad firmware is more probable. One of the things that I think is in HEAD is a
mitigation for this that looks for completed I/O on timeout before doing a
reset.

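To illustrate the idea of checking for completed I/O on timeout before resetting, here is a toy sketch. It is not the nvme(4) driver code: FakeQueue, poll_completions, reset and on_timeout are hypothetical names invented for this example, and the "device" is just two Python sets. It only shows the ordering: on timeout, reap any completions first, and reset only if the command is still outstanding.

```python
from dataclasses import dataclass, field


@dataclass
class FakeQueue:
    """Toy stand-in for an NVMe queue pair (hypothetical, illustration only)."""
    # Command ids the host has issued and not yet seen complete.
    outstanding: set = field(default_factory=set)
    # Command ids the device finished, but whose interrupt the host never saw.
    completed_unseen: set = field(default_factory=set)
    resets: int = 0

    def poll_completions(self):
        """Reap completions the device already posted, as an interrupt handler would."""
        reaped = self.completed_unseen & self.outstanding
        self.outstanding -= reaped
        self.completed_unseen -= reaped
        return reaped

    def reset(self):
        """Last resort: count a reset and fail everything still outstanding."""
        self.resets += 1
        failed = set(self.outstanding)
        self.outstanding.clear()
        self.completed_unseen.clear()
        return failed


def on_timeout(queue, cid):
    """Timeout handler: look for completed I/O first, reset only if cid is truly stuck."""
    queue.poll_completions()
    if cid not in queue.outstanding:
        return "completed"  # the completion was there; only the interrupt was missed
    queue.reset()
    return "reset"


q = FakeQueue(outstanding={1, 2}, completed_unseen={1})
print(on_timeout(q, 1))  # command 1 had completed, interrupt was missed
print(on_timeout(q, 2))  # command 2 is genuinely stuck
```

In this sketch a missed interrupt costs one timeout period but no reset, while a command the device never completed still ends in a reset.
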
> The mainboard is the Supermicro H11SSW-NT. Two NVME drive bays share
> a connector on the mainboard:
>         NVMe Ports (NVMe 0~7, 10, 11, 14, 15)
>         The H11SSW-iN/NT has twelve (12) NVMe ports (2 ports per 1 Slim
>         SAS connector) on the motherboard.
>         These ports provide high-speed, low-latency PCI-E 3.0 x4
>         connections directly from the CPU to NVMe Solid
>         State (SSD) drives. This greatly increases SSD data-throughput
>         performance and significantly reduces PCI-E
>         latency by simplifying driver/software requirements resulting from
>         direct PCI-E interface from the CPU to the NVMe SSD drives.
> Is this purely mechanical or do two drives share PCI-E resources? Which
> would explain why the problems always come in pairs (nvme6 and nvme7, for
> example).

I'm unfamiliar with this setup, but coming in pairs strengthens the
missed-interrupt theory in my mind. Firmware issues usually don't come in pairs.

> This afternoon I set up a system with 4 drives and I was not able to
> reproduce the problem.
> (We just got 3 more machines which happened to have 4 drives each and no
> M.2 directly on the mainboard).
> I will change the config to 6 drives like with the two FreeNAS systems in
> our data center.
> > [… nda(4) ...]
> > I doubt that would have any effect. They both throw as much I/O onto the
> > card as possible in the default config.
> I found out - yes, just the same.

NDA drives with an iosched kernel will be able to rate limit, which may be
useful as a diagnostic tool...

> > There's been some minor improvements in -current here. Any chance you
> > could experimentally try that with this test? You won't get as many I/O
> > abort errors (since we don't print those), and we have a few more
> > workarounds for the reset path (though honestly, it's still kinda stinky).
> HEAD or RELENG_12, too?

HEAD is preferred, but any recent snapshot will do.


> Kind regards,
> Patrick
> --
> punkt.de GmbH                   Internet - Dienstleistungen - Beratung
> Kaiserallee 13a                 Tel.: 0721 9109-0 Fax: -100
> 76133 Karlsruhe                 info at punkt.de   http://punkt.de
> AG Mannheim 108285              Gf: Juergen Egeling

More information about the freebsd-stable mailing list