NVME aborting outstanding i/o

Warner Losh imp at bsdimp.com
Thu Apr 4 15:11:40 UTC 2019

On Thu, Apr 4, 2019 at 2:39 AM Patrick M. Hausen <hausen at punkt.de> wrote:

> Hi all,
> I’m currently doing some load tests/burn-in for two new servers.
> These use NVMe SSDs exclusively and run FreeNAS, i.e. FreeBSD 11.2-STABLE.
>         pcib17: <ACPI PCI-PCI bridge> at device 3.2 numa-domain 1 on pci15
>         pcib17: [GIANT-LOCKED]
>         pci17: <ACPI PCI bus> numa-domain 1 on pcib17
>         nvme7: <Generic NVMe Device> mem 0xeca10000-0xeca13fff at device 0.0 numa-domain 1 on pci17
> When putting some moderate i/o load on the system, the log fills with these
> messages:
>         nvme7: aborting outstanding i/o
>         nvme7: DATASET MANAGEMENT sqid:41 cid:91 nsid:1
>         nvme7: ABORTED - BY REQUEST (00/07) sqid:41 cid:91 cdw0:0

OK. So unless you are suspending and resuming, or the drive is somehow
failing, here's what's going on:

A request was sent down to the drive, and it took longer than 30s to
respond. At least one of them was a TRIM request (DATASET MANAGEMENT is
the NVMe command that carries TRIMs).

There are a number of reasons for this. NAND sucks. It's a horrible steaming
pile of... silicon. To make it useful, there's a layer of software called
the FTL (flash translation layer). NAND is an append-only medium at the
lowest level, so the FTL has to take requests and build a map of logical
blocks to physical blocks, as well as manage the 'log-structured device' in
some way. The details of why are too long to get into here (see my BSDCan
talk from a few years ago). But what is relevant is that many drives have
really crappy FTLs, especially when it comes to TRIMs. They can't handle a
lot of them, and when you send a lot down, as FreeBSD often does with
UFS or ZFS, you can trigger the drive into doing a bunch of garbage
collection. This can cause the drive to take more than 30s to respond to
commands. So sometimes you can avoid this by disabling TRIMs.
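A minimal sketch of how to try that, assuming a FreeBSD 11.x-era system
(like FreeNAS 11.2) with the legacy ZFS TRIM code. For ZFS, the
vfs.zfs.trim.enabled loader tunable turns TRIM off at boot. For UFS,
tunefs(8) can clear the TRIM flag on a filesystem that is unmounted or
mounted read-only. The partition name nvd7p1 is a hypothetical placeholder
(NVMe namespaces show up as nvdN disks):

        # /boot/loader.conf: keep ZFS from issuing TRIMs (read at boot)
        vfs.zfs.trim.enabled="0"

        # UFS: clear the TRIM flag (nvd7p1 is a hypothetical partition)
        tunefs -t disable /dev/nvd7p1

If the timeouts go away with TRIM disabled, that suggests the FTL's TRIM
handling is the culprit rather than the driver.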

Other times, you have a crappy FTL that crashes. This can cause a long
timeout because FreeBSD has done something that, while in spec, is
unexpected or not well tested. Here you can really only have FreeBSD do
less work at once to avoid this issue, or upgrade the drive's firmware.

> There has been some discussion of this on the iX Systems forum as well
> as various FreeBSD media and one person suggested setting:
>         hw.nvme.per_cpu_io_queues=0
> This is where I need some help now. This is from the manpage for nvme(4):
> ----------
>     To force a single I/O queue pair shared by all CPUs, set the following
>     tunable value in loader.conf(5):
>           hw.nvme.per_cpu_io_queues=0
>     To assign more than one CPU per I/O queue pair, thereby reducing the
>     number of MSI-X vectors consumed by the device, set the following tunable
>     value in loader.conf(5):
>           hw.nvme.min_cpus_per_ioq=X
>     To force legacy interrupts for all nvme driver instances, set the
>     following tunable value in loader.conf(5):
>           hw.nvme.force_intx=1
>     Note that use of INTx implies disabling of per-CPU I/O queue pairs.
> ----------
> But:
>         root@freenas01[~]# sysctl hw.nvme.per_cpu_io_queues
>         sysctl: unknown oid 'hw.nvme.per_cpu_io_queues'
>         root@freenas01[~]# sysctl hw.nvme.min_cpus_per_ioq
>         sysctl: unknown oid 'hw.nvme.min_cpus_per_ioq'
>         root@freenas01[~]# sysctl hw.nvme.force_intx
>         sysctl: unknown oid 'hw.nvme.force_intx'
> Where do I go from here?

Did you add it to /boot/loader.conf? These are loader tunables read when
the driver attaches at boot. There's no sysctl for them, so "unknown oid"
is expected.
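A minimal sketch, assuming a stock /boot/loader.conf: add the line, reboot,
then use kenv(1) rather than sysctl(8) to confirm the loader passed the
value to the kernel:

        # /boot/loader.conf: one I/O queue pair shared by all CPUs
        hw.nvme.per_cpu_io_queues="0"

        # after rebooting, show the value the loader handed to the kernel
        kenv hw.nvme.per_cpu_io_queues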


> Thanks!
> Patrick
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
