[Bug 243531] Unstable ena and nvme on AWS
bugzilla-noreply at freebsd.org
bugzilla-noreply at freebsd.org
Wed Jan 22 23:28:23 UTC 2020
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243531
Bug ID: 243531
Summary: Unstable ena and nvme on AWS
Product: Base System
Version: 12.1-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs at FreeBSD.org
Reporter: leif at ofWilsonCreek.com
We just recently upgraded our systems on AWS to 12.1, and we're seeing errors
with nvme0 and ena0. Typically, these errors manifest together. My sample size
is 34 instances, all t3.medium or r4.large. I have others that are t2.small,
and those are fine being that they don't use ena or nvme drivers. Of the 34,
there's a group of 15 that are almost idle, and never throw these errors. Of
the remaining that have load, only 5 ran without errors and 14 of them threw
these errors scattered about various times in the last 4.5 days. 3 machines
have crashed during this period, so the errors seem to often be nonfatal, but
not always. So based on that it seems load related, and does not seem isolated
to an occasional hardware problem.
Here's a sample of the log from one of the machines that crashed:
Jan 22 00:17:05 jdas-dev kernel: nvme0: cpl does not map to outstanding cmd
Jan 22 00:17:05 jdas-dev kernel: cdw0:00000000 sqhd:000d sqid:0001 cid:0012 p:1
sc:00 sct:0 m:0 dnr:0
Jan 22 00:17:05 jdas-dev kernel: nvme0: Resetting controller due to a timeout.
Jan 22 00:17:05 jdas-dev kernel: nvme0: resetting controller
Jan 22 00:17:05 jdas-dev kernel: nvme0: temperature threshold not supported
Jan 22 00:17:05 jdas-dev kernel: nvme0: aborting outstanding i/o
Jan 22 00:17:05 jdas-dev kernel: nvme0: resubmitting queued i/o
Jan 22 00:17:05 jdas-dev kernel: nvme0: WRITE sqid:2 cid:0 nsid:1 lba:691756520
len:8
Jan 22 00:17:21 jdas-dev kernel: ena0: The number of lost tx completion is
above the threshold (129 > 128). Reset the device
Jan 22 00:17:21 jdas-dev kernel: ena0: Trigger reset is on
Jan 22 00:17:21 jdas-dev kernel: ena0: device is going DOWN
Jan 22 00:17:21 jdas-dev kernel: ena0: free uncompleted tx mbuf qid 0 idx 0x154
Jan 22 00:17:22 jdas-dev kernel: ena0: device is going UP
Jan 22 00:17:22 jdas-dev kernel: ena0: link is UP
Jan 22 00:17:52 jdas-dev kernel: nvme0: Missing interrupt
Jan 22 00:18:46 jdas-dev kernel: nvme0: Missing interrupt
Jan 22 00:19:34 jdas-dev kernel: nvme0: cpl does not map to outstanding cmd
Jan 22 00:19:34 jdas-dev kernel: cdw0:00000000 sqhd:001a sqid:0002 cid:001b p:0
sc:00 sct:0 m:0 dnr:0
Jan 22 00:19:34 jdas-dev kernel: nvme0: Resetting controller due to a timeout.
Jan 22 00:19:34 jdas-dev kernel: nvme0: resetting controller
Jan 22 00:19:35 jdas-dev kernel: nvme0: temperature threshold not supported
Jan 22 00:19:35 jdas-dev kernel: nvme0: resubmitting queued i/o
Jan 22 00:19:35 jdas-dev kernel: nvme0: WRITE sqid:1 cid:0 nsid:1 lba:405055248
len:8
Jan 22 00:19:35 jdas-dev kernel: nvme0: aborting outstanding i/o
At this point, we rebooted the machine.
Jan 22 09:02:30 jdas-dev kernel: nvme0: Resetting controller due to a timeout.
Jan 22 09:02:30 jdas-dev kernel: nvme0: resetting controller
Jan 22 09:02:30 jdas-dev kernel: nvme0: temperature threshold not supported
Jan 22 09:02:30 jdas-dev kernel: nvme0: aborting outstanding i/o
Jan 22 09:02:30 jdas-dev kernel: nvme0: DATASET MANAGEMENT sqid:2 cid:27 nsid:0
Jan 22 09:02:30 jdas-dev kernel: nvme0: INVALID OPCODE (00/01) sqid:2 cid:27
cdw0:0
Jan 22 09:02:30 jdas-dev kernel: ena0: Keep alive watchdog timeout.
Jan 22 09:02:30 jdas-dev kernel: ena0: Trigger reset is on
Jan 22 09:02:30 jdas-dev kernel: ena0: device is going DOWN
Jan 22 09:02:30 jdas-dev kernel: ena0: ena0: device is going UP
Jan 22 09:02:30 jdas-dev kernel: link is UP
Jan 22 09:02:30 jdas-dev kernel: ena0: Keep alive watchdog timeout.
Jan 22 09:02:30 jdas-dev kernel: ena0: Trigger reset is on
Jan 22 09:02:30 jdas-dev kernel: ena0: device is going DOWN
Jan 22 09:02:30 jdas-dev kernel: ena0: ena0: device is going UP
Jan 22 09:02:30 jdas-dev kernel: link is UP
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: 90 second watchdog timeout expired. Shutdown
terminated.
Jan 22 09:02:30 jdas-dev kernel: Wed Jan 22 08:58:58 CST 2020
Jan 22 09:02:30 jdas-dev kernel: 2020-01-22T08:59:01.658827-06:00
jdas-dev.aws0.pla-net.cc init 1 - - /etc/rc.shutdown terminated abnormally,
going to single user mode
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: 2020-01-22T08:59:23.108170-06:00
jdas-dev.aws0.pla-net.cc init 1 - - some processes would not die; ps axl
advised
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: ---<<BOOT>>---
Jan 22 09:02:30 jdas-dev kernel: Copyright (c) 1992-2019 The FreeBSD Project.
Jan 22 09:02:30 jdas-dev kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988,
1989, 1991, 1992, 1993, 1994
--
You are receiving this mail because:
You are the assignee for the bug.
More information about the freebsd-bugs
mailing list