[Bug 236989] AWS EC2 lockups "Missing interrupt"

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Wed Apr 3 13:31:13 UTC 2019


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236989

            Bug ID: 236989
           Summary: AWS EC2 lockups "Missing interrupt"
           Product: Base System
           Version: 12.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: cao at bus.net

I am experiencing lockups on a production c5d.2xlarge instance running FreeBSD
12.0-RELEASE. Frequency is about once a week.

Each lockup is preceded by "nvmeX: Missing interrupt" messages appearing in
the logs:

Apr  3 00:56:32 host kernel: nvme0: Missing interrupt
Apr  3 00:57:43 host syslogd: last message repeated 1 times
Apr  3 00:58:43 host kernel: nvme4: Missing interrupt
Apr  3 00:58:43 host kernel: nvme1: Missing interrupt
Apr  3 00:58:43 host kernel: nvme0: Missing interrupt
Apr  3 00:58:43 host kernel: nvme4: Missing interrupt
Apr  3 00:58:43 host kernel: nvme1: Missing interrupt
Apr  3 00:58:43 host kernel: nvme0: Missing interrupt
Apr  3 00:59:43 host kernel: nvme4: Missing interrupt
Apr  3 00:59:43 host kernel: nvme1: Missing interrupt
Apr  3 00:59:43 host kernel: nvme0: Missing interrupt
Apr  3 00:59:43 host kernel: nvme1: nvme4: Missing interrupt
Apr  3 00:59:43 host kernel: Missing interrupt
Apr  3 00:59:43 host kernel: nvme0: Missing interrupt
Apr  3 00:59:44 host kernel: nvme1: Missing interrupt
Apr  3 01:00:05 host kernel: nvme0: Missing interrupt
Apr  3 01:20:01 host kernel: nvme0: 
Apr  3 01:20:01 host kernel: Missing interrupt
Apr  3 01:22:10 host kernel: sonewconn: pcb 0xfffff802988adb00: Listen queue
overflow: 151 already in queue awaiting acceptance (1 occurrences)
Apr  3 01:24:33 host kernel: sonewconn: pcb 0xfffff802988adb00: Listen queue
overflow: 151 already in queue awaiting acceptance (6 occurrences)
Apr  3 01:25:35 host kernel: sonewconn: pcb 0xfffff802988adb00: Listen queue
overflow: 151 already in queue awaiting acceptance (4 occurrences)
Apr  3 01:26:45 host syslogd: last message repeated 1 times
Apr  3 01:27:49 host syslogd: last message repeated 1 times


Within a few hours of these messages the machine becomes unresponsive, with
the CPU pegged at 100% and heavy disk reads and writes. It does not respond to
an EC2 "stop" command and requires a forced (hard) reset.
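
Since the "Missing interrupt" messages show up well before the lockup, a small
log scan could serve as an early warning. The following is only a sketch in
Python, not something I run in production: the function name and the regular
expression are illustrative assumptions, and it counts only lines in which a
device name is immediately followed by the message, so interleaved kernel
output and "last message repeated" lines will be undercounted.

```python
import re
from collections import Counter

# Matches a device name directly followed by the message, e.g.
# "Apr  3 00:56:32 host kernel: nvme0: Missing interrupt".
# The pattern is an assumption based on the log excerpt in this report.
MISSING_IRQ = re.compile(r"(nvme\d+): Missing interrupt")


def count_missing_interrupts(lines):
    """Return a Counter mapping nvme device name -> number of matches."""
    counts = Counter()
    for line in lines:
        for device in MISSING_IRQ.findall(line):
            counts[device] += 1
    return counts
```

It could be fed the lines of the system log (for example an open file object
for /var/log/messages, assuming the default syslog configuration), and any
non-empty result treated as a signal to investigate before the host stops
responding.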


c5d.2xlarge
FreeBSD 12.0-RELEASE-p3 GENERIC amd64
ZFS is in use on some drives, but not all.

I am running several instances with this same configuration, but only one of
them has had this issue so far, and it happens to be the host that has the
highest disk activity.

-- 
You are receiving this mail because:
You are the assignee for the bug.
