[Bug 235856] FreeBSD freezes on AWS EC2 t3 machines
bugzilla-noreply at freebsd.org
Wed Feb 19 13:20:37 UTC 2020
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235856
--- Comment #23 from mail at rubenvos.com ---
(In reply to Colin Percival from comment #21)
Hmm. The difference of exactly 1 hour doesn't seem to have a relationship with
different timezones...
Today we had another occurrence on one of the machines:
Feb 19 03:24:53 volume3 kernel: nvme1: cpl does not map to outstanding cmd
Feb 19 03:24:53 volume3 kernel: cdw0:00000000 sqhd:000c sqid:0002 cid:0017 p:0 sc:00 sct:0 m:0 dnr:0
Feb 19 03:24:53 volume3 kernel: nvme1: Missing interrupt
Feb 19 03:24:53 volume3 kernel: nvme1: Resetting controller due to a timeout.
Feb 19 03:24:53 volume3 kernel: nvme1: resetting controller
Feb 19 03:24:54 volume3 kernel: nvme1: temperature threshold not supported
Feb 19 03:24:54 volume3 kernel: nvme1: aborting outstanding i/o
Comparing 03:24:53 with the access times of the daily scripts though:
ls -lahtuT /etc/periodic/daily/
total 128
-rwxr-xr-x 1 root wheel 1.0K Feb 19 12:31:47 2020 450.status-security
-rwxr-xr-x 1 root wheel 811B Feb 19 05:32:14 2020 999.local
-rwxr-xr-x 1 root wheel 2.8K Feb 19 05:32:14 2020 800.scrub-zfs
-rwxr-xr-x 1 root wheel 845B Feb 19 05:32:14 2020 510.status-world-kernel
-rwxr-xr-x 1 root wheel 737B Feb 19 05:32:14 2020 500.queuerun
-rwxr-xr-x 1 root wheel 498B Feb 19 05:32:14 2020 480.status-ntpd
-rwxr-xr-x 1 root wheel 451B Feb 19 05:32:14 2020 480.leapfile-ntpd
-rwxr-xr-x 1 root wheel 2.0K Feb 19 05:32:14 2020 460.status-mail-rejects
-rwxr-xr-x 1 root wheel 1.4K Feb 19 03:01:00 2020 440.status-mailq
-rwxr-xr-x 1 root wheel 705B Feb 19 03:01:00 2020 430.status-uptime
-rwxr-xr-x 1 root wheel 611B Feb 19 03:01:00 2020 420.status-network
-rwxr-xr-x 1 root wheel 684B Feb 19 03:01:00 2020 410.status-mfi
-rwxr-xr-x 1 root wheel 590B Feb 19 03:01:00 2020 409.status-gconcat
-rwxr-xr-x 1 root wheel 590B Feb 19 03:01:00 2020 408.status-gstripe
-rwxr-xr-x 1 root wheel 591B Feb 19 03:01:00 2020 407.status-graid3
-rwxr-xr-x 1 root wheel 596B Feb 19 03:01:00 2020 406.status-gmirror
-rwxr-xr-x 1 root wheel 807B Feb 19 03:01:00 2020 404.status-zfs
-rwxr-xr-x 1 root wheel 583B Feb 19 03:01:00 2020 401.status-graid
-rwxr-xr-x 1 root wheel 773B Feb 19 03:01:00 2020 400.status-disks
-rwxr-xr-x 1 root wheel 724B Feb 19 03:01:00 2020 330.news
-r-xr-xr-x 1 root wheel 1.4K Feb 19 03:01:00 2020 310.accounting
-rwxr-xr-x 1 root wheel 693B Feb 19 03:01:00 2020 300.calendar
-rwxr-xr-x 1 root wheel 1.0K Feb 19 03:01:00 2020 210.backup-aliases
-rwxr-xr-x 1 root wheel 1.7K Feb 19 03:01:00 2020 200.backup-passwd
-rwxr-xr-x 1 root wheel 603B Feb 19 03:01:00 2020 150.clean-hoststat
-rwxr-xr-x 1 root wheel 1.0K Feb 19 03:01:00 2020 140.clean-rwho
-rwxr-xr-x 1 root wheel 709B Feb 19 03:01:00 2020 130.clean-msgs
-rwxr-xr-x 1 root wheel 1.1K Feb 19 03:01:00 2020 120.clean-preserve
-rwxr-xr-x 1 root wheel 1.5K Feb 19 03:01:00 2020 110.clean-tmps
-rwxr-xr-x 1 root wheel 1.3K Feb 19 03:01:00 2020 100.clean-disks
drwxr-xr-x 2 root wheel 1.0K Nov 1 07:06:41 2019 .
drwxr-xr-x 6 root wheel 512B Nov 1 07:06:41 2019 ..
but if the periodic framework executes the jobs serially I see no link with
440.status-mailq (that does not sound like high io) :S.
I think there definitely is a link between this bug and high disk/network I/O,
so the periodic framework probably qualifies as a nice trigger (especially the
security bits with the find commands)....
We will continue to cross-reference the access times of the daily scripts with
the "Missing interrupt" occurrences and post updates.
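
For anyone who wants to do the same cross-referencing, here is a rough sketch in Python. It is only an illustration, not something we actually run: the helper names (find_events, bracket) are made up, the year has to be passed in because the syslog lines do not carry one, and the sample rows are copied from the output above. It finds the "Missing interrupt" timestamps and reports which script access times fall directly before and after each one.

```python
from datetime import datetime

FMT = "%b %d %H:%M:%S %Y"  # matches the `ls -T` timestamps, e.g. "Feb 19 03:01:00 2020"


def parse(ts):
    """Parse a timestamp such as 'Feb 19 03:01:00 2020'."""
    return datetime.strptime(ts, FMT)


def find_events(log_lines, year, needle="Missing interrupt"):
    """Return the timestamps of log lines containing `needle`.

    Syslog lines start with 'Mon DD HH:MM:SS' (15 characters) and carry
    no year, so the caller supplies it (an assumption of this sketch).
    """
    return [parse(f"{line[:15]} {year}") for line in log_lines if needle in line]


def bracket(event, rows):
    """Return (last row at or before `event`, first row after `event`).

    `rows` is a list of (timestamp string, script name) pairs.
    Either element is None when no such row exists.
    """
    parsed = sorted((parse(ts), name) for ts, name in rows)
    before = [r for r in parsed if r[0] <= event]
    after = [r for r in parsed if r[0] > event]
    return (before[-1] if before else None, after[0] if after else None)


# Sample data copied from the log excerpt and directory listing above.
LOG = [
    "Feb 19 03:24:53 volume3 kernel: nvme1: cpl does not map to outstanding cmd",
    "Feb 19 03:24:53 volume3 kernel: nvme1: Missing interrupt",
]
ACCESS_TIMES = [
    ("Feb 19 12:31:47 2020", "450.status-security"),
    ("Feb 19 05:32:14 2020", "460.status-mail-rejects"),
    ("Feb 19 03:01:00 2020", "440.status-mailq"),
    ("Feb 19 03:01:00 2020", "430.status-uptime"),
]

events = find_events(LOG, 2020)
for event in events:
    prev_script, next_script = bracket(event, ACCESS_TIMES)
    print(event, "| before:", prev_script, "| after:", next_script)
```

Keep in mind that an access time only shows when a script was last read, so a later read (such as the 12:31:47 on 450.status-security) overwrites the time of the nightly run.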
Kind regards,
Ruben
--
You are receiving this mail because:
You are the assignee for the bug.