[Bug 235856] FreeBSD freezes on AWS EC2 t3 machines

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Wed Feb 19 13:20:37 UTC 2020


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235856

--- Comment #23 from mail at rubenvos.com ---
(In reply to Colin Percival from comment #21)

Hmm. The difference of exactly 1 hour doesn't seem to be related to a
timezone difference...

Today we had another occurrence on one of the machines:

Feb 19 03:24:53 volume3 kernel: nvme1: cpl does not map to outstanding cmd
Feb 19 03:24:53 volume3 kernel: cdw0:00000000 sqhd:000c sqid:0002 cid:0017 p:0 sc:00 sct:0 m:0 dnr:0
Feb 19 03:24:53 volume3 kernel: nvme1: Missing interrupt
Feb 19 03:24:53 volume3 kernel: nvme1: Resetting controller due to a timeout.
Feb 19 03:24:53 volume3 kernel: nvme1: resetting controller
Feb 19 03:24:54 volume3 kernel: nvme1: temperature threshold not supported
Feb 19 03:24:54 volume3 kernel: nvme1: aborting outstanding i/o

Comparing 03:24:53 with the access times of the daily scripts though:

ls -lahtuT /etc/periodic/daily/
total 128
-rwxr-xr-x  1 root  wheel   1.0K Feb 19 12:31:47 2020 450.status-security
-rwxr-xr-x  1 root  wheel   811B Feb 19 05:32:14 2020 999.local
-rwxr-xr-x  1 root  wheel   2.8K Feb 19 05:32:14 2020 800.scrub-zfs
-rwxr-xr-x  1 root  wheel   845B Feb 19 05:32:14 2020 510.status-world-kernel
-rwxr-xr-x  1 root  wheel   737B Feb 19 05:32:14 2020 500.queuerun
-rwxr-xr-x  1 root  wheel   498B Feb 19 05:32:14 2020 480.status-ntpd
-rwxr-xr-x  1 root  wheel   451B Feb 19 05:32:14 2020 480.leapfile-ntpd
-rwxr-xr-x  1 root  wheel   2.0K Feb 19 05:32:14 2020 460.status-mail-rejects
-rwxr-xr-x  1 root  wheel   1.4K Feb 19 03:01:00 2020 440.status-mailq
-rwxr-xr-x  1 root  wheel   705B Feb 19 03:01:00 2020 430.status-uptime
-rwxr-xr-x  1 root  wheel   611B Feb 19 03:01:00 2020 420.status-network
-rwxr-xr-x  1 root  wheel   684B Feb 19 03:01:00 2020 410.status-mfi
-rwxr-xr-x  1 root  wheel   590B Feb 19 03:01:00 2020 409.status-gconcat
-rwxr-xr-x  1 root  wheel   590B Feb 19 03:01:00 2020 408.status-gstripe
-rwxr-xr-x  1 root  wheel   591B Feb 19 03:01:00 2020 407.status-graid3
-rwxr-xr-x  1 root  wheel   596B Feb 19 03:01:00 2020 406.status-gmirror
-rwxr-xr-x  1 root  wheel   807B Feb 19 03:01:00 2020 404.status-zfs
-rwxr-xr-x  1 root  wheel   583B Feb 19 03:01:00 2020 401.status-graid
-rwxr-xr-x  1 root  wheel   773B Feb 19 03:01:00 2020 400.status-disks
-rwxr-xr-x  1 root  wheel   724B Feb 19 03:01:00 2020 330.news
-r-xr-xr-x  1 root  wheel   1.4K Feb 19 03:01:00 2020 310.accounting
-rwxr-xr-x  1 root  wheel   693B Feb 19 03:01:00 2020 300.calendar
-rwxr-xr-x  1 root  wheel   1.0K Feb 19 03:01:00 2020 210.backup-aliases
-rwxr-xr-x  1 root  wheel   1.7K Feb 19 03:01:00 2020 200.backup-passwd
-rwxr-xr-x  1 root  wheel   603B Feb 19 03:01:00 2020 150.clean-hoststat
-rwxr-xr-x  1 root  wheel   1.0K Feb 19 03:01:00 2020 140.clean-rwho
-rwxr-xr-x  1 root  wheel   709B Feb 19 03:01:00 2020 130.clean-msgs
-rwxr-xr-x  1 root  wheel   1.1K Feb 19 03:01:00 2020 120.clean-preserve
-rwxr-xr-x  1 root  wheel   1.5K Feb 19 03:01:00 2020 110.clean-tmps
-rwxr-xr-x  1 root  wheel   1.3K Feb 19 03:01:00 2020 100.clean-disks
drwxr-xr-x  2 root  wheel   1.0K Nov  1 07:06:41 2019 .
drwxr-xr-x  6 root  wheel   512B Nov  1 07:06:41 2019 ..

But if the periodic framework executes the jobs serially, I see no link with
440.status-mailq (that does not sound like high I/O) :S.
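
A minimal sketch of that comparison, assuming periodic runs the scripts in
lexical name order and that a script's access time is updated when it is read
for execution. Both are assumptions, and the helper name scripts_around is
hypothetical. The timestamps are copied from a subset of the listing above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Access times copied from the listing above (subset only).
ATIMES = {
    "430.status-uptime": "2020-02-19 03:01:00",
    "440.status-mailq": "2020-02-19 03:01:00",
    "450.status-security": "2020-02-19 12:31:47",
    "460.status-mail-rejects": "2020-02-19 05:32:14",
    "480.status-ntpd": "2020-02-19 05:32:14",
}


def scripts_around(event, atimes):
    """Return (last script accessed at or before event,
    first script accessed after event), both taken in name order.

    Assumption: name order approximates the order in which periodic runs them.
    """
    ordered = sorted(atimes)
    before = [n for n in ordered if datetime.strptime(atimes[n], FMT) <= event]
    after = [n for n in ordered if datetime.strptime(atimes[n], FMT) > event]
    return (before[-1] if before else None, after[0] if after else None)


event = datetime.strptime("2020-02-19 03:24:53", FMT)
print(scripts_around(event, ATIMES))
```

With these inputs the event falls between 440.status-mailq and
450.status-security in name order. The 12:31:47 stamp on 450.status-security
may simply reflect a later access, so this is a hint rather than proof.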

I think there definitely is a link between this bug and high disk/network I/O,
so the periodic framework probably qualifies as a nice trigger (especially the
security bits with the find commands)...

We will continue to cross-reference the access times of the daily scripts with
the "Missing interrupt" occurrences and post updates.
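
One way the "Missing interrupt" timestamps could be collected for that
cross-referencing, sketched in Python. The regular expression and the function
name missing_interrupts are illustrative assumptions, not part of any FreeBSD
tool. Syslog lines carry no year, so it is passed in explicitly, and the %b
month parsing assumes an English/C locale.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Feb 19 03:24:53 volume3 kernel: nvme1: Missing interrupt
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) kernel: (nvme\d+): Missing interrupt"
)


def missing_interrupts(lines, year):
    """Return a list of (timestamp, host, device) for each matching log line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((stamp, m.group(2), m.group(3)))
    return events


# Sample lines taken from the log excerpt quoted earlier in this comment.
sample = [
    "Feb 19 03:24:53 volume3 kernel: nvme1: cpl does not map to outstanding cmd",
    "Feb 19 03:24:53 volume3 kernel: nvme1: Missing interrupt",
    "Feb 19 03:24:53 volume3 kernel: nvme1: Resetting controller due to a timeout.",
]
for stamp, host, dev in missing_interrupts(sample, 2020):
    print(stamp.isoformat(), host, dev)
```

The resulting timestamps can then be compared against the access times
reported by ls -lahtuT /etc/periodic/daily/.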

Kind regards,

Ruben


