[Bug 215635] LOR in zfs

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Wed Dec 28 16:29:26 UTC 2016


--- Comment #1 from Dan Langille <dvl at FreeBSD.org> ---
In case it's relevant:

The server has 3x LSI SAS 2008 cards.

It boots off zfs, a 10-drive raidz2.

There is a second 10-drive raidz3.

8 drives from one zpool are all on one HBA.

8 drives from the other zpool are on another HBA.

Two drives from each zpool are on a third HBA (i.e. that HBA has two drives
from one pool and two drives from the other).

The zroot zpool has been around for several years, but was recently moved from
one box to another, getting a new M/B & new HBAs.  Shortly afterwards, the
raidz3 was added to the box. Since that addition, the system has frozen most
nights at about 03:01.

The host has 17 jails, all of which have stock /etc/crontab entries (i.e. daily
periodic runs).

By freeze, I mean:

- ssh to the server fails to connect
- login at the console accepts the password but does not present a prompt,
though it does respond to CTRL-T
(https://twitter.com/DLangille/status/804738019857199105)
- Backup jobs which used this server as the destination continued to work.
- nagios flips out over the services provided by the jails/host
- postfix stops responding (both host and jails)
- tail -F /var/log/messages in an existing ssh session continued to stream


* https://twitter.com/DLangille/media (lots of screenshots)
* https://twitter.com/search?q=%23frozenserver&src=typd
* http://dan.langille.org/2016/12/14/server-freeze-2014-12-14/

