zfs mirrors and high availability
Michael Boers
michaelscotttech at gmail.com
Thu Nov 11 18:35:18 UTC 2010
I am running a 100% zfs based FreeBSD 8.0 system with 4 disks: two zfs
mirrored boot drives and two zfs mirrored data drives. This morning
the server went down with the following errors in the log file:
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI Status Error
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check Condition
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND asc: 0,0
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional sense information
Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 timed out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req 0xffffff80003c87a0:2838 function 0
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bef30:2840 timed out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c8560:2841 timed out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bf320:2842 timed out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cbda0:2843 timed out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bfd40:2844 timed out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cad50:2845 timed out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003caf00:2846 timed out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003ccd60:2847 timed out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)
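To make the pattern in the excerpt above easier to see, here is a minimal Python sketch that tallies timed-out requests per controller and lists devices that reported "Retries Exhausted". The function name `summarize` and both regular expressions are my own assumptions, modelled only on the lines quoted here; this is not a FreeBSD tool and it may not match other drivers' messages. It expects one unwrapped syslog entry per line.

```python
import re
from collections import Counter

# Both patterns are assumptions derived from the log excerpt in this post.
TIMEOUT_RE = re.compile(r"kernel: (\w+): request \S+ timed out")
CAM_RE = re.compile(r"kernel: \((\w+):(\w+):\d+:\d+:\d+\): (.+)")


def summarize(lines):
    """Return (timeouts per controller, devices that exhausted retries)."""
    timeouts = Counter()
    failed = set()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = CAM_RE.search(line)
        if m and m.group(3).strip() == "Retries Exhausted":
            failed.add(m.group(1))
    return timeouts, failed


# Three entries copied from the log above, one entry per line.
sample = [
    "Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted",
    "Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 "
    "timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)",
    "Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 "
    "timed out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)",
]

if __name__ == "__main__":
    counts, devices = summarize(sample)
    print(dict(counts), sorted(devices))
```

On the full excerpt this kind of tally points at a single device (da2) followed by a burst of timeouts on its controller (mpt0), which is what prompted my questions.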
Why didn't zfs stop talking to the disk that was clearly having
issues? Are there sysctl or other variables that I can set that will
allow zfs to mark a disk as failed more aggressively? Is there a way
that I could have prevented the crash?
The system was "up" and pingable, but not accessible via ssh. My guess
is that all disk-related requests were queued or stuck.
A few more notes on my setup:
Hardware: Dell PowerEdge 2970, 1 CPU, 16 GB RAM
  pool: Storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        Storage     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da3     ONLINE       0     0     0

errors: No known data errors

  pool: zboot
 state: ONLINE
 scrub: scrub in progress for 0h22m, 72.03% done, 0h8m to go
config:

        NAME           STATE     READ WRITE CKSUM
        zboot          ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0
--
Thanks!