This disk failure should not panic a system, but just disconnect the disk from ZFS
Willem Jan Withagen
wjw at digiware.nl
Mon Jun 22 01:10:26 UTC 2015
On 21/06/2015 21:50, Tom Curry wrote:
> Was there by chance a lot of disk activity going on when this occurred?
Define 'a lot'??
But very likely, since the system is also a backup location for several
external services which back up thru rsync. And they can generate
quite some traffic. Next to the fact that it also serves an NVR with a
ZVOL thru iSCSI...
--WjW
>
> On Sun, Jun 21, 2015 at 10:00 AM, Willem Jan Withagen <wjw at digiware.nl> wrote:
>
> On 20/06/2015 18:11, Daryl Richards wrote:
> > Check the failmode setting on your pool. From man zpool:
> >
> > failmode=wait | continue | panic
> >
> > Controls the system behavior in the event of catastrophic
> > pool failure. This condition is typically a
> > result of a loss of connectivity to the underlying storage
> > device(s) or a failure of all devices within
> > the pool. The behavior of such an event is determined as
> > follows:
> >
> > wait Blocks all I/O access until the device
> > connectivity is recovered and the errors are cleared.
> > This is the default behavior.
> >
> > continue Returns EIO to any new write I/O requests but
> > allows reads to any of the remaining healthy
> > devices. Any write requests that have yet to be
> > committed to disk would be blocked.
> >
> > panic Prints out a message to the console and generates
> > a system crash dump.
>
> 'mmm
>
> Did not know about this setting. Nice one, but alas my current
> settings are:
> zfsboot failmode wait default
> zfsraid failmode wait default
>
> So either the setting is not working, or something else is up.
> Is 'wait' only meant to wait for a limited time, and then panic anyways?
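>
> (For reference, a quick sketch of checking and changing this, assuming the
> pool names above:
>     zpool get failmode zfsboot zfsraid
>     zpool set failmode=continue zfsraid
> Not that I'm sure 'continue' is the right choice here; it is just the knob.)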
>
> But then I still wonder why, even in the 'continue' case, ZFS
> ends up in a state where the filesystem is not able to continue its
> standard functioning (read and write) and disconnects the disk???
>
> All failmode settings result in a seriously handicapped system...
> On a raidz2 system I would perhaps have expected this to occur only when
> the second disk goes into thin space??
>
> The other question is: The man page talks about
> 'Controls the system behavior in the event of catastrophic pool failure'
> And is a hung disk a 'catastrophic pool failure'?
>
> Still very puzzled?
>
> --WjW
>
> >
> >
> > On 2015-06-20 10:19 AM, Willem Jan Withagen wrote:
> >> Hi,
> >>
> >> Found my system rebooted this morning:
> >>
> >> Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: Listen
> >> queue overflow: 8 already in queue awaiting acceptance (48 occurrences)
> >> Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears to be
> >> hung on vdev guid 18180224580327100979 at '/dev/da0'.
> >> Jun 20 05:28:33 zfs kernel: cpuid = 0
> >> Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s
> >> Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174
> >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> >>
> >> Which leads me to believe that /dev/da0 went out on vacation, leaving
> >> ZFS in trouble.... (A quick check of that guid-to-disk mapping is
> >> sketched below the pool listing.) But the array is:
> >> ----
> >> NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
> >> zfsraid 32.5T 13.3T 19.2T - 7% 41% 1.00x ONLINE -
> >> raidz2 16.2T 6.67T 9.58T - 8% 41%
> >> da0 - - - - - -
> >> da1 - - - - - -
> >> da2 - - - - - -
> >> da3 - - - - - -
> >> da4 - - - - - -
> >> da5 - - - - - -
> >> raidz2 16.2T 6.67T 9.58T - 7% 41%
> >> da6 - - - - - -
> >> da7 - - - - - -
> >> ada4 - - - - - -
> >> ada5 - - - - - -
> >> ada6 - - - - - -
> >> ada7 - - - - - -
> >> mirror 504M 1.73M 502M - 39% 0%
> >> gpt/log0 - - - - - -
> >> gpt/log1 - - - - - -
> >> cache - - - - - -
> >> gpt/raidcache0 109G 1.34G 107G - 0% 1%
> >> gpt/raidcache1 109G 787M 108G - 0% 0%
> >> ----
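> >>
> >> (To double-check that the guid in the panic really maps to da0: reading
> >> the label off the raw device should show it, something like
> >>     zdb -l /dev/da0 | grep -w guid
> >> and 18180224580327100979 ought to be in there.)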
> >>
> >> And thus I would have expected that ZFS would disconnect /dev/da0 and
> >> then switch to DEGRADED state and continue, letting the operator fix the
> >> broken disk.
> >> Instead it chooses to panic, which is not a nice thing to do. :)
> >>
> >> Or do I have too high hopes of ZFS?
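> >>
> >> (What I have in mind is roughly the usual routine, assuming da0 really is
> >> the bad one and, say, da8 as a hypothetical replacement:
> >>     zpool offline zfsraid da0       <- pool keeps running, DEGRADED
> >>     zpool replace zfsraid da0 da8   <- after swapping in the new disk
> >> with the pool staying up on the remaining raidz2 redundancy.)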
> >>
> >> Next question to answer is why this WD RED on:
> >>
> >> arcmsr0 at pci0:7:14:0: class=0x010400 card=0x112017d3 chip=0x112017d3
> >> rev=0x00 hdr=0x00
> >> vendor = 'Areca Technology Corp.'
> >> device = 'ARC-1120 8-Port PCI-X to SATA RAID Controller'
> >> class = mass storage
> >> subclass = RAID
> >>
> >> got hung, and nothing for this shows in SMART....
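> >>
> >> (In case it matters for looking at this: SMART for drives behind an arcmsr
> >> controller presumably goes through the Areca pass-through in smartmontools,
> >> something like
> >>     smartctl -a -d areca,1 /dev/arcmsr0
> >> where the '1' is the slot number, which I'm guessing at here.)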
>