Help needed! ZFS I/O error recovery?

Pawel Jakub Dawidek pjd at FreeBSD.org
Sun Oct 4 17:48:00 UTC 2009


On Thu, Oct 01, 2009 at 11:05:03AM +0200, Solon Lutz wrote:
> Hi everybody,
> 
> I'm faced with a 10TB ZFS pool on a 12TB RAID6 Areca controller.
> And yes, I know, you shouldn't put a zpool on a RAID-device... =(

Just to be sure: you have no redundancy at the ZFS level at all? That's
a very, very bad idea for important data (you know that already, but I
mention it to warn others)...

> The cable was replaced, a parity check was run on the RAID-Volume and
> showed no errors, the zfs scrub however showed some 'defective' files.
> After copying these files with 'dd -conv=noerror...' and comparing them
> to the originals, they were error-free.
> 
> Yesterday however, three more defective cables forced the controller
> to take the RAID6 volume offline. Now all cables were replaced and a parity
> check was run on the RAID-Volume -> data integrity OK.

This means absolutely nothing. It just means that the parity matches the
actual data; it doesn't mean the data is fine from the file system's or
the application's perspective.
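
To illustrate the point, here is a minimal sketch in Python. It is a
simplification, not how the Areca controller or ZFS actually work: it
uses a single XOR parity (RAID6 keeps two parities), SHA-256 stands in
for the file system's checksum, and the block contents are made up. It
assumes the data was damaged before the controller computed parity over
it, for example on the way through a bad cable.

```python
import hashlib
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_ok(blocks, parity):
    """A parity check only asks: does the parity match the stored blocks?"""
    return xor_parity(blocks) == parity


# Data as the file system intended to write it, plus the checksum it recorded.
good = [b"AAAA", b"BBBB", b"CCCC"]
recorded_checksum = hashlib.sha256(b"".join(good)).hexdigest()

# Assumption: one block is damaged before it is stored, and the parity is
# computed over what was actually stored.
stored = [b"AAAA", b"BxBB", b"CCCC"]
stored_parity = xor_parity(stored)

parity_check_passes = parity_ok(stored, stored_parity)
checksum_matches = (
    hashlib.sha256(b"".join(stored)).hexdigest() == recorded_checksum
)

print(parity_check_passes, checksum_matches)  # prints: True False
```

The parity check passes because the parity is consistent with the
stored blocks, yet the checksum recorded by the file system no longer
matches them.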

> But now ZFS refuses to mount all volumes:
> 
> Solaris: WARNING: can't process intent log for temp/space1
> Solaris: WARNING: can't process intent log for temp/space2
> Solaris: WARNING: can't process intent log for temp/space3
> Solaris: WARNING: can't process intent log for temp/space4
> 
> A scrub revealed the following:
> 
> errors: Permanent errors have been detected in the following files:
> 
>         temp:<0x0>
>         temp/space1:<0x0>
>         temp/space2:<0x0>
>         temp/space3:<0x0>
>         temp/space4:<0x0>
> 
> 
> I tried to switch off checksums for this pool, but that didn't help in any
> way. I also mounted the pool by hand and was faced with 'empty' volumes
> and 'I/O errors' when trying to list their contents...
> 
> Any suggestions? I'm offering some self-made blackberry jam and raspberry brandy
> to the person who can help to restore or backup the data.
> 
> Tech specs:
> 
> FreeBSD 7.2-STABLE #21: Tue May  5 18:44:10 CEST 2009 (AMD64)
> da0 at arcmsr0 bus 0 target 0 lun 0
> da0: <Areca ARC-1280-VOL#00 R001> Fixed Direct Access SCSI-5 device
> da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
> da0: Command Queueing Enabled
> da0: 10490414MB (21484367872 512 byte sectors: 255H 63S/T 1337340C)
> ZFS filesystem version 6
> ZFS storage pool version 6

If you are able to back up your disks, do it before we go further. I
have some ideas, but they could mess up your data even further.
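
One possible way to take such a backup is a raw, read-only image of the
device. The following is only a sketch in Python, similar in spirit to
dd with conv=noerror,sync. The function name image_device and the paths
in the usage note are hypothetical, and it assumes that seeking to the
end of the source reports its size, which may not hold for every device
node.

```python
import os


def image_device(src_path, dst_path, block_size=1 << 20):
    """Copy src_path to dst_path block by block, opening the source read-only.

    Blocks that cannot be read are replaced by zeros so that offsets in the
    image stay aligned with the source. Returns the offsets of those blocks.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Assumption: SEEK_END yields the size of the source.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    # Unreadable block: remember it and zero-fill below.
                    data = b""
                    bad_offsets.append(offset)
                # Pad failed or short reads to the expected length.
                dst.write(data.ljust(want, b"\0"))
                offset += want
    finally:
        os.close(src)
    return bad_offsets
```

A call could look like image_device("/dev/da0", "/backup/da0.img"),
where the destination is a hypothetical path on separate storage with
enough free space. The returned list shows which offsets were
zero-filled.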

First of all, I'd start by upgrading the system to stable/8; there could
be better error recovery there.

Do not write anything new to the pool; in fact, do not even read from
it, as that may trigger writing as well.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!