ZFS I/O errors
Olaf Seibert
O.Seibert@cs.ru.nl
Tue May 31 09:26:02 UTC 2011
On Mon 30 May 2011 at 12:19:10 -0500, Dan Nelson wrote:
> The ZFS compression code will panic if it can't allocate the buffer needed
> to store the compressed data, so that's unlikely to be your problem. The
> only time I have seen an "illegal byte sequence" error was when trying to
> copy raw disk images containing ZFS pools to different disks, and the
> destination disk was a different size than the original. I wasn't even able
> to import the pool in that case, though.
Yet somehow some incorrect data got written, it seems. That never
happened before, fortunately, even though we had crashes before that
seemed to be related to ZFS running out of memory.
> The zfs IO code overloads the EILSEQ error code and uses it as a "checksum
> error" code. Returning that error for the same block on all disks is
> definitely weird. Could you have run a partitioning tool, or some other
> program that would have done direct writes to all of your component disks?
I hope I would remember doing that if I did!
> Your scrub is also a bit worrying - 24k checksum errors definitely shouldn't
> occur during normal usage.
It turns out that the errors are easy to provoke: they happen every time
I do an ls of the affected directories. There were processes running
that were likely trying to write to the same directories (the file
system is exported over NFS), so it is easy to imagine the numbers
racking up quickly.
I moved those directories to the side for the moment, but I haven't
been able to delete them yet. The data is a bit bigger than we're able
to back up, so "just restoring a backup" isn't an easy thing to do.
Possibly I could make a new filesystem in the same pool, if that would
do the trick; it isn't more than 50% full but the affected one is the
biggest filesystem in it.
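If it comes to that, the migration could look roughly like the sketch below. The target name and rsync flags are illustrative assumptions, not something I've run on this pool:

```sh
# Sketch only -- assumes the damaged filesystem is tank/vol-fourquid-1
# and the pool has room for a second copy of the surviving data.
zfs create tank/vol-fourquid-2

# Copy what is still readable; read errors on the bad directories are
# logged rather than aborting the whole run.
rsync -aH /tank/vol-fourquid-1/ /tank/vol-fourquid-2/ 2>copy-errors.log

# Once the copy is verified, the old filesystem (and its snapshots,
# which also reference the bad blocks) can be destroyed.
zfs destroy -r tank/vol-fourquid-1
```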
The end result of the scrub is as follows:
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 12h56m with 3 errors on Mon May 30 23:56:47 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0 6.38K
          raidz2    ONLINE       0     0 25.4K
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        tank/vol-fourquid-1:<0x0>
        tank/vol-fourquid-1@saturday:<0x0>
        /tank/vol-fourquid-1/.zfs/snapshot/saturday/backups/dumps/dump_usr_friday.dump
        /tank/vol-fourquid-1/.zfs/snapshot/saturday/sverberne/CLEF-IP11/parts_abs+desc
        /tank/vol-fourquid-1/.zfs/snapshot/sunday/sverberne/CLEF-IP11/parts_abs+desc
        /tank/vol-fourquid-1/.zfs/snapshot/monday/sverberne/CLEF-IP11/parts_abs+desc
-Olaf.
--
Pipe rene = new PipePicture(); assert(Not rene.GetType().Equals(Pipe));