zfs i/o error, no driver error

Mon Jun 7 12:19:56 UTC 2010

On Mon, Jun 07, 2010 at 12:12:16PM +0100, Martin Simmons wrote:
> >>>>> On Mon, 7 Jun 2010 02:08:50 -0700, Jeremy Chadwick said:
> > 
> > I'm still trying to figure out why people do this.
> 
> Maybe because the ZFS Best Practices Guide suggests it?  ("Run zpool scrub on
> a regular basis to identify data integrity problems...")
> 
> It makes sense to detect errors when there is still a healthy mirror, rather
> than waiting until two drives are failing :-)

The official quote from the ZFS Best Practices Guide[1] is:

"Run zpool scrub on a regular basis to identify data integrity problems.
If you have consumer-quality drives, consider a weekly scrubbing
schedule. If you have datacenter-quality drives, consider a monthly
scrubbing schedule."

The first sentence of the paragraph seems reasonable; the idea is to do
this often so that you catch potential data-threatening errors before
your entire pool explodes.  Cool, I can accept that, but it gets us into
a discussion about how often this is necessary (keep reading for more on
that).  However, the second part of the paragraph is total rubbish.
"Datacenter-quality drives?"  Oh, I think they mean "enterprise-grade
drives", which really don't offer much more than high-end consumer-grade
drives at this point in time[2].  One of the key goals behind ZFS's
creation was to provide a reliable filesystem using cheap disks[3][4].

The only thing I can find in the ZFS Administration Guide[5] is this:

"The simplest way to check your data integrity is to initiate an
explicit scrubbing of all data within the pool. This operation traverses
all the data in the pool once and verifies that all blocks can be read.
Scrubbing proceeds as fast as the devices allow, though the priority of
any I/O remains below that of normal operations. This operation might
negatively impact performance, though the file system should remain
usable and nearly as responsive while the scrubbing occurs."

"Performing routine scrubbing also guarantees continuous I/O to all
disks on the system. Routine scrubbing has the side effect of preventing
power management from placing idle disks in low-power mode. If the
system is generally performing I/O all the time, or if power consumption
is not a concern, then this issue can safely be ignored."

What's confusing about this is the claim that pool verification is done
by verifying "that all blocks can be read".  Doesn't that already happen
when a standard read operation comes down the pipe for a file?  What I'm
getting at is that there's no explanation (that I can find) of why
scrubbing regularly "ensures" anything, other than letting a person see
an error sooner rather than later.

Which brings us to the topic of scrub interval...

This exact question was asked on the ZFS OpenSolaris list[6] in late
2008, and nobody there provided any concrete evidence either.  The
closest thing to evidence is this:

"...in normal operation, ZFS only checks data as it's read back from the
disks.  If you don't periodically scrub, errors that happen over time
won't be caught until I next read that actual data, which might be
inconvenient if it's a long [time] since the initial data was written".
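To make the distinction in that quote concrete, here is a toy Python
model.  It is a hypothetical sketch, not how ZFS actually lays out or
verifies blocks: every block carries a checksum recorded at write time,
a normal read verifies only the block it touches, and a scrub walks
every block.  `ToyPool`, `read()` and `scrub()` are made-up names for
illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum stored alongside each block (SHA-256 here; ZFS's own
    checksum algorithm is configurable and is not modelled)."""
    return hashlib.sha256(data).hexdigest()


class ToyPool:
    """Hypothetical toy pool: a list of blocks plus write-time checksums."""

    def __init__(self, blocks):
        self.blocks = [bytearray(b) for b in blocks]
        self.sums = [checksum(bytes(b)) for b in blocks]

    def corrupt(self, idx):
        # Simulate silent on-disk damage by flipping every bit of the first byte.
        self.blocks[idx][0] ^= 0xFF

    def _ok(self, idx):
        return checksum(bytes(self.blocks[idx])) == self.sums[idx]

    def read(self, idx):
        """Normal read: only the requested block is verified."""
        if not self._ok(idx):
            raise OSError(f"checksum error in block {idx}")
        return bytes(self.blocks[idx])

    def scrub(self):
        """Scrub: verify every block and return the indices that fail."""
        return [i for i in range(len(self.blocks)) if not self._ok(i)]


if __name__ == "__main__":
    pool = ToyPool([b"hot file", b"cold archive"])
    pool.corrupt(1)        # damage lands in data nobody is reading
    print(pool.read(0))    # the hot file still reads back without complaint
    print(pool.scrub())    # prints [1]: only the scrub notices block 1
```

The point of the toy is that `read()` never looks at block 1, which is
exactly the gap the quote describes; only `scrub()` touches cold data.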

The topic of scrub intervals was also brought up a month later[7].
Someone said:

"We did a study on re-write scrubs which showed that once per year was a
good interval for modern, enterprise-class disks.  However, ZFS does a
read-only scrub, so you might want to scrub more often".

The first sentence conflicts with what the guide recommends (I'd also
like to see the results of the study!), while the second makes no sense
("because it reads, do it more often!").  So if you take
the first sentence and apply it to what the ZFS Best Practices Guide
says, you come out with... "scrub consumer-grade disks every 6 months".

In the same thread, we have this quote from a different person:

"Even that is probably more frequent than necessary. I'm sure somebody
has done the MTTDL math. IIRC, the big win is doing any scrubbing at
all. The difference between scrubbing every 2 weeks and every 2
months may be negligible. (IANAMathematician tho)"
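The "big win is doing any scrubbing at all" point can be illustrated
with a deliberately simplified model.  This is my own assumption, not
anything from the thread, and it is not a real MTTDL calculation:
latent sector errors arrive at a constant rate `lam` per day, and a
scrub finds every error present.  Between scrubs spaced `T` days apart
the expected number of undetected errors climbs from 0 to `lam * T`, so
its average is `lam * T / 2`; without scrubbing, data that is never
read accumulates `lam * t` undetected errors with no upper bound.  The
function names and the rate below are hypothetical.

```python
def avg_latent_with_scrub(lam: float, interval_days: float) -> float:
    """Time-averaged expected count of latent errors waiting to be found.

    Assumes errors arrive at a constant rate `lam` per day and every scrub
    detects all of them, so the expected count rises linearly from 0 to
    lam * interval_days between scrubs and averages half of that.
    """
    return lam * interval_days / 2.0


def latent_without_scrub(lam: float, days_unread: float) -> float:
    """Expected latent errors in data left unread for `days_unread` days."""
    return lam * days_unread


if __name__ == "__main__":
    lam = 0.001  # hypothetical rate for illustration only, not measured data
    two_weeks = avg_latent_with_scrub(lam, 14)
    two_months = avg_latent_with_scrub(lam, 60)
    never = latent_without_scrub(lam, 3 * 365)
    print(f"every 2 weeks:  {two_weeks:.4f}")
    print(f"every 2 months: {two_months:.4f}")
    print(f"never (3 yrs):  {never:.4f}")
```

Under these assumptions the gap between 2 weeks and 2 months is a fixed
factor (60/14, about 4.3), whereas the gap between scrubbing and not
scrubbing keeps growing with time.  Whether either number matters in
practice depends on the real error rate, which this sketch does not
supply.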

So the justification seems, well, unjustified.  It's almost as if,
because the filesystem is new, there's an underlying sense of paranoia,
so everyone scrubs often.  I understand the "pre-emptive" argument, just
not the technical one.

So how often do *I* scrub our pools?  Rarely.  I tend to watch SMART
stats much more aggressively ("uh oh, uncorrected sector, better
scrub...").  I also scrub if the system feels sluggish on I/O while I'm
using it, or if cron jobs take much longer than they should.

> > It's important to remember that scrubs are *highly* intensive on both
> > the system itself as well as on all pool members.  Disk I/O activity is
> > very heavy during a scrub; it's not considered "normal use".
> 
> Is it worse than a full backup?  I guess scrub does read all drives, but OTOH
> backup will typically read all data non-linearly, which adds a different kind
> of stress.

I'd guess it depends greatly on the type of backup.  I'd imagine that a
ZFS snapshot followed by a full (non-incremental) zfs send would be less
intensive than a scrub, and an incremental send even more so.
rsync/tar/cp/etc. would probably fall somewhere in between.

I don't use ZFS snapshots because I don't know if they've stabilised on
FreeBSD.

[1]: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pools
[2]: http://lists.freebsd.org/pipermail/freebsd-fs/2010-May/008508.html
[3]: http://blogs.sun.com/bonwick/entry/zfs_end_to_end_data
[4]: http://www.sun.com/software/solaris/zfs_lc_preso.pdf
[5]: http://docs.sun.com/app/docs/doc/819-5461/gbbwa?l=en&a=view
[6]: http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg20995.html
[7]: http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg21728.html
[8]: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSPeriodicScrubbing

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |