gvinum raid5 vs. ZFS raidz

Sat Aug 30 01:48:55 UTC 2014

Paul Kraus <paul at kraus-haus.org> wrote:

> On Aug 28, 2014, at 2:36, Scott Bennett <bennett at sdf.org> wrote:
>
> > Paul Kraus <paul at kraus-haus.org> wrote:
>
> >> Wow. That implies you are hitting a drive with a very high uncorrectable error rate since the drive did not report any errors and the data is corrupt. I have yet to run into one of those.
> > 
> >     How would an uncorrectable error be detected by the drive without any
> > parity checking or hardware-implemented write-with-verify?
>
> I suppose my point was that an operation that is NOT flagged by the drive as failing and DOES return faulty data is, by definition, an uncorrectable error (as far as the drive is concerned). The point is that an uncorrectable error (from the drive standpoint) is just that, an error that the drive CANNOT detect.

     Maybe this is just a linguistics/semantics matter, but it seems to me
that an operation that is not flagged by the drive as an error but does return
faulty data is, by definition, not an error at all as far as the drive is
concerned, but it *is* an error as far as a human is concerned.  The difference
in wording is why I asked the question, so at least the confusion is now
cleared up. :-)
>
> >     Are you using any drives larger than 1 TB?
>
> I have been testing with a bunch of 2TB (3 HGST and 1 WD). I have been using ZFS and it has not reported *any* checksum errors.
>
     What sort of testing?  Unless the data written with errors are read back,
how would ZFS know about any checksum errors?  Does ZFS implement
write-with-verify?  Copying some humongous file and then reading it back for
comparison (or, with ZFS, just reading it back) ought to bring the checksums
into play.  Of course, a scrub should do that, too.
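On UFS2 nothing checks the data, so the copy-and-compare has to be done by
hand.  Below is a minimal sketch in Python (my own illustration, not an
existing tool; the function and script names are hypothetical).  It reports
each differing byte offset, much as `cmp -l` does, plus a count of how many
bits flipped in that byte.  It reads in chunks so that a 1.1 TB file never has
to fit in memory.  With a file much larger than RAM, most of the copy has to
come back from the drive rather than from the buffer cache.

```python
import os
import sys


def compare_files(original, copy, chunk_size=1 << 20):
    """Yield (offset, original_byte, copy_byte, bits_flipped) for every byte
    that differs between two files of equal length.

    Files are read in chunks (1 MiB by default) so that even a 1.1 TB file
    never has to fit in memory.
    """
    if os.path.getsize(original) != os.path.getsize(copy):
        raise ValueError("files differ in length")
    offset = 0
    with open(original, "rb") as a, open(copy, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if not chunk_a:  # equal sizes, so both files end together
                break
            if chunk_a != chunk_b:
                # Only scan byte by byte when the chunk as a whole mismatches.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        yield offset + i, x, y, bin(x ^ y).count("1")
            offset += len(chunk_a)


if __name__ == "__main__":
    # Usage: python3 cmpbits.py ORIGINAL COPY   (script name is hypothetical)
    count = 0
    for off, x, y, nbits in compare_files(sys.argv[1], sys.argv[2]):
        count += 1
        print(f"offset {off}: {x:#04x} -> {y:#04x} ({nbits} bit(s) flipped)")
    print(f"{count} differing byte(s)")
```

A result like "two bytes with single-bit errors" would show up as two lines,
each ending in "(1 bit(s) flipped)".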
     I have never bought the enterprise-grade drives--though I may begin doing
so after having read the information you've brought up here--so the difference
in drive quality at the outset may explain why your results so far have been
so much better than mine.

> I have put one of the 4 into production service (I needed a replacement for a failed 1TB and did not have any more 1TB in stock). It has been running for a couple weeks now with no checksum errors reported. My zpool is 5 x 1TB RAIDz2 and it has about 2TB of data on it right now.
>
> >  If so, try copying a 1.1 TB
> > file to one of them, and then try comparing the copy against the original.
>
> Hurmmm. I have not worked with individual files that large. What filesystem are you using here? 

     At the moment, all of my file systems on hard drives are UFS2.
>
> > Out of the three drives I could test that way, I got that kind of result on
> > two every time I tried it.  One of the two was a new Samsung (i.e., a
> > Seagate), and the other was a refurbished Seagate supplied as a replacement
> > under warranty.  The third got a clean copy the first time and two bytes with
> > single-bit errors on the second try.  That one was also a refurbished Seagate
> > provided under warranty.
>
> If you use ZFS on these drives and copy the same file do you get any checksum errors?
>
     As soon as I can get two more 2 TB drives and set them up under ZFS,
I intend to try the equivalent of that.  Because drives cannot be added to
an existing raidzN, I need to wait until then to create the 6-drive raidz2.
However, that original 1.1 TB file is currently sitting on one of the four
drives I already have that are intended for the raidz2, so that file will
be trashed by creating the raidz2.  The file is a dump(8) file of a 1.2 TB
file system that is nearly full, so I can run the dump again with the output
going to the newly created pool, after which I can try a
"dd if=dumpfile of=/dev/null" to see whether ZFS detects any problems.  If
it doesn't, then I can try a scrub on the pool to see whether that finds any
problems.
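Here is why the dd read and the scrub should catch what UFS2 cannot.  ZFS
keeps each block's checksum (fletcher4 by default) in the block pointer that
references the block, and it verifies that checksum whenever the block is
read.  A normal read checks only the blocks it touches; a scrub reads every
allocated block.  Below is a toy model of that idea in Python.  It is a
simplified sketch under those assumptions, not ZFS's on-disk format, and
`ToyPool` and `ChecksumError` are hypothetical names.  A real raidz2 would
also rebuild a bad block from parity and count the error in `zpool status`
instead of simply failing the read.

```python
import struct

MASK64 = (1 << 64) - 1


class ChecksumError(Exception):
    """Raised when a block's data no longer matches its recorded checksum."""


def fletcher4(block: bytes) -> tuple:
    """Fletcher-4 in the style ZFS uses: four 64-bit running sums over
    32-bit words.  Words are read little-endian here (ZFS uses native byte
    order, and x86 is little-endian)."""
    if len(block) % 4:
        raise ValueError("block length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", block):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


class ToyPool:
    """Hypothetical model: each block's checksum is recorded at write time
    and kept apart from the data, loosely like a ZFS block pointer."""

    def __init__(self):
        self.blocks = []     # block contents as they sit on the "drive"
        self.checksums = []  # checksums recorded when each block was written

    def write(self, data: bytes) -> int:
        self.blocks.append(bytearray(data))
        self.checksums.append(fletcher4(data))
        return len(self.blocks) - 1

    def read(self, n: int) -> bytes:
        # Verification happens only when a block is actually read.
        data = bytes(self.blocks[n])
        if fletcher4(data) != self.checksums[n]:
            raise ChecksumError(f"checksum mismatch in block {n}")
        return data

    def scrub(self) -> list:
        # A scrub simply reads every block and collects the ones that fail.
        bad = []
        for n in range(len(self.blocks)):
            try:
                self.read(n)
            except ChecksumError:
                bad.append(n)
        return bad


if __name__ == "__main__":
    pool = ToyPool()
    for i in range(4):
        pool.write(bytes([i]) * 4096)
    pool.blocks[2][100] ^= 0x08  # simulate one bit reaching the platter wrong
    print(pool.scrub())          # → [2]
```

Until something reads block 2, the model has no idea it is bad, which is the
same reason ZFS needs a read or a scrub to report anything.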
     My expectation is that I will end up contacting one or more manufacturers
to try to replace at least two drives based on whatever ZFS detects, but I
would be glad to be mistaken about that for now.  If two are that bad, then
I hope that ZFS can keep things running until the replacements show up here.

                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************