FreeBSD + ZFS on a production server?

Oliver Fromme olli at lurza.secnetix.de
Mon Jun 16 15:37:19 UTC 2008


Just a small hint:  You should configure your MUA to
produce proper attribution lines.

Wojciech Puchar wrote:
 > Oliver Fromme wrote:
 > > A broken processor usually results in random crashes, not
 > > silent data corruption.
 > 
 > result in both in my practice. with broken companion chips (chipset) it's 
 > silent data corruption is common, while crashes can be under specific 
 > cases. that's from what i've got.

I've never had a broken processor that did not result in
crashes, but maybe I've just been lucky.  :-)

 > > > or even calculate checksum right of wrong data generated by badly
 > > > operating programs.
 > > 
 > > What do you mean, wrong data generated by programs?  If
 > 
 > wrong data generated by program because of hardware problem.

In that case the input to the program would have to be bad
already.  A broken disk (or controller) doesn't cause a
program to produce wrong output, unless it feeds bad input
to the program.  And ZFS would catch that.

 > > You usually notice it when it's too late and the last
 > > good backup media was already recycled.
 > 
 > not that bad, but of course - i make backups.

But you don't keep every backup forever, do you?  (That
would be an archive rather than a backup, and it would
cost a lot of space.)

 > > In my case it was a disk with media surface errors, and
 > > the disk failed to report the error properly to the OS.
 > > Instead it just returned bad data.
 > 
 > so i am just happy to never having it, while normal disk failures are 
 > quite common..

Yes, fortunately "normal" disk failures (i.e. reported to
the OS so they are clearly noticed) are more common than
silent corruption.

 > > > ZFS may help detect it, or it may not. if it helped for you.
 > > 
 > > Please stop spreading FUD.  There is no "may or may not".
 > > If a disk returns bad data, ZFS _will_ detect it.
 > 
 > please read more carefully. i didn't say it.

You did.  I quoted it.

 > i just say that "disk returning bad data" is very rare case,

Yes, fortunately it is rare.  But it does happen.  And when
it happens, you are in very serious trouble.

For example, on the -stable list Goran Lowkrantz reported
on Saturday that one of his file systems had been corrupted
by a flipped bit in a directory node.  He didn't use ZFS,
but was lucky to notice the problem because of strange size
entries in that directory.  He had to use fsdb(8) surgery
to fix it.  Personally I would recommend not using that
disk anymore, because you never know in which other files
bits may have flipped without being noticed so easily.
Or use ZFS on that disk -- then you're guaranteed to
notice.
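
To illustrate why a single flipped bit cannot slip through,
here is a minimal sketch in Python.  It is not ZFS code; it
only mimics the principle of keeping a checksum apart from
the data block and verifying it on every read.  SHA-256 is
used as a stand-in checksum, and the helper names
(checksum, flip_bit, read_verified) are made up for this
example.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Stand-in for the per-block checksum (SHA-256 is an assumption)."""
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Return a copy of the block with exactly one bit inverted."""
    damaged = bytearray(block)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def read_verified(block: bytes, expected: bytes) -> bytes:
    """Hand the block to the caller only if its checksum matches."""
    if checksum(block) != expected:
        raise IOError("checksum mismatch: block is corrupt")
    return block


# Hypothetical block contents, written while the disk was still healthy.
original = b"directory entry: size=4096\n" * 100
stored_checksum = checksum(original)   # kept apart from the data itself

# Later the disk silently returns the block with one wrong bit.
on_disk = flip_bit(original, 12345)

try:
    read_verified(on_disk, stored_checksum)
except IOError as err:
    print("detected:", err)
```

Without the stored checksum, the damaged block has the same
length as the original and would be passed on to the
application unnoticed.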

 > lots of 
 > other - more frequent - hardware problems will not be detected.

That's speculative.  Personally I don't think so.

 > if you like to give lots of CPU power and disk bandwidth for calculation 

You're spreading FUD again.  The CPU time required for
generating and verifying the checksums is very low, and
the additional disk bandwidth is almost zero.
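
A back-of-the-envelope calculation shows the order of
magnitude.  The figures below are assumptions for the
sake of the example: one 256-bit (32-byte) checksum per
block and a block size of 128 KiB.  Other block sizes
change the ratio accordingly.

```python
CHECKSUM_BYTES = 32          # assumed: one 256-bit checksum per block
RECORD_BYTES = 128 * 1024    # assumed: 128 KiB data block


def checksum_overhead(record_bytes: int,
                      checksum_bytes: int = CHECKSUM_BYTES) -> float:
    """Fraction of extra bytes read or written per data block."""
    return checksum_bytes / record_bytes


overhead = checksum_overhead(RECORD_BYTES)
print(f"extra I/O per block: {overhead:.4%}")
```

Under these assumptions the checksum adds roughly 0.02
percent to the data transferred, which is negligible
next to the block itself.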

 > i just say it doesn't make lot of protection against bad hardware, not 
 > worth the expense.

Well, if the integrity of your files isn't important to
you ...

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"That's what I love about GUIs: They make simple tasks easier,
and complex tasks impossible."
        -- John William Chambless


More information about the freebsd-questions mailing list