FreeBSD + ZFS on a production server?
    Oliver Fromme
    olli at lurza.secnetix.de
    Wed Jun 11 19:29:13 UTC 2008

[attribution fixed]
Wojciech Puchar wrote:
 > Oliver Fromme wrote:
 > > Wojciech Puchar wrote:
 > > > 3) a CPU,cache and memory bandwidth hogging "feature" of checksumming all
 > > > blocks. thing that are already done in disk hardware. fortunately you can
 > > > turn this off
 > > 
 > > Obviously you have been lucky to never be a victim of
 > > silent disk corruption (or you just haven't noticed).
 > 
 > what you mean. that disk wrote the data wrong and doesn't detect it on 
 > read? i would mean broken disk processor, it's memory etc.
Correct.  It does happen.
 > possible - as much as broken main processor, main memory, some of chips on 
 > motherboard etc. -
A broken processor usually results in random crashes, not
silent data corruption.  Broken memory will be noticed if
it supports ECC; otherwise it will most probably result in
crashes as well.
 > which will make ZFS calculate checksum wrong on write, 
Even if that happens (without crashes or other things that
you'll notice immediately), the error will be detected by
ZFS and fixed ("healed") if possible, i.e. when running
with redundancy and at least one copy has a good checksum.
(GELI can only detect, but not fix.  ZFS can fix it, too.
I assume in theory it would be possible to make GELI
cooperate with gmirror so it could fix bad blocks, too, but
that's just theory.  ZFS is reality.)
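
To illustrate the detect-and-heal idea, here is a minimal
sketch in Python.  It is a toy model, not how ZFS is actually
implemented: the Mirror class, the in-memory "disks" and the
choice of SHA-256 are all assumptions made for illustration.
It only shows the principle that a checksum kept apart from
the data lets a read pick a good copy and rewrite a bad one.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # SHA-256 is an arbitrary choice for this sketch.
    return hashlib.sha256(data).digest()


class Mirror:
    """Toy mirror (hypothetical): every block is stored on each 'disk'."""

    def __init__(self, copies=2):
        self.disks = [dict() for _ in range(copies)]
        self.sums = {}  # checksums are kept apart from the data blocks

    def write(self, blkno, data):
        self.sums[blkno] = checksum(data)
        for disk in self.disks:
            disk[blkno] = data

    def read(self, blkno):
        """Return (data, healed), where healed lists the repaired disks."""
        expected = self.sums[blkno]
        good = None
        bad = []
        for i, disk in enumerate(self.disks):
            if checksum(disk[blkno]) == expected:
                if good is None:
                    good = disk[blkno]
            else:
                bad.append(i)  # this copy was silently corrupted
        if good is None:
            # Detected, but no redundancy left to repair from.
            raise IOError("block %d: no copy matches its checksum" % blkno)
        for i in bad:
            self.disks[i][blkno] = good  # "heal" from the good copy
        return good, bad
```

Corrupting one copy behind the mirror's back (for example
m.disks[0][7] = b"pAyload" after m.write(7, b"payload")) makes
the next m.read(7) return the good data and repair disk 0.
If every copy is corrupted, the read fails with an error
instead of returning bad data.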
 > or even calculate checksum right of wrong data generated by badly 
 > operating programs.
What do you mean, wrong data generated by programs?  If
a program generates wrong output, there's nothing any
file system could do about that.  That's not the file
system's job at all.  The file system's job is to ensure
the integrity of data written to the disk, and ZFS does
exactly that.
 > given the complexity of motherboard+CPU etc. to complexity of disk 
 > hardware, i don't think "silent disk failure" happens often.
Fortunately it doesn't happen often, but it does happen.
And when it happens, you are in really serious trouble.
You usually notice it when it's too late and the last
good backup medium has already been recycled.
 > i think all your cases wasn't disk, but general hardware problems.
In my case it was a disk with media surface errors, and
the disk failed to report the error properly to the OS.
Instead it just returned bad data.
 > ZFS may help detect it, or it may not. if it helped for you.
Please stop spreading FUD.  There is no "may or may not".
If a disk returns bad data, ZFS _will_ detect it.
Silent corruption _cannot_ happen with ZFS, except if
you disable the checksumming feature intentionally.
 > even without ZFS it WOULD cause problems with programs like random 
 > crashes.
Please elaborate what the problem is, if you think there
is one.
 > personally i often got disk failing the way that it was unable to read or 
 > write giving an error, but never things like that.
As I said:  You were lucky.
 > > You're free to use UFS, of course, and keep suffering
 > > from its shortcomings.
 > 
 > i have to start suffering at first....
Many people suffer without knowing.  :-)
I do suffer from UFS' shortcomings on many machines
on which I can't use ZFS (or other file systems) for
various reasons.
Best regards
   Oliver
-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd
With Perl you can manipulate text, interact with programs, talk over
networks, drive Web pages, perform arbitrary precision arithmetic,
and write programs that look like Snoopy swearing.