ZFSKnownProblems - needs revision?
mandrews at bit0.com
Thu Apr 9 05:52:38 UTC 2009
Ivan Voras wrote:
> Ivan Voras wrote:
>> * Are the issues on the list still there?
>> * Are there any new issues?
>> * Is somebody running ZFS in production (non-trivial loads) with
>> success? What architecture / RAM / load / applications used?
>> * How is your memory load? (does it leave enough memory for other services)
> also: what configuration (RAIDZ, mirror, etc.?)
With 7.0 and 7.1 I had frequent livelocks with ZFS when it was unable to
get more memory from the kernel. Tuning kmem up to 768M helped but
didn't eliminate the issue entirely... I still had to reboot systems
every few weeks. Since most were either in a redundant cluster or
personal machines, I kinda lived with it.
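For anyone still on the older releases, the workaround was a couple of
loader tunables along these lines (values illustrative; the exact size
varied per machine):

    # /boot/loader.conf on 7.0/7.1 (amd64) -- illustrative values
    vm.kmem_size="768M"
    vm.kmem_size_max="768M"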
With the increase in default kmem size in 7.2, I removed the kmem_size
tuning and have not had a single ZFS-related hang since. The only
tuning I use is vfs.zfs.arc_max="100M" and disabling prefetch (but
leaving zil on) -- and I'm seriously considering throwing those two out
(or at least raising arc_max) and seeing what happens. I'm much happier
with how 7.2's ZFS behaves than 7.1's. It's definitely "getting
there"... with one catch (see below).
Anyone running an up-to-date STABLE -- on amd64, anyway -- should
consider removing kmem_size tweaks from their loader.conf... You don't
need 8.x for that, just 7.2-prerelease/beta. I don't know about i386;
I'd be a bit nervous about ZFS on i386 given how much kernel address
space it wants.
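If you're curious what the 7.2 defaults actually give you once the
tweaks are gone, you can just read the values back at runtime:

    # read back the auto-sized values after removing the tunables
    sysctl vm.kmem_size vm.kmem_size_max
    sysctl vfs.zfs.arc_max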
There is a significant issue with MySQL InnoDB logs getting corrupted if
the system ever crashes, loses power, etc. It is very reproducible on
demand on multiple 7.2/7.1/7.0 machines, but is not reproducible on HEAD
and its newer ZFS v13 (which is why I never opened a PR on it). For now
any MySQL masters I run must stay on UFS2 because of that showstopper...
if anyone wants to try to look at it, I can open a PR or send more
details, just ask.
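(Untested idea for anyone experimenting: MySQL can place just the InnoDB
logs on a separate filesystem via innodb_log_group_home_dir, so in
theory the logs could sit on UFS2 while the data stays on ZFS. The paths
below are made up, and I haven't verified this actually dodges the
corruption:)

    [mysqld]
    datadir                   = /tank/mysql           # ZFS dataset (hypothetical path)
    innodb_log_group_home_dir = /var/db/innodb-logs   # UFS2 mount (hypothetical path)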
I have seen one file get corrupted in a zpool, in two separate instances
(different machines each time), but was never able to reproduce it
again. Next time it happens I'll dig into it a bit more.
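If anyone else hits this, a scrub will surface the damage and "zpool
status -v" names the affected files ("tank" is a placeholder pool name):

    zpool scrub tank
    zpool status -v tank    # -v lists files with unrecoverable errors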
This is on about seven Core 2 Quad boxes with 8 GB of memory (some have
only 4) and amd64. Most disk IO is writing http logs, which can get
pretty busy on our webservers (usually hundreds of apache processes plus
nginx, previously lighttpd, serving a few thousand concurrent hits)...
plus some light maildir, some not-so-light rsync at night... Most are
simple mirrored pairs of SATA disks. A few are hardware raid10 (LSI
SAS, 3ware SAS) and ZFS is given just a single device... even though I
know that's not optimal (two hardware raid1's or jbod would be more
reliable)... those are personal boxes and not production though. I'm
not brave enough to attempt booting off of ZFS yet; I use a small
gmirrored UFS2 for /. I'm not swapping to ZFS either.
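For the curious, the layout is nothing exotic; roughly this, with device
and slice names purely illustrative:

    # small gmirrored UFS2 for /, ZFS mirror for everything else
    gmirror label -v -b round-robin gm0 /dev/ad4s1 /dev/ad6s1
    zpool create tank mirror ad4s2 ad6s2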