ZFS "stalls" -- and maybe we should be talking about defaults?

Jeremy Chadwick jdc at koitsu.org
Tue Mar 5 05:32:52 UTC 2013


On Tue, Mar 05, 2013 at 05:05:47AM +0000, Ben Morrow wrote:
> Quoth Karl Denninger <karl at denninger.net>:
> > 
> > Note that the machine is not booting from ZFS -- it is booting from and
> > has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks
> > like a single "da0" drive to the OS) and that drive stalls as well when
> > it freezes.  It's definitely a kernel thing when it happens as the OS
> > would otherwise not have locked (just I/O to the user partitions) -- but
> > it does. 
> 
> Is it still the case that mixing UFS and ZFS can cause problems, or were
> they all fixed? I remember a while ago (before the arc usage monitoring
> code was added) there were a number of reports of serious problems
> running an rsync from UFS to ZFS.

This problem still exists on stable/9.  The behaviour manifests itself
as fairly bad performance (I cannot remember whether it was outright
stalling or just awful throughput).  I can only speculate as to what the
root cause is, but my guess is that it has something to do with the two
caching systems (UFS vs. ZFS ARC) fighting over large amounts of memory.

The advice I've given people in the past is: if you do a LOT of I/O
between UFS and ZFS on the same box, it's time to move to 100% ZFS.
That said, I still do not recommend ZFS for a root filesystem (it
still bites people even today), and swap-on-ZFS is a huge
no-no.

I will note that I myself use pure UFS+SU (not SUJ) for my main OS
installation (that means /, swap, /var, /tmp, and /usr) on a dedicated
SSD, while everything else is ZFS raidz1 (no dedup, no compression;
won't ever enable these until that thread priority problem is fixed on
FreeBSD).

However, when I was migrating from gmirror+UFS+SU to ZFS, I witnessed
what I described in my 1st and 2nd paragraphs.  What userland utilities
were used (rsync vs. cp) made no difference; the problem is in the
kernel.

Footnote about this thread:

This thread contains all sorts of random pieces of information about
systems, with very little actual detail in them (barring the symptoms,
which are always useful to know!).

For example, just because your machine has 8 cores and 12GB of RAM
doesn't mean jack squat if some software in the kernel is designed
"oddly".  Reworded: throwing more hardware at a problem solves nothing.

The most useful thing (for me) that I found was deep within the thread,
a few words along the lines of "De-dup isn't used".  What about
compression?  Has it *ever* been enabled on the filesystem (even if
not presently enabled)?  It matters.  All this matters.
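
For anyone wanting to include those details in a report, here is a
minimal sketch (not a definitive tool) that collects the compression,
dedup, and compressratio properties via "zfs get -H" and flags datasets
worth mentioning.  The pool name "tank" and the helper names are
hypothetical.  A compressratio above 1.00x is only a hint that
compressed blocks exist; a ratio of 1.00x does not prove compression
was never enabled.

```python
import subprocess

PROPS = "compression,dedup,compressratio"


def parse_zfs_get(output):
    """Parse tab-separated `zfs get -H -o name,property,value` output
    into {dataset: {property: value}}."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, prop, value = line.split("\t")
        result.setdefault(name, {})[prop] = value
    return result


def flag_datasets(datasets):
    """Return datasets where compression or dedup is on, or where
    compressratio hints that compressed blocks are present."""
    flagged = []
    for name, props in datasets.items():
        # "-" is what zfs prints when a property does not apply.
        comp_on = props.get("compression", "off") not in ("off", "-")
        dedup_on = props.get("dedup", "off") not in ("off", "-")
        ratio_text = props.get("compressratio", "1.00x").rstrip("x")
        ratio = float(ratio_text) if ratio_text != "-" else 1.0
        if comp_on or dedup_on or ratio > 1.0:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    # "tank" is a hypothetical pool name; substitute your own.
    out = subprocess.run(
        ["zfs", "get", "-H", "-r", "-o", "name,property,value", PROPS, "tank"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in flag_datasets(parse_zfs_get(out)):
        print(name)
```

Posting the raw "zfs get" output alongside the flagged names is still
the better option, since it lets whoever is investigating draw their
own conclusions.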

I see lots of end-users talking about these problems, but (barring
Steven) literally no "kernel people" who are "in the know" about ZFS
mentioning how said users can get them (devs) info that can help track
this down.  Those devs live on freebsd-fs@ and freebsd-hackers@, and not
too many read freebsd-stable@.

Step back for a moment and look at this anti-KISS configuration:

- Hardware RAID controller involved (Areca 1680ix)
- Hardware RAID controller has its own battery-backed cache (2GB)
- Therefore arcmsr(4) is involved -- revision of driver/OS build
  matters here, ditto with firmware version
- 4 disks are involved, models unknown
- Disks are GPT and are *partitioned*, and ZFS refers to the partitions
  not the raw disk -- this matters (honest, it really does; the ZFS
  code handles things differently with raw disks)
- Providers are GELI-encrypted

Now ask yourself if any dev is really going to tackle this one given the
above mess.

My advice would be to get rid of the hardware RAID (go with Intel ICHxx
or ESBx on-board with AHCI), use raw disks for ZFS (for 4096-byte sector
disks, use the gnop(8) method, which is a one-time thing), and get rid of
GELI.  If you can reproduce the problem there 100% of the time, awesome,
it's a clean/clear setup for someone to help investigate.
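
As a rough illustration of the gnop(8) method mentioned above, the
sketch below only *prints* the usual command sequence; it does not run
anything.  The pool name "tank", the raidz1 layout, and the adaN device
names are assumptions for the example, and gnop_plan is a hypothetical
helper.  Creating a pool destroys whatever is on those disks, so treat
this as an outline to review, not something to paste blindly.

```python
def gnop_plan(pool, vdev_type, disks, sector_size=4096):
    """Build the command sequence for creating a pool on top of
    temporary gnop providers that advertise `sector_size`-byte sectors."""
    nops = [f"{d}.nop" for d in disks]
    # 1. Put a gnop provider with the desired sector size on each raw disk.
    plan = [["gnop", "create", "-S", str(sector_size), f"/dev/{d}"]
            for d in disks]
    # 2. Create the pool on the .nop providers.
    plan.append(["zpool", "create", pool, vdev_type, *nops])
    # 3. Export the pool, remove the gnop providers, and re-import;
    #    the pool then sits on the raw disks.
    plan.append(["zpool", "export", pool])
    plan += [["gnop", "destroy", n] for n in nops]
    plan.append(["zpool", "import", pool])
    return plan


if __name__ == "__main__":
    # Hypothetical pool and device names.
    for cmd in gnop_plan("tank", "raidz1", ["ada0", "ada1", "ada2", "ada3"]):
        print(" ".join(cmd))
```

After the re-import, "zdb -C tank | grep ashift" should report the
ashift that was chosen at creation time.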

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

