ZFS performance as the FS fills up?

Wed Mar 9 12:51:42 UTC 2011

On Wed, Mar 09, 2011 at 11:56:49AM +0100, Matthias Andree wrote:
> Am 08.03.2011 12:48, schrieb Jeremy Chadwick:
> > On Tue, Mar 08, 2011 at 12:26:49PM +0100, Patrick M. Hausen wrote:
> >> we use a big JBOD and ZFS with raidz2 as the target
> >> for our nightly Amanda backups.
> >>
> >> I already suspected that the fact that the FS was > 90% full might
> >> be the cause of our backup performance continuously decreasing.
> >>
> >> I just added another vdev - 6 disks of 750 GB each, raidz2 and the
> >> FS usage is back to 71% currently. This was while backups were
> >> running and write performance instantly skyrocketed compared to
> >> the values before.
> >>
> >> So, is it possible to name a reasonable amount of free space to
> >> keep on a raidz2 volume? On last year's EuroBSDCon I got
> >> the impression that with recent (RELENG_8) ZFS merges
> >> I could get away with using around 90%.
> > 
> > I'm in no way attempting to dissuade you from your efforts to figure out
> > a good number for utilisation, but when I hear of disks -- no matter how
> > many -- being 90% full, I immediately conclude performance is going to
> > suck simply because the outer "tracks" on a disk contain more sectors
> > than the inner "tracks".  This is the reason for performance degradation
> > as the seek offset increases, resulting in graphs like this:
> 
> Whatever.  I've experienced similar massive performance decrease even on
> a non-redundant single-disk ZFS setup after the ZFS (8-STABLE between
> 8.0 and before 8.2) had filled up once.
> 
> Even clearing the disk down to 70% didn't get my /usr (which was a ZFS
> mount) snappy again.  The speed decrease was one to two orders of
> magnitude in excess of what you'd attribute to the CLV or
> sectors-per-track change across the disk.
> 
> From the sound of my 7200/min WD RE3 drive (which seeks rather fast for
> a 7200/min drive; I think it was the fastest-seeking 7200/min drive
> when I bought it), it was seeking and thrashing its heads like hell even
> on single-threaded bulk reads of large files.  I suppose there was
> fragmentation and/or non-caching of metadata afoot.  It was far worse
> than any decrease in constant linear velocity or sectors-per-track of
> the disk tracks could explain, and the relevant ZFS ARC-related options
> didn't rectify it either.  So I reverted to GJOURNAL-enabled UFS, which
> gave me much better performance on a 5400/min disk than I've ever had
> with a halfway-filled ZFS on the 7200/min RAID-class disk.  The bulk
> transfer rates of both drives are beyond any doubt.
> 
> In other words, the file system didn't recover speed (I'm not sure if
> that's a zfs or zpool feature), and I attribute that (and the failure to
> rm files from a 100% full file system) to the write-ahead-logging
> behaviour of ZFS.
> 
> Any comments?

FWIW: we have ZFS filesystems at my workplace, using Solaris 10, which
fill to 95-99% quite often (workhorse database servers).  Userland apps
bail out when 100% is reached, of course.  Once some space is freed up,
things are back to normal (our disk I/O graphs show no difference
between before the fs fills and after free space is restored).  Some of
the filesystems are on single-disk pools, others on 3-disk raidz1 pools.
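For anyone wanting to watch how close their pools get to those levels,
here's a rough sh sketch.  It assumes the name<TAB>capacity lines printed
by "zpool list -H -o name,capacity"; the function name check_capacity and
the 80% threshold in the usage line are my own placeholders, not a number
anyone in this thread has recommended.

```shell
#!/bin/sh
# Hypothetical helper: print a warning for every pool whose capacity is
# above a threshold.  Input is "name<TAB>capacity" lines, as printed by
#   zpool list -H -o name,capacity
check_capacity() {
    threshold=$1
    tab=$(printf '\t')
    while IFS=$tab read -r name cap; do
        pct=${cap%\%}                  # "92%" -> "92"
        case $pct in
            ''|*[!0-9]*) continue ;;   # skip "-" or other non-numeric values
        esac
        if [ "$pct" -gt "$threshold" ]; then
            echo "WARNING: $name is ${pct}% full (threshold ${threshold}%)"
        fi
    done
}

# Usage (threshold is only an example):
#   zpool list -H -o name,capacity | check_capacity 80
```

Something like this could run from cron so you get a heads-up before a
pool drifts into the 90%+ range discussed above.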

FreeBSD is not Solaris, true.  But possibly the issue has been addressed
with the ZFS v15 commit that happened between 8.1 and 8.2?  Here's that
commit (I just picked one of the files touched by the MFC; the commit
message was the same across all of them).  One would need to review each of
the individual OpenSolaris bugfix numbers to find out if any of them are
relevant:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#rev1.8.2.6

Otherwise, I can imagine that prefetching could cause what you describe.
Prefetch is enabled by default in 8.0 and 8.1; in 8.2 it auto-disables
if the amount of available memory is less than 4GB.
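If someone wants to rule prefetch in or out, a quick sketch like the
following reports the current state.  It assumes the
vfs.zfs.prefetch_disable sysctl (1 means prefetch is off) and hw.physmem
in bytes; prefetch_report is a made-up helper name, and the 4GB check
approximates "available memory" with hw.physmem, so it mirrors the 8.2
behaviour described above rather than the exact kernel logic.

```shell
#!/bin/sh
# Hypothetical helper: summarise ZFS prefetch state on FreeBSD 8.x.
#   $1 = value of vfs.zfs.prefetch_disable (1 = prefetch off, 0 = on)
#   $2 = value of hw.physmem, in bytes
prefetch_report() {
    disable=$1
    physmem=$2
    four_gb=4294967296                 # 4 * 1024^3 bytes
    if [ "$disable" -eq 1 ]; then
        echo "prefetch disabled"
    else
        echo "prefetch enabled"
    fi
    # Approximates the 8.2 default described above (under 4GB of RAM,
    # prefetch is off by default), using hw.physmem as "memory".
    if [ "$physmem" -lt "$four_gb" ]; then
        echo "under 4GB RAM: 8.2 disables prefetch by default"
    fi
}

# On a live system:
#   prefetch_report "$(sysctl -n vfs.zfs.prefetch_disable)" "$(sysctl -n hw.physmem)"
# To force prefetch off across reboots, /boot/loader.conf could carry:
#   vfs.zfs.prefetch_disable="1"
```

Comparing the slow case with prefetch on and off would at least tell us
whether this is the culprit.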

If that's not it, then I would think this would be pretty easy to
reproduce given its nature.  Can you or the OP reproduce it reliably and
provide some hard data here for developers to review?  Right now there
are just some generic statements.
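As a starting point for hard data, something along these lines could
write files into a test dataset and time each one as the pool fills.
This is only a sketch: fill_and_time and /tank/scratch are hypothetical
names, and because it writes zeroes it assumes compression is off on that
dataset (otherwise the writes compress away and the timings mean
nothing).  Timings are whole seconds from "date +%s", so use reasonably
large files.

```shell
#!/bin/sh
# Hypothetical reproduction helper: write COUNT files of SIZE_MB MiB each
# into DIR and print how long each write (plus a sync) took, so throughput
# can be compared as the pool fills.  Zero-filled data assumes compression=off.
fill_and_time() {
    dir=$1
    count=$2
    size_mb=$3
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        # bs given in bytes (1 MiB) so the same line works with any dd
        dd if=/dev/zero of="$dir/fill.$i" bs=1048576 count="$size_mb" 2>/dev/null
        sync
        end=$(date +%s)
        echo "file $i: ${size_mb} MiB in $((end - start)) s"
        i=$((i + 1))
    done
}

# Example (pool and dataset names are placeholders):
#   fill_and_time /tank/scratch 50 1024
#   zpool list tank        # note capacity between runs
```

Running it at, say, 50%, 80% and 95% full and posting the numbers
alongside "zpool iostat -v 1" output would be far more useful than
impressions.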

It's important that folks who experience issues with ZFS on FreeBSD
provide:

- dmesg output (this will include relevant FreeBSD build dates,
  versions, architecture type, amount of RAM, and highly relevant
  disk/storage subsystem details)
- The date your source was csup'd (if different from the build date)
- /boot/loader.conf contents
- /etc/sysctl.conf contents
- "zpool status" output
- "zfs get all" output
- "top" output (mainly the header portions, re: memory usage)
- "sysctl -a | grep zfs" output
- If I/O is slow, "zpool iostat -v 1" output while the problem happens
- Anything else they can think of that seems relevant (disk I/O graphs,
  or whatever you think might play a role)
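To make gathering the above less tedious, here's a rough sh sketch that
appends each item to a single report file.  capture, collect_all and
zfs-report.txt are names I made up; the csup date still has to be added
by hand, and "top -b -d 1" assumes FreeBSD's top (batch mode, one
display), which differs from Linux's top.

```shell
#!/bin/sh
# Hypothetical collector for the diagnostics listed above.
REPORT=${REPORT:-zfs-report.txt}

# capture LABEL CMD [ARGS...]: append a labelled section with the
# command's stdout and stderr; note failures but keep going.
capture() {
    label=$1
    shift
    {
        echo "===== $label ====="
        "$@" 2>&1 || echo "(command failed: $*)"
        echo
    } >> "$REPORT"
}

collect_all() {
    capture "dmesg" dmesg
    capture "/boot/loader.conf" cat /boot/loader.conf
    capture "/etc/sysctl.conf" cat /etc/sysctl.conf
    capture "zpool status" zpool status
    capture "zfs get all" zfs get all
    capture "top header" top -b -d 1          # FreeBSD top: one display
    capture "sysctl zfs" sh -c 'sysctl -a | grep zfs'
    capture "zpool iostat" zpool iostat -v 1 10
}

# Usage: source this file, then run collect_all while the problem is
# happening, and attach $REPORT to your mail or PR.
```

The ten one-second "zpool iostat" samples are an arbitrary choice; they
only mean something if taken while I/O is actually slow.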

Otherwise I can't see the devs being able to track down anything without
concrete data.  I imagine opening a PR would be helpful too, but
discussions often come first, which is fine.

Hope this helps.

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |