ZFS pool balance and performance

From: Chris Ross <cross+freebsd_at_distal.com>
Date: Sun, 24 Aug 2025 05:46:16 UTC
I have been having some performance problems with a large ZFS pool, which
mostly contains large files that are accessed (read) via NFS.  The pool
was built from two 3-disk raidz1 vdevs.  I have replaced the drives in
those vdevs a couple of times over the years to increase capacity.
As of today I have a 75T pool that is at 89% capacity and 41% fragmented.  I
know that performance can become an issue as a pool becomes nearly full,
and I plan to move this whole system to a newer one with larger, faster
drives, etc.  But that is taking longer than I'd planned.

So, noticing particular problems tonight I looked more deeply at it.  And,
I found that the two vdevs are unequally full.  By a good amount!

NAME                         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tank                        75.8T  67.8T  8.01T        -         -    41%    89%  1.00x    ONLINE  -
  raidz1-0                  37.6T  30.8T  6.87T        -         -    27%  81.7%      -    ONLINE
    da1p4                   12.6T      -      -        -         -      -      -      -    ONLINE
    diskid/DISK-QGH0S3UTp1  12.7T      -      -        -         -      -      -      -    ONLINE
    diskid/DISK-QGH0Y5ATp1  12.7T      -      -        -         -      -      -      -    ONLINE
  raidz1-1                  38.2T  37.0T  1.14T        -         -    55%  97.0%      -    ONLINE
    diskid/DISK-9JG7REXTp1  12.7T      -      -        -         -      -      -      -    ONLINE
    diskid/DISK-9JG3M05Tp1  12.7T      -      -        -         -      -      -      -    ONLINE
    diskid/DISK-9JG7RRNTp1  12.7T      -      -        -         -      -      -      -    ONLINE

So one of the vdevs is 97% full.  I fear that is causing my occasional
read-performance issues on the NFS-exported filesystem.
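
For reference, the per-vdev CAP column is just ALLOC/SIZE; redoing that
arithmetic from the (rounded) TiB figures above gives nearly the same
percentages zpool prints:

```python
# Recompute per-vdev capacity from the SIZE and ALLOC columns above.
# The TiB figures are rounded to one decimal place, so the results
# differ slightly from the CAP column zpool itself computes.
for name, size, alloc in [("raidz1-0", 37.6, 30.8),
                          ("raidz1-1", 38.2, 37.0)]:
    print(f"{name}: {100 * alloc / size:.1f}% full")
```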

So, questions.  (1) Can I rebalance?  Searching tells me no, ZFS can’t
do that.  Nor defrag.  (2) Is there any way I can identify files whose
blocks live mostly on raidz1-1, and delete them?
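
The one workaround I keep seeing suggested is to rewrite files in place,
so their blocks get reallocated under the current free-space weighting
and (mostly) land on the emptier vdev.  Something like this hypothetical
helper is what I have in mind (not a ZFS feature; it breaks hard links,
and any snapshot keeps the old blocks referenced until it is destroyed):

```python
import os
import shutil

def rewrite(path):
    """Copy a file and rename the copy over the original, so ZFS
    writes fresh blocks for it.  Hypothetical helper, not a ZFS
    feature: it breaks hard links, and snapshots pin the old blocks."""
    tmp = f"{path}.rebalance.{os.getpid()}"
    shutil.copy2(path, tmp)   # data + metadata copied to new blocks
    os.replace(tmp, path)     # atomic rename within the same dataset

# usage (hypothetical path): rewrite("/tank/media/bigfile.iso")
```

As for finding which files live on raidz1-1, my understanding is that
`zdb -ddddd <dataset> <object#>` (object number from `ls -i`) prints each
block's DVAs, whose first field is the vdev index, though walking every
file that way sounds painfully slow.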

And (3), or perhaps (0): how did it get this way?  With two vdevs,
shouldn’t it always allocate evenly across them?  This is a many-years-old
pool, so I’m guessing just “reasons”, but I don’t know why the vdevs
would get into, or stay in, this state.  I’ve been adding hundreds of gigabytes
of data to the pool, and deleting data of similar sizes, regularly
over the months/years.  Percent capacity has continued to increase,
but the uneven %CAP between the two vdevs seems really odd.
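
My mental model, which may be where I'm going wrong, is that ZFS biases
each new write toward the vdev with the most free space but never moves
blocks already written, so an imbalance, once created, only closes as new
data lands on the emptier side.  A toy simulation of that assumption:

```python
# Toy model (my assumption, not ZFS's real metaslab logic): each write
# goes to the vdev with the most free space; existing blocks never move.
def allocate(vdevs, size):
    """Place `size` units on the vdev with the most free space."""
    i = max(range(len(vdevs)), key=lambda j: vdevs[j][0] - vdevs[j][1])
    cap, used = vdevs[i]
    vdevs[i] = (cap, used + size)
    return i

# Two equal vdevs, one much fuller (say, after deletions that happened
# to free space mostly on vdev 0).
vdevs = [(100, 40), (100, 90)]
targets = [allocate(vdevs, 1) for _ in range(30)]

print(targets.count(0), targets.count(1))  # -> 30 0
print(vdevs)                               # -> [(100, 70), (100, 90)]
```

Under that model all new writes go to the emptier vdev, so steady
add/delete churn should slowly even things out; the fact that it hasn't
is part of what puzzles me.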

Anyway, sorry for the long text, but I would appreciate any thoughts.
Thank you.

- Chris