Re: ZFS pool balance and performance
- In reply to: Chris Ross : "Re: ZFS pool balance and performance"
Date: Mon, 25 Aug 2025 14:09:49 UTC
On 25/08/2025 14:21, Chris Ross wrote:
<snip>
> Okay. A program to flip bytes and write blocks would be easy enough, but
> as you note, even if it worked it would have downsides. But, right now,
> it not working is more my concern (below).
>
>> 4) If the imbalance is caused by ZFS choosing to migrate new CoW data to
>> a particular dataset (or away from another) then this will only
>> encourage it to continue. ZFS is designed to balance writes between
>> vdevs on a zpool for efficiency, so if you start with two balanced vdevs
>> they should remain balanced. If you added a vdev and continued writing
>> to the zpool, it would tend towards balance over time. It favours the
>> vdev with the most free space for a write if all other things are equal,
>> so normal usage should drift data away from the existing vdevs and on to
>> the new one. The key being "all other things are equal".
>>
>> So I think you need to find out why ZFS has decided your vdevs are more
>> efficient unbalanced (whether it's right or not). More writes are just
>> going to make matters worse. If it's changed its mind, normal use will
>> balance it over time.
>
> Yes. This is my thought too. There are many gigabytes added/written to
> this pool on a daily basis. Time Machine backups may or may not be making
> new files rather than writing to existing ones, but they are writes to be
> sure. The data archival I put on by hand regularly is always new files,
> MB or GB at a time. I would expect that to spread them evenly, but it
> seems not to.
>
> Any idea what to look for as to why ZFS may be preferring one of the
> vdevs, when to me they seem equivalent?

Well, one thing I've noticed over the years is that ZFS is not good AT ALL at reporting drives that are on the way out. You only get to know about it when the fire has spread to the whole enclosure. Following an incident I got really interested in this matter and did some research - there are traces of it in freebsd-questions, but I don't seem to have written it up in my blog. This might be helpful though:

https://blog.frankleonhardt.com/2025/freebsd-zfs-raidz-failed-disk-replacement/

(One always hopes one's blog posts are going to be helpful; whether they are or not is a different matter.)

Long story short, don't trust ZFS saying a drive or vdev is okay. That just means it hasn't failed completely. SAS drives, in particular, never want to fail, so they can retry for ages and eventually succeed, even though you, as their owner, would rather know about it.

So, have a look at the drives using smartmontools (which does now work on SAS - the post is wrong on that point).

Try "zpool iostat -v 5" - the -v gives you stats for each vdev and drive, not just the whole pool. All the drives in the pool should have similar stats - not identical, as drives in the real world aren't - but if there's an outlier, that might be your problem.

Good old "diskinfo -t", doing a speed benchmark on each of your drives, is also worth a try. Like any benchmark it's not accurate (especially now that drives lie about their geometry), but all the drives of the same type in a zpool should produce similar numbers. Obviously wait until the pool is quiet, although you can do it while it's online.

It'd be nice, of course, if ZFS actually said "I'm a bit concerned about drive X as it's slowing down" and that sort of thing, but it doesn't.

How do I know all this? Earlier this year I bought a large number of used 2TB SAS drives on eBay, expecting a lot of them to be a bit flaky, so I could test ZFS failure modes. I was not disappointed!

Regards, Frank.
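P.S. In case it's useful, here's roughly what I'd run, assuming the drives show up as da0, da1 and so on - substitute whatever "camcontrol devlist" reports on your system:

    # list the drives the system can see
    camcontrol devlist

    # per-vdev and per-drive I/O stats, refreshed every 5 seconds
    zpool iostat -v 5

    # SMART health and error counters (works on SAS as well as SATA)
    smartctl -a /dev/da0

    # crude read-speed benchmark; repeat for each drive in the pool
    diskinfo -t /dev/da0

Compare the numbers between drives of the same type rather than against any absolute figure - it's the outlier you're looking for.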