ZFS resilvering strangles IO

Michael Gmelin freebsd at grem.de
Tue May 8 20:02:32 UTC 2012


On May 8, 2012, at 16:58, Tom Evans wrote:

> On Tue, May 8, 2012 at 3:33 PM, Michael Gmelin <freebsd at grem.de> wrote:
>> So the question is, is there anything I can do to improve the situation?
>> Is this because of memory constraints? Are there any other knobs to
>> adjust? As far as I know zfs_resilver_delay can't be changed in FreeBSD yet.
>> 
>> I have more drives around, so I could replace another one in the server,
>> just to replicate the exact situation.
>> 
> 
> In general, raidz is pretty fast, but when it's resilvering it is
> just too busy. The first thing I would do to speed up writes is to
> add a log device, preferably an SSD. Having a log device will allow
> the pool to buffer writes much more effectively than normal during
> a resilver. Lots of small writes will kill read speed during the
> resilver, which is the critical thing.
> 
> If your workload would benefit, you could split the SSD down the
> middle, use half for a log device, and half for a cache device to
> accelerate reads.
> 
> I've never tried using a regular disk as a log device; I wonder if
> that would speed up resilvering?
> 
> Cheers
> 
> Tom

Thanks for your constructive feedback. It would be interesting to see whether adding an SSD actually helps in this case (it would certainly benefit the machine during normal operation as well). Unfortunately it's not an option (the server is maxed out; there is simply no room to add a log device at the moment).
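
(For reference, on a machine that does have a spare SSD, splitting it into a log and a cache device would look roughly like this - the device name ada2, the 8G log size and the pool name tank are placeholders, not my actual setup:

  # create a GPT scheme on the SSD and split it into two partitions
  gpart create -s gpt ada2
  gpart add -t freebsd-zfs -s 8G -l slog ada2
  gpart add -t freebsd-zfs -l l2arc ada2
  # attach the small partition as a log device, the rest as a cache device
  zpool add tank log gpt/slog
  zpool add tank cache gpt/l2arc

Untested on my workload, so take it as a sketch only.)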

The general question remains: is there a way to make ZFS perform better during resilvering? Does anybody have experience tuning zfs_resilver_delay on Solaris, and does it make a difference? (The variable is in the FreeBSD source code, but I couldn't find a way to change it without touching the source.) Or is there something specific about my setup that I missed? Especially in configurations using raidz2 or raidz3, which can withstand the loss of two or even three drives, a longer resilver period shouldn't be an issue, as long as system performance is not degraded - or only degraded to a certain degree. A slowdown of up to 50% would be more or less tolerable; in my case read performance was OK-ish, but write performance dropped by more than 90%, so the machine was almost unusable.
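
As far as I can tell, on Solaris kernel tunables like this can be changed on a live system with mdb, or set persistently in /etc/system - I haven't tried this myself, and the default of 2 ticks is just what I see in the source:

  # raise the resilver delay from its default of 2 ticks (0t = decimal)
  echo zfs_resilver_delay/W0t10 | mdb -kw

  # or persistently, in /etc/system:
  set zfs:zfs_resilver_delay = 10

If that works as advertised, a higher value should throttle resilver I/O harder and leave more bandwidth for regular traffic.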

Do you think it would make sense to play with zfs_resilver_delay directly in the ZFS kernel module?
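
In case it is, this is roughly what I have in mind - an untested sketch against sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c (where zfs_resilver_delay is declared) that would expose the variable as a loader tunable and a read/write sysctl; nothing like this is in the tree as far as I know:

  /* Expose the existing zfs_resilver_delay variable; this does not
     change the throttling logic itself, it only makes it tunable. */
  SYSCTL_DECL(_vfs_zfs);
  TUNABLE_INT("vfs.zfs.resilver_delay", &zfs_resilver_delay);
  SYSCTL_INT(_vfs_zfs, OID_AUTO, resilver_delay, CTLFLAG_RW,
      &zfs_resilver_delay, 0,
      "Number of ticks to delay resilver I/O when the pool is busy");

That way it could be changed at runtime ("sysctl vfs.zfs.resilver_delay=10") or from /boot/loader.conf instead of rebuilding the module.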

(We have about 20 servers around here that could run ZFS; they currently run various combinations of UFS2+SU (no SUJ, since snapshots are currently broken), either on hardware RAID1 or on gmirror setups. I would like to standardize these setups on ZFS, but for obvious reasons I can't add log devices to all of them.)

I somehow feel that simulating this in a virtual machine is probably pointless :)

Cheers,
Michael


