Best practices for ZFS setup for a strictly SSD based system?

Jan Bramkamp crest at rlwinm.de
Tue Feb 9 17:28:34 UTC 2016



On 09/02/16 16:54, Patrick M. Hausen wrote:
> Hi, all,
>
> while there is quite a bit of documentation on how to improve ZFS performance
> by using a combination of rotating disks and SSDs, I have not found much about
> an SSD-only setup.
>
> We are planning to try a hosting server with 8 SATA SSDs with ZFS. Things I am
> not at all sure about:
>
> *	Does the recommended limit of 6 disks for a RAIDZ2 still
> 	hold? 2x 4 disks is quite a bit of overhead; could I use all 8
> 	in one vdev and get away with it?
> 	(The recommended maximum of 6 comes from some old Sun documentation.)

There are multiple reasons to limit the number of disks per RAID-Z VDEV.

  * Resilver time: to resilver a RAID-Z, ZFS has to traverse all objects
in transaction-ID order, which amounts to largely random I/O. Resilvering
is a torture test for the remaining disks of your degraded RAID-Z, and
given the ratio of bandwidth to capacity of current hard disks, it takes
too long. This isn't an issue for SSDs.

  * For performance estimations, think of a RAID-Z as one huge disk
with larger blocks but the same IOPS as the slowest disk in the RAID-Z.
Databases perform disk I/O in small blocks, which limits your RAID-Z to
the performance of about one of its member disks.

  * A ZFS pool can only grow by adding whole VDEVs or by replacing every
disk in a VDEV with a larger one, one at a time. Using mirrors allows the
pool to grow in smaller increments.
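
The rules of thumb above can be turned into a rough calculator. This is a
minimal Python sketch, not a benchmark: the per-disk IOPS, capacity and
bandwidth figures are hypothetical placeholders, all disks are assumed
identical, and ZFS overheads are ignored. It compares three ways of laying
out 8 SSDs and gives a lower bound for resilver time as disk capacity
divided by sequential bandwidth (a real RAID-Z resilver on hard disks is
slower than that, because it does not read sequentially).

```python
"""Back-of-envelope model of an 8-disk SSD pool (hypothetical figures)."""
from dataclasses import dataclass

DISK_IOPS = 50_000  # hypothetical small-block random IOPS per SSD
DISK_TB = 1.0       # hypothetical capacity per SSD in TB


@dataclass
class Layout:
    name: str
    vdevs: int       # number of top-level VDEVs in the pool
    width: int       # disks per VDEV
    redundancy: int  # parity disks (RAID-Z) or extra copies (mirror) per VDEV
    mirror: bool

    @property
    def usable_tb(self) -> float:
        # Capacity not spent on parity or mirror copies; ZFS overhead ignored.
        return self.vdevs * (self.width - self.redundancy) * DISK_TB

    @property
    def read_iops(self) -> int:
        # A mirror serves reads from all members; a RAID-Z acts like one disk.
        per_vdev = self.width * DISK_IOPS if self.mirror else DISK_IOPS
        return self.vdevs * per_vdev

    @property
    def write_iops(self) -> int:
        # Mirror and RAID-Z VDEVs both write at about one member disk's speed.
        return self.vdevs * DISK_IOPS


def resilver_hours(capacity_tb: float, mb_per_s: float) -> float:
    """Lower bound: time to rewrite a full disk at its sequential bandwidth."""
    return capacity_tb * 1_000_000 / mb_per_s / 3600


LAYOUTS = [
    Layout("1 x 8-disk RAID-Z2", vdevs=1, width=8, redundancy=2, mirror=False),
    Layout("2 x 4-disk RAID-Z2", vdevs=2, width=4, redundancy=2, mirror=False),
    Layout("4 x 2-way mirror", vdevs=4, width=2, redundancy=1, mirror=True),
]

if __name__ == "__main__":
    for layout in LAYOUTS:
        print(f"{layout.name:20} {layout.usable_tb:4.1f} TB  "
              f"read {layout.read_iops:>7} IOPS  write {layout.write_iops:>7} IOPS")
    # Hypothetical drives: a 4 TB hard disk and a 1 TB SATA SSD.
    print(f"HDD 4 TB @ 150 MB/s: >= {resilver_hours(4.0, 150):.1f} h")
    print(f"SSD 1 TB @ 500 MB/s: >= {resilver_hours(1.0, 500):.1f} h")
```

With these placeholder numbers the single 8-disk RAID-Z2 gives the most
space (6 TB vs. 4 TB), while the four mirrors give 8x its small-block read
IOPS and 4x its write IOPS.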

> *	Will e.g. MySQL still profit from residing on a mirror
> 	instead of a RAIDZ2, even if all disks are SSDs?

Yes. OpenZFS schedules each read on a mirror to the disk with the
shortest queue, so a mirror offers about the sum of its member disks'
read performance (IOPS and bandwidth) and the minimum of its member
disks' write performance (IOPS and bandwidth). A pool with as many
mirrored VDEVs as possible will offer the best performance for a given
number of disks. For write-heavy workloads the quality of the SSDs matters
a lot as well. Cheap consumer SSDs can't sustain high write rates for any
length of time. Even medium-quality SSDs have a lot of jitter and suffer
from throughput degradation under sustained write loads. Optimized
server SSDs can sustain random write workloads with little jitter and
bounded latency.
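
Why shortest-queue scheduling lets a mirror's read throughput approach the
sum of its members can be shown with a toy simulation. This is a
hypothetical simplification in Python, not the OpenZFS algorithm (which may
weigh other factors as well): time advances in ticks, each disk completes
up to `rate` queued reads per tick, and writes, which must go to every
member, are not modeled.

```python
"""Toy model: dispatch each read to the mirror member with the shortest queue."""


def simulate(rates: list[int], reads_per_tick: int, ticks: int) -> list[int]:
    """Return the number of reads each member completed (hypothetical model)."""
    queues = [0] * len(rates)     # outstanding reads per member disk
    completed = [0] * len(rates)  # finished reads per member disk
    for _ in range(ticks):
        for _ in range(reads_per_tick):
            # Pick the member with the fewest outstanding reads (ties: lowest index).
            target = min(range(len(queues)), key=lambda i: queues[i])
            queues[target] += 1
        for i, rate in enumerate(rates):
            # Each disk drains at most `rate` reads from its queue per tick.
            done = min(queues[i], rate)
            queues[i] -= done
            completed[i] += done
    return completed


if __name__ == "__main__":
    print("single disk:  ", simulate([10], 20, 100))
    print("equal mirror: ", simulate([10, 10], 20, 100))
    print("uneven mirror:", simulate([15, 5], 20, 100))
```

With two equal members at 10 reads per tick and 20 reads arriving per tick,
`simulate([10, 10], 20, 100)` completes 1000 reads on each member, twice
what a single disk manages. With unequal members the faster disk simply
absorbs more of the load.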

An NVMe SSD can offer another order-of-magnitude performance increase
over SATA SSDs, but at a significantly higher price. With multiple NVMe
SSDs you will run into the current scalability limits of ZFS and GEOM.

> *	Does a separate ZIL and/or ARC cache device still
> 	make sense?

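Most likely not. A separate log (SLOG) or cache (L2ARC) device only helps
if it is substantially faster than the pool's own disks, which is rarely
the case when the pool already consists of SSDs.
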
Another optimization is to split the logs and the tablespace into a
dedicated ZFS dataset each. Create the tablespace dataset with a
recordsize matching the fixed page size of your MySQL storage engine.
ZFS also offers much stronger consistency and atomicity guarantees than
a minimal POSIX file system is required to provide. This allows you to
further reduce the syncing overhead by tuning MySQL to take advantage of
the ZFS guarantees.
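
The dataset split can be sketched as follows. The pool name `tank`, the
dataset names and the use of InnoDB with its default 16 KiB page size are
assumptions for illustration; the script only prints the `zfs create`
commands so they can be reviewed before running them by hand.

```python
"""Sketch: print `zfs create` commands for separate MySQL datasets."""
import shlex

POOL = "tank"                 # hypothetical pool name
INNODB_PAGE_SIZE = 16 * 1024  # InnoDB's default page size in bytes


def recordsize_ok(size: int) -> bool:
    # Power of two from 512 B to 128 KiB (larger sizes need large_blocks).
    return 512 <= size <= 128 * 1024 and size & (size - 1) == 0


def create_commands(pool: str, page_size: int) -> list[list[str]]:
    if not recordsize_ok(page_size):
        raise ValueError(f"unsupported recordsize: {page_size}")
    return [
        ["zfs", "create", f"{pool}/mysql"],
        # Tablespace: one ZFS record per database page.
        ["zfs", "create", "-o", f"recordsize={page_size // 1024}K",
         f"{pool}/mysql/data"],
        # Redo logs are mostly appended sequentially; keep the default recordsize.
        ["zfs", "create", f"{pool}/mysql/log"],
    ]


if __name__ == "__main__":
    for argv in create_commands(POOL, INNODB_PAGE_SIZE):
        print(shlex.join(argv))
```

Keeping the commands as argument lists rather than strings makes it easy
to hand them to `subprocess.run` later once they have been checked.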


More information about the freebsd-stable mailing list