ZFS install on a partition

Paul Kraus paul at kraus-haus.org
Sat May 18 13:02:20 UTC 2013


On May 18, 2013, at 3:21 AM, Ivailo Tanusheff <Ivailo.Tanusheff at skrill.com> wrote:

> If you use HBA/JBOD then you will rely on the software RAID of the ZFS system. Yes, this RAID is good, but unless you use SSD disks to boost performance and a lot of RAM the hardware raid should be more reliable and much faster.

	Why will the hardware raid be more reliable? Hardware raid is susceptible to uncorrectable errors from the physical drives, because hardware raid controllers rely on the drives themselves to report bad reads and writes, and the uncorrectable error rate for modern high capacity drives (1TB and over) is such that you are almost certain to run into a couple over the operational life of the drive. The rating is about one uncorrectable error per 10^14 bits read for cheap drives and per 10^15 bits for better drives; very occasionally I see a drive rated for 10^16. Run the math and see how many TB worth of data you have to write and read (remember these failures are generally read failures with NO indication that a failure occurred, bad data is just returned to the system). ZFS checksums every block, so it can detect, and with redundant vdevs repair, exactly this kind of silent corruption.
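
	As a rough back-of-the-envelope illustration (the numbers are illustrative, not from any particular drive's data sheet): 10^14 bits is 1.25 x 10^13 bytes, or roughly 12.5 TB read per expected error, so a 2 TB drive read end to end about six times is statistically likely to hand back one block of bad data with no error reported. For the 10^15 class drives that figure is about 125 TB, which is still well within the lifetime traffic of a busy pool.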

	In terms of performance HW raid is faster, generally due to the cache RAM built into the HW raid controller; ZFS makes good use of system RAM for the same function. An SSD can help with performance if the majority of writes are sync (NFS is a good example of this) or if you can benefit from a much larger read cache. SSDs are deployed with ZFS either as write LOG devices (in which case they should be mirrored), which only come into play for SYNC writes, or as an extension of the ARC, the L2ARC, which does not have to be mirrored as it is only a cache of existing data for speeding up reads.
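
	For example, something like the following adds both to an existing pool (the pool name "tank" and the da* device names are made up for illustration, adjust for your system):

	zpool add tank log mirror da6 da7   # mirrored SLOG, only used for sync writes
	zpool add tank cache da8            # L2ARC, no mirroring needed, it is only a cache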

> I didn't get if you want to use the system to dual boot Linux/FreeBSD or just to share FreeBSD space with linux.
> But I would advise you to go with option 1 - you will get the most out of the system and obviously you don't need zpool with raid, as your LSI controller will do all the redundancy for you. Making software RAID over the hardware one will only decrease performance and will NOT increase the reliability, as you will not be sure which information is stored on which physical disk.
> 
> If stability is a MUST, then I will also advise you to go with a bunch of pools and a disk designated as hot spare - in case some disk dies you will rely on the automatic recovery. Also you should run a monitoring tool on your raid controller.

	I think you misunderstand the difference between stability and reliability. Any ZFS configuration I have tried on FreeBSD is STABLE; having redundant vdevs (mirrors or RAIDz<n>) along with hot spares can increase RELIABILITY. The only advantage to having a hot spare is that when a drive fails (and they all fail eventually), the REPLACE operation can start immediately, rather than waiting for you to notice the failure and manually replace the failed drive.
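
	As a sketch (pool and device names again made up), a RAIDZ2 pool with a hot spare could be created like this:

	zpool create tank raidz2 da0 da1 da2 da3 da4 da5 spare da6
	# some setups can kick off the REPLACE automatically if you also set:
	#   zpool set autoreplace=on tank
	# otherwise you start it by hand when a drive faults:
	#   zpool replace tank da2 da6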

	Reliability is a function of MTBF (mean time between failures) and MTTR (mean time to repair); you improve it by increasing the former and reducing the latter. Having a hot spare reduces the MTTR. The other way to improve MTTR is to go with smaller drives to reduce the time it takes the system to resilver a failed drive. This is NOT applicable in the OP's situation. I try very hard not to use drives larger than 1TB because resilver times can be days.

	Resilver time also depends on the total amount of data in a zpool, as a resilver operation walks the file system in time order, replaying all the writes and confirming that all the data on disk is good (it does not actually rewrite the data unless it finds bad data). This means a couple of things. First, the resilver time will depend on the amount of data you have written, not on the capacity; a zpool with a capacity of multiple TB will resilver in seconds if only a few hundred MB have been written to it. Second, since the resilver operation is not a block by block copy but a replay, it is I/Ops limited, not bandwidth limited. You might be able to stream sequential data from a drive at hundreds of MB/sec., but most SATA drives will not sustain more than one to two hundred RANDOM I/Ops (sequentially they can do much more).
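
	To put rough numbers on that (purely illustrative, assuming an average block size of 128 KB and a drive that sustains about 150 random I/Ops): 1 TB of written data is roughly 8 million blocks, and 8,000,000 I/Os at 150 I/Ops is about 53,000 seconds, or roughly 15 hours, before any competing application I/O. Smaller average block sizes make it considerably worse.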

> You can also set copies=2/3 just in case some errors occur, so ZFS can auto-repair the data. If you run ZFS over several LUNs this will make even more sense.
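
	For reference, copies is set per dataset (the dataset name here is made up):

	zfs set copies=2 tank/data

	Keep in mind that copies protects against isolated bad blocks, not the loss of a whole disk, so it is not a substitute for redundant vdevs.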

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company


