[zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built
Richard Elling
richard.elling at gmail.com
Mon Mar 7 06:04:14 UTC 2016
> On Mar 6, 2016, at 9:06 PM, Fred Liu <fred.fliu at gmail.com> wrote:
>
>
>
> 2016-03-06 22:49 GMT+08:00 Richard Elling <richard.elling at richardelling.com>:
>
>> On Mar 3, 2016, at 8:35 PM, Fred Liu <Fred_Liu at issi.com> wrote:
>>
>> Hi,
>>
>> Today, while reading the introduction to Jeff's new nuclear weapon, DSSD D5's CUBIC RAID,
>> an interesting survey question popped into my head: what is the zpool with the most disks you have ever built?
>
> We test to 2,000 drives. Beyond 2,000 there are some scalability issues that impact failover times.
> We’ve identified these and know what to fix, but need a real customer at this scale to bump it to
> the top of the priority queue.
>
> [Fred]: Wow! 2000 drives need almost 4 to 5 whole racks!
>>
>> Since ZFS doesn't support nested vdevs, the maximum fault tolerance should be three (from raidz3).
>
> Pedantically, it is N, because you can have N-way mirroring.
>
> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in theory but rarely happens in reality.
>
>> That is limiting if you want to build a very large pool.
>
> Scaling redundancy by increasing parity improves data loss protection by about 3 orders of
> magnitude. Adding capacity by striping reduces data loss protection by 1/N. This is why there is
> not much need to go beyond raidz3. However, if you do want to go there, adding raidz4+ is
> relatively easy.
>
> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2000 drives. If that is true, the probability of 4 failures out of 2000 will not be so low.
> Plus, resilvering takes longer when a single disk has a bigger capacity. And further, over-provisioning spare disks vs. raidz4+ becomes a worthwhile
> trade-off to weigh when the storage mesh is at the scale of 2000 drives.
Please don't assume; you'll just hurt yourself :-)
For example, do not assume the only option is striping across raidz3 vdevs. Clearly, there are many
different options.
-- richard
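
As a rough illustration of the parity-versus-striping point quoted above, here is a
minimal Python sketch of a simplified MTTDL (mean time to data loss) model. It is not
from the thread: the function names, the 1,000,000-hour MTBF, the 24-hour resilver
time, and the 181 x 11-wide layout (about 2000 drives) are all assumptions chosen for
illustration, and the model assumes independent, exponentially distributed disk
failures with no spares, no correlated faults, and no unrecoverable read errors.

```python
import math


def vdev_mttdl_hours(width, parity, mtbf_hours, mttr_hours):
    """Approximate MTTDL of one raidz vdev (hypothetical helper).

    Data is lost when parity + 1 disks in the same vdev are down at once.
    Simplified model: MTBF^(p+1) / (N * (N-1) * ... * (N-p) * MTTR^p).
    """
    if not 0 < parity < width:
        raise ValueError("parity must be between 1 and width - 1")
    # Ordered ways to pick the parity + 1 disks that fail in sequence.
    ways = math.prod(width - i for i in range(parity + 1))
    return mtbf_hours ** (parity + 1) / (ways * mttr_hours ** parity)


def pool_mttdl_hours(vdevs, width, parity, mtbf_hours, mttr_hours):
    """Striping across `vdevs` identical vdevs: losing any one loses the pool."""
    return vdev_mttdl_hours(width, parity, mtbf_hours, mttr_hours) / vdevs


if __name__ == "__main__":
    MTBF, MTTR = 1_000_000.0, 24.0  # assumed values, in hours
    for parity in (1, 2, 3):
        one = vdev_mttdl_hours(11, parity, MTBF, MTTR)
        pool = pool_mttdl_hours(181, 11, parity, MTBF, MTTR)
        print(f"raidz{parity}: one vdev ~10^{math.log10(one):.1f} h, "
              f"181 vdevs ~10^{math.log10(pool):.1f} h")
```

Under these assumed numbers, each added parity level multiplies the per-vdev MTTDL by
MTBF / ((width - parity) * MTTR), which is about 5,200 going from raidz2 to raidz3,
while striping across N vdevs divides it by N. Longer resilver times (larger disks)
shrink the parity gain, which is the concern Fred raises.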