Large ZFS arrays?

Rich rincebrain at gmail.com
Fri Jun 20 15:28:16 UTC 2014


Just FYI, a lot of people who do this use sas2ircu/sas3ircu for scripting
this rather than sg3_utils, though the latter is more powerful if you
have enough of the SAS spec to play with...
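
For example, something like this quick sketch (untested; it assumes an LSI
SAS2 HBA at controller 0 and the Enclosure:Slot numbers that "sas2ircu 0
DISPLAY" reports, so adjust for your topology) will blink a drive for you:

# Hedged example only: toggle a slot's locate LED through sas2ircu.
import subprocess

def set_locate(controller, enclosure, slot, on=True):
    state = "ON" if on else "OFF"
    subprocess.check_call([
        "sas2ircu", str(controller), "LOCATE",
        "{0}:{1}".format(enclosure, slot), state,
    ])

set_locate(0, 2, 17, on=True)   # light up enclosure 2, slot 17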

- Rich

On Fri, Jun 20, 2014 at 10:50 AM, Graham Allan <allan at physics.umn.edu> wrote:
> On 6/15/2014 10:28 AM, Dennis Glatting wrote:
>>
>> Anyone built a large ZFS infrastructure (PB size) and care to share
>> words of wisdom?
>
>
> This is a bit of a late response, but I wanted to put in our "me too"
> before I forget...
>
> We have about 500TB of storage on ZFS at present, and plan to add 600TB more
> later this summer, mostly in arrangements similar to what I've seen discussed
> already: Supermicro 847 JBOD chassis and a mixture of Dell R710/R720 head
> nodes, with LSI 9200-8e HBAs. One R720 has four 847 chassis attached; a
> couple of R710s have just a single chassis each. We originally installed one
> HBA in the R720 for each chassis, but had some deadlock problems at one
> point, which were resolved by daisy-chaining the chassis from a single HBA.
> I had a feeling it was maybe related to kern/177536, but I'm not really sure.
>
> We've been running FreeBSD 9.1 on all the production nodes, though I've long
> wanted to (and am now beginning to) set up a reasonable long-term testing
> box where we can try out some of the kernel patches or tuning suggestions
> that come up. We're also beginning to test the 9.3 release for the next set
> of servers.
>
> We built all of these conservatively, with each chassis as a separate pool,
> each having four 10-drive raidz2 vdevs, a couple of spares, a cheapish L2ARC
> SSD, and a mirrored pair of ZIL SSDs (maybe unnecessary to mirror the ZIL
> these days?). I was using the Intel 24GB SLC drive for the ZIL; I'll need
> to choose something new for future pools.
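>
> In case it's useful, each chassis pool works out to roughly what this little
> sketch would build - an illustration from memory rather than a copy-paste
> from a real box, and the pool name and gpt labels are invented:
>
> # Rough sketch of one per-chassis pool: four 10-disk raidz2 vdevs,
> # two spares, one L2ARC SSD, and a mirrored SLOG. Device labels are
> # made up; substitute whatever your disks are actually called.
> import subprocess
>
> POOL = "tank01"
> DATA = ["gpt/{0}-d{1:02d}".format(POOL, i) for i in range(1, 41)]
>
> cmd = ["zpool", "create", POOL]
> for i in range(0, 40, 10):                     # 4 x 10-drive raidz2
>     cmd += ["raidz2"] + DATA[i:i + 10]
> cmd += ["spare", "gpt/tank01-s01", "gpt/tank01-s02"]
> cmd += ["cache", "gpt/tank01-l2arc0"]          # cheapish L2ARC SSD
> cmd += ["log", "mirror", "gpt/tank01-slog0", "gpt/tank01-slog1"]
>
> subprocess.check_call(cmd)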
>
> It would be interesting to hear a little about experiences with the drives
> used... For our first "experimental" chassis we used 3TB Seagate desktop
> drives - cheap, but not the best choice: 18 months later they are dropping
> like flies (luckily we can risk some cheapness here, as most of our data can
> be re-transferred from other sites if needed). Another chassis has 2TB WD
> RE4 enterprise drives (no problems), and four others have 3TB and 4TB WD
> "Red" NAS drives... another "slightly risky" selection, but so far they
> have been very solid (in some casual discussion, a WD field engineer also
> seemed to feel these would be fine for both ZFS and Hadoop use).
>
> Tracking drives for failures and replacements was a big issue for us. One of
> my co-workers wrote a nice Perl script which periodically harvests all the
> data from the chassis (via sg3_utils) and stores the mappings of chassis
> slots, da devices, drive labels, etc. in a database. It also understands the
> layout of the 847 chassis and labels the drives for us according to some
> rules we made up: a prefix for the pool name, then "f" or "b" for the front
> or back of the chassis, then the slot number. Finally, it has some controls
> to turn the chassis drive identify lights on or off. There might be other
> ways to do all this, but we didn't find any, so it's been incredibly useful
> for us.
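>
> For what it's worth, the moving parts are nothing exotic; a stripped-down
> sketch of the idea (not our actual Perl script - this assumes FreeBSD with
> sg3_utils installed and the 847 enclosure showing up as /dev/ses0) looks
> something like:
>
> # Harvest the raw data the mapping is built from, and drive the
> # identify LEDs. The real script parses these outputs into a database
> # and applies our labelling rules on top.
> import subprocess
>
> def slot_descriptors(ses_dev="/dev/ses0"):
>     # SES element descriptor page: one entry per chassis slot.
>     return subprocess.check_output(["sg_ses", "--page=ed", ses_dev])
>
> def device_list():
>     # FreeBSD's view of the attached da devices.
>     return subprocess.check_output(["camcontrol", "devlist", "-v"])
>
> def set_identify(slot_index, on=True, ses_dev="/dev/ses0"):
>     # Turn a slot's identify light on or off.
>     action = "--set=ident" if on else "--clear=ident"
>     subprocess.check_call(["sg_ses", "--index={0}".format(slot_index),
>                            action, ses_dev])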
>
> As far as performance goes, we've been pretty happy. Some of these servers
> get hammered fairly hard by NFS I/O from cluster compute jobs (maybe ~1200
> processes on 100 nodes), and they have held up much better than our RHEL NFS
> servers using Fibre Channel RAID storage. We've also performed a few bulk
> transfers between Hadoop and ZFS (using distcp with an NFS destination) and
> saw sustained 5Gbps write speeds (which really surprised me).
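>
> (The distcp runs were nothing clever - roughly the pattern below, with
> made-up paths, where the ZFS filesystem is NFS-mounted at the same place
> on every Hadoop worker node and written to via a file:// URI.)
>
> # Illustrative only: copy a directory tree out of HDFS onto an
> # NFS-mounted ZFS dataset; the hostname and paths are invented.
> import subprocess
>
> subprocess.check_call([
>     "hadoop", "distcp",
>     "hdfs://namenode:8020/data/projectX",
>     "file:///mnt/zfs-nfs/projectX",
> ])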
>
> I think that's all I've got for now.
>
> Graham
> --
> -------------------------------------------------------------------------
> Graham Allan
> School of Physics and Astronomy - University of Minnesota
> -------------------------------------------------------------------------
>
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"

