HPC and zfs.

Peter Ankerstål peter at pean.org
Mon Feb 6 18:04:42 UTC 2012


--
Peter Ankerstål
peter at pean.org
http://www.pean.org/

On 6 Feb 2012, at 17:41, Freddie Cash wrote:

> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
> <freebsd at jdc.parodius.com> wrote:
>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>> I want to investigate whether it is possible to build your own usable
>>> HPC storage using ZFS and some
>>> network filesystem like NFS.
>>> 
>>> Just a thought experiment..
>>> A machine with two 6-core Xeons (3.46 GHz, 12 MB cache) and 192 GB of RAM (or more).
>>> In addition, the machine will use 3-6 SSD drives for the ZIL and 3-6 SSD
>>> drives for cache.
>>> Preferably mirrored where applicable.
>>> 
>>> Connected to this machine we will have about 410 3 TB drives, to give approximately
>>> 1 PB of usable storage in an 8+2 raidz configuration.
>>> 
>>> Connected to this will be an HPC cluster of roughly 800 nodes that will
>>> access the storage in parallel.
>>> Is this even possible, or do we need to distribute the metadata load
>>> over many servers? If that is the case,
>>> is there any software for FreeBSD that could accomplish this
>>> distribution (pNFS doesn't seem to be
>>> anywhere close to usable on FreeBSD), or do I need to call NetApp or
>>> Panasas right away? It would be
>>> really nice if I could build my own storage solution.
>>> 
>>> Other possible solutions to this problem are extremely welcome.
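
To put rough numbers on the 8+2 layout above, here is a minimal sketch of the
pool I have in mind. The device names (da0..da409 for the data disks, ada0..ada5
for the SSDs) and the choice of raidz2 for "8+2" are assumptions for
illustration:

  # 41 raidz2 vdevs of 10 drives each (8 data + 2 parity):
  #   41 vdevs x 8 data drives x 3 TB ~= 984 TB, i.e. roughly 1 PB usable
  #   before filesystem overhead.
  zpool create tank \
      raidz2 da0  da1  da2  da3  da4  da5  da6  da7  da8  da9  \
      raidz2 da10 da11 da12 da13 da14 da15 da16 da17 da18 da19 \
      raidz2 da20 da21 da22 da23 da24 da25 da26 da27 da28 da29
  # ...and so on up to da409, 41 vdevs in total.

  # Mirrored SSDs for the ZIL (slog), striped SSDs for the L2ARC:
  zpool add tank log mirror ada0 ada1
  zpool add tank cache ada2 ada3 ada4 ada5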
>> 
>> For starters I'd love to know:
>> 
>> - What single motherboard supports up to 192GB of RAM
> 
> SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM
> slots).  It's an AMD board, but there should be variants that support
> Intel CPUs.  It's not uncommon to support 256 GB of RAM these days,
> although 128 GB boards are much more common.
Yeah, the one I was looking at was the SuperMicro X8DTU-F. But yes, the more
RAM the better.
> 
>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>>  an additional 12 SSDs) hooked up to a single machine
> 
> In a "head node" + "JBOD" setup?  Where the head node has a mobo that
> supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of
> 16-24 port multi-lane SAS/SATA controllers with external ports that
> are cabled up to external JBOD boxes.  The SSDs would be connected to
> the mobo SAS/SATA ports.
> 
> Each JBOD box contains nothing but power, SAS/SATA backplane, and
> harddrives.  Possibly using SAS expanders.
> 
> We're considering doing the same for our SAN/NAS setup for
> centralising storage for our VM hosts, although not quite to the same
> scale as the OP.  :)

Yep, NetApp has disk shelves that can be configured as JBOD and fit 60 drives
into 4U. :D

> 
>> If you are considering investing the time and especially money (the cost
>> here is almost unfathomable, IMO) into this, I strongly recommend you
>> consider an actual hardware filer (e.g. NetApp).  Your performance and
>> reliability will be much greater, plus you will get overall better
>> support from NetApp in the case something goes wrong.  In the case you
>> run into problems with FreeBSD (and I can assure you in this kind of
>> setup you will) with this kind of extensive setup, you will be at the
>> mercy of developers' time/schedules with absolutely no guarantee that
>> your problem will be solved.  You definitely want a support contract.
>> Thus, go NetApp.
> 
> For an HPC setup like the OP wants, where performance and uptime are
> critical, I agree. You don't want to be skimping on the hardware and
> software.
> 
A big consideration for us is also the installation. If we go with something like
NetApp, they can install the system and we don't need to put in the extra hours
(probably a lot) to get the thing running. But being a huge fan of BSD, I wanted
to at least look into the possibility of building our own system.
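
Just to get a feel for what the FreeBSD side of building it ourselves would look
like, the serving part is at least straightforward for the plain-NFS, single-server
case (the metadata-distribution question from above is still open). The dataset
name, client network, and nfsd thread count below are made up for illustration:

  # On the filer: create a dataset and export it over NFS.
  zfs create tank/scratch
  zfs set sharenfs="-maproot=root -network 10.0.0.0 -mask 255.255.0.0" tank/scratch

  # /etc/rc.conf:
  rpcbind_enable="YES"
  nfs_server_enable="YES"
  mountd_enable="YES"
  nfs_server_flags="-u -t -n 128"   # plenty of nfsd threads for ~800 clients

All metadata still goes through the single head node, though, which is exactly
the bottleneck I am worried about.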

> However, if you have the money for a NetApp setup like this ($500,000+
> US, I'm guessing), then you also have the money to hire one or more
> FreeBSD developers to work on the parts of the system that are
> critical to this (NFS, ZFS, CAM, drivers, scheduler, GEOM, etc.).
> Then, you could go with a white-box, custom build and have the support
> in-house.
> 
> -- 
> Freddie Cash
> fjwcash at gmail.com
> 


