HPC and zfs.
Freddie Cash
fjwcash at gmail.com
Mon Feb 6 16:41:54 UTC 2012
On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
<freebsd at jdc.parodius.com> wrote:
> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>> I want to investigate whether it is possible to build your own usable
>> HPC storage using ZFS and some
>> network filesystem like NFS.
>>
>> Just a thought experiment..
>> A machine with two 6-core Xeons (3.46 GHz, 12 MB cache) and 192 GB of RAM (or more).
>> In addition, the machine will use 3-6 SSD drives for the ZIL and 3-6 SSD
>> drives for cache,
>> preferably mirrored where applicable.
>>
>> Connected to this machine we will have about 410 3 TB drives to give
>> approximately 1 PB of usable storage in an 8+2 raidz configuration.
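
For what it's worth, the arithmetic on that layout checks out: 410
drives split into 41 raidz2 vdevs of 10 disks (8 data + 2 parity) gives
roughly 41 * 8 * 3 TB = 984 TB of data capacity, i.e. just under 1 PB
before ZFS metadata and free-space overhead. A quick back-of-the-envelope
sketch in Python (drive count and vdev shape taken from the mail above;
nothing here is measured):

    # Capacity sanity check for the proposed pool layout (decimal TB,
    # before ZFS metadata/slop overhead).
    drives        = 410
    drive_tb      = 3      # 3 TB drives
    vdev_width    = 10     # raidz2: 8 data + 2 parity
    data_per_vdev = 8

    vdevs     = drives // vdev_width               # 41 vdevs
    usable_tb = vdevs * data_per_vdev * drive_tb   # 41 * 8 * 3 = 984 TB

    print(f"{vdevs} raidz2 vdevs, ~{usable_tb} TB data capacity "
          f"(~{usable_tb / 1000:.2f} PB)")
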
>>
>> Connected to this will be an HPC cluster of roughly 800 nodes that will
>> access the storage in parallel.
>> Is this even possible, or do we need to distribute the metadata load
>> over many servers? If that is the case,
>> does any software exist for FreeBSD that could accomplish this
>> distribution (pNFS doesn't seem to be
>> anywhere close to usable in FreeBSD), or do I need to call NetApp or
>> Panasas right away? It would be
>> really nice if I could build my own storage solution.
>>
>> Other possible solutions to this problem are extremely welcome.
>
> For starters I'd love to know:
>
> - What single motherboard supports up to 192GB of RAM
The SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16
DIMM slots). It's an AMD board, but there should be comparable boards
for Intel CPUs. Supporting 256 GB of RAM is not uncommon these days,
although 128 GB boards are much more common.
> - How you plan on getting roughly 410 hard disks (or 422 assuming
> an additional 12 SSDs) hooked up to a single machine
In a "head node" + "JBOD" setup? Where the head node has a mobo that
supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of
16-24 port multi-lane SAS/SATA controllers with external ports that
are cabled up to external JBOD boxes. The SSDs would be connected to
the mobo SAS/SATA ports.
Each JBOD box contains nothing but power, SAS/SATA backplane, and
harddrives. Possibly using SAS expanders.
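
To give a feel for the cabling math, here's a rough sketch in Python
(the chassis size and HBA port counts are assumptions for illustration,
not specific products):

    # Rough cabling estimate for ~410 drives behind a single head node.
    # Chassis capacity and HBA port counts are assumptions, not vendor specs.
    import math

    drives             = 410
    bays_per_jbod      = 45   # e.g. a 4U 45-bay JBOD chassis (assumption)
    wide_ports_per_hba = 4    # a "16-port" HBA = 4 external x4 SAS ports (assumption)

    jbods = math.ceil(drives / bays_per_jbod)      # ~10 chassis
    hbas  = math.ceil(jbods / wide_ports_per_hba)  # one wide port per chassis

    print(f"{jbods} JBOD chassis, {hbas} external HBAs "
          f"(one x4 SAS wide port per chassis, expanders inside each JBOD)")

With expanders you're trading bandwidth for port count, so for an HPC
workload you'd probably want more than one wide port per chassis.
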
We're considering doing the same for our SAN/NAS setup for
centralising storage for our VM hosts, although not quite to the same
scale as the OP. :)
> If you are considering investing the time and especially money (the cost
> here is almost unfathomable, IMO) into this, I strongly recommend you
> consider an actual hardware filer (e.g. NetApp). Your performance and
> reliability will be much greater, plus you will get overall better
> support from NetApp in the case something goes wrong. If you run
> into problems with FreeBSD in this kind of extensive setup (and I can
> assure you that you will), you will be at the
> mercy of developers' time/schedules with absolutely no guarantee that
> your problem will be solved. You definitely want a support contract.
> Thus, go NetApp.
For an HPC setup like the OP wants, where performance and uptime are
critical, I agree. You don't want to be skimping on the hardware and
software.
However, if you have the money for a NetApp setup like this
(US$500,000+, I'm guessing), then you also have the money to hire one
or more FreeBSD developers to work on the parts of the system that are
critical to this (NFS, ZFS, CAM, drivers, the scheduler, GEOM, etc.).
Then you could go with a white-box, custom build and have the support
in-house.
--
Freddie Cash
fjwcash at gmail.com