HPC and ZFS.

Michael Fuckner michael at fuckner.net
Mon Feb 6 17:25:07 UTC 2012


On 02/06/2012 05:41 PM, Freddie Cash wrote:
Hi all,

> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick
> <freebsd at jdc.parodius.com>  wrote:
>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>> I want to investigate if it is possible to create your own usable
>>> HPC storage using zfs and some network filesystem like nfs.
Especially HPC storage sounds interesting to me, but for HPC you typically 
need fast read/write access from all nodes in the cluster. That's why Lustre 
spreads data across several storage servers for concurrent access over a 
fast interconnect (typically InfiniBand).

Another thing to think about is CPU: a rebuild of a single disk in a 
petabyte filesystem will probably take weeks. I haven't tried this with ZFS 
yet, but I'm really interested to hear whether anyone already has.
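
A rough back-of-the-envelope estimate (assumed numbers, not measured; real 
resilver throughput on a busy pool is usually far lower):

  # Resilver time for one 3 TB drive, assuming ~100 MB/s sustained throughput
  disk_bytes = 3 * 10**12            # one 3 TB drive
  throughput = 100 * 10**6           # bytes/s, optimistic sequential rate
  hours = disk_bytes / throughput / 3600
  print(round(hours, 1), "hours")    # ~8.3 h best case; days or weeks under load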

The whole setup sounds a little bit like the system shown by Aberdeen:
http://www.aberdeeninc.com/abcatg/petabyte-storage.htm

Schematics at Tom's Hardware:
http://www.tomshardware.de/fotoreportage/137-Aberdeen-petarack-petabyte-sas.html

The problem with the Aberdeen setup is that they don't use a ZIL or L2ARC.



>>> Just a thought experiment..
>>> A machine with two 6-core Xeons, 3.46 GHz, 12 MB cache, and 192 GB of RAM (or more).
>>> In addition, the machine will use 3-6 SSD drives for ZIL and 3-6 SSD
>>> drives for cache.
>>> Preferably mirrored where applicable.
>>>
>>> Connected to this machine we will have about 410 3TB drives to give approx
>>> 1 PB of usable storage in an 8+2 raidz configuration.
I don't know what the situation is in the rest of the world, but 3 TB drives 
are currently still hard to buy in Europe/Germany.
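
For what it's worth, a quick sanity check of the quoted capacity (assuming 
41 raidz2 vdevs of 10 drives each, which is what 8+2 implies):

  # Usable capacity for 410 x 3 TB drives in 8+2 raidz2 vdevs (assumed layout)
  vdevs = 410 // 10                  # 41 vdevs of 10 drives
  data_disks = 8                     # 8 data + 2 parity per vdev
  usable_tb = vdevs * data_disks * 3
  print(usable_tb, "TB")             # 984 TB, roughly 1 PB before metadata overhead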

>>> Connected to this will be an ~800-node HPC cluster that will
>>> access the storage in parallel.
What is your typical load pattern?

>>> is this even possible or do we need to distribute the meta data load
>>> over many servers?
It is a good idea to have the metadata load distributed over several servers.

>>> If that is the case,
>>> does there exist any software for FreeBSD that could accomplish this
>>> distribution (pNFS doesn't seem to be
>>> anywhere close to usable in FreeBSD) or do I need to call NetApp or
>>> Panasas right away?
Not that I know of.

> SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM
> slots).  It's an AMD board, but there should be variants that support
> Intel CPUs.  It's not uncommon to support 256 GB of RAM these days,
> although 128 GB boards are much more common.
Current Intel CPUs have three memory channels.

If you have 2 sockets, 3 channels, and 2 DIMMs per channel, that is 12 DIMMs; 
with cheap 16 GB modules that comes to 192 GB. 32 GB modules are also 
available today ;-)
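
Spelled out, with the module sizes mentioned above:

  # DIMM count and total RAM for a dual-socket, 3-channel, 2-DIMMs-per-channel board
  sockets, channels, dimms_per_channel = 2, 3, 2
  dimms = sockets * channels * dimms_per_channel   # 12 DIMMs
  print(dimms * 16, "GB with 16 GB modules")       # 192 GB
  print(dimms * 32, "GB with 32 GB modules")       # 384 GB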



>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>>   an additional 12 SSDs) hooked up to a single machine
>
> In a "head node" + "JBOD" setup?  Where the head node has a mobo that
> supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of
> 16-24 port multi-lane SAS/SATA controllers with external ports that
> are cabled up to external JBOD boxes.  The SSDs would be connected to
> the mobo SAS/SATA ports.
>
> Each JBOD box contains nothing but power, SAS/SATA backplane, and
> harddrives.  Possibly using SAS expanders.
If you use Supermicro, I would use the X8DTH-iF, some LSI HBAs (9200-8e, 2x 
multilane external) and a few JBOD chassis (like the Supermicro 847E16-RJBOD1).
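
Rough chassis math (assuming the 45-bay version of that JBOD; please 
double-check the bay count of the exact model):

  # Number of JBOD chassis and HBA ports for ~410 drives (assumed 45 bays per chassis)
  import math
  drives = 410
  bays_per_chassis = 45              # assumed for the 847E16-RJBOD1
  chassis = math.ceil(drives / bays_per_chassis)
  print(chassis, "JBOD chassis")     # 10 chassis -> 10 external SAS ports,
                                     # i.e. five 9200-8e HBAs with two ports each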

Regards,
  Michael!

