Storage cluster advise, anybody?

Fri Apr 22 19:38:29 UTC 2016

On Fri, Apr 22, 2016 at 12:17 PM, Valeri Galtsev <galtsev at kicp.uchicago.edu>
wrote:

> Dear Experts,
>
> I would like to ask everybody: what would you advise to use as a storage
> cluster, or as a distributed filesystem.
>
> I made my own research of what I can do, but I hit a snag with my
> seemingly best choice, so I decided to stay away from it finally, and ask
> clever people what they would use.
>
> My requirements are:
>
> 1. I would like to have one big (say, comparable to petabyte) filesystem,
> accessible on more than one machine, composed of disk space leftovers on a
> bunch of machines having 1 gigabit per second ethernet connections
>
> 2. It can be a bit slow, as filesystem one would need for backups onto it
> (say, using bacula or bareos), and/or for long term storage of large
> datasets, portions of which can be copied over to faster storage for
> processing if necessary. I would be thinking in 1-2 TB of data written to
> it daily.
>
> 3. It would be great to have it single machine failure/reboot resilient
>
> 4. metadata machines should be redundant (or at least backup medatada host
> should be manually convertible into master metadata host if fatal failure
> to master or corruption of its data happens)
>
>
> What I would like to avoid/exclude:
>
> 1. Proprietary commercial solutions, as:
>
> a. I would like to stay on as minimal budget as possible
> b. I want to be able to predict that it will exist for long time, and I
> have better experience with my predictions of this sort about open source
> projects as opposed to proprietary ones
>
> 2. Open source solutions using portions of proprietary closed source
> binaries/libraries (e.g., I would like to stay away from google
> proprietary code/binaries/libraries/modules)
>
> 3. Kernel level modifications. I really would like to have this
> independent of OS as much as I can, or rather available on multiple OSes
> (though I do not like Java based things - just my personal experience with
> some of them). I have a bunch of Linux boxes and a bunch of FreeBSD boxes,
> and I do not want to exclude neither of them if possible. Also, the need
> to have custom Linux kernel specifically scares me: Linux kernels get
> critical updates often, and having customizations lagging behind the need
> of critical update is as unpleasant as rebooting the machine because of
> kernel update is.
>
> I'm not too scared of a "split nature" projects: proprietary projects
> having open source satellite. I have mixed experience with those, using
> open source satellite I mean. Some of them are indeed not neglected, and
> even though you may be missing some features commercial counterpart has,
> some are really great ones: they are just missing commercial support, and
> maybe having a bit sparse documentation, thus making you to invest more
> effort into making it work, which I don't mind: I can earn my sysadmin's
> salary here. I would say I more often had good experience with those than
> bad one (and I have a list of early indications of potential bad outcome,
> so I can more or less predict my future with this kind of projects).
>
> <rant>
> I really didn't mean to write this, but I figure it probably will surface
> once I start getting your advices, so here it is. I did my research having
> my requirements in mind and came up with the solution: moosefs. It is not
> reviewed much, no reviews with criticism at all, and not much you can ("I
> could" I should say) find howtos about customizations, performance tuning
> etc. It installs without a hitch. It runs well, until you start stress
> writing a lot to it in parallel, then it started performing exponentially
> badly for me. Here is where extensive attempts to find performance tuning
> documentation faces lack of success. What made my decision to never ever
> use it in a future was the following. I started migrating data back from
> moosefs to local UFS (that is FreeBSD box) filesystem using rsync command.
> What I observed was: source files after they have been touched by rsync
> changed their timestamps. As if instead of creation timestamp it is an
> access timestamp on moosefs. This renders rsync from moosefs useless, as
> you can not re-run failed rsync, and you obliterate some of metadata of
> the source ("creation" timestamp). I wrote e-mail to sourceforge moosefs
> mail list, mentioning all this and the fact that I am using open source
> moosefs. Next day they replied asking whether I use version 3."this" or
> version 3."that", as they want to know in which of them they have a bug.
> Whereas latest open source version they have everywhere, including
> sourceforge is older version: 2.0.88.
> Basically, my decision was made. Sorry for venting it out here, but I
> figured, it will happen some moment when I will get your advises.
> </rant>
>
> Thanks a lot for all your advises!
>
> Valeri
>
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++
>
>
>
>
> _______________________________________________
>
>
>
In page :
http://moosefs.org/download.html

 there are the following lines with download links :

Current release version 3.0.73-1 (GPLv2) 2016-03-04
Current release version 3.0.69-1 (GPLv2) 2016-01-12
Current release version 3.0.55-1 (GPLv2) 2015-10-21
Current release version 3.0.37-1 (GPLv2) 2015-05-15
Current release version 3.0.17-1 (GPLv2) 2015-04-21
 .
 .
 .

Perhaps they assumed you are using one of them .

Mehmet Erol Sanliturk