ha-cluster on amd64

Wed Sep 27 11:36:13 PDT 2006

On Tue, 26 Sep 2006, Eric Anderson wrote:
> 
> On 09/26/06 09:52, dmitry surovtsev wrote:
>> Hi all,
>>      The main aim: build high-available system on two or more FreeBSD 
>> servers like HA-Linux.
>>   Does anyone know if such a project exist?
>>   Clusterit port is not exactly what I want. Heartbeat is beautiful, but 
>> version 2 is available for linux only.
>>   The latest 1.2.4 version in FreeBSD ports throws multiple errors while 
>> compiling and compilation stops.
>
>
> I guess this question mostly depends on what you are trying to cluster (as 
> in: what services).
>
> Eric

Application-level redundancy is much less interesting, or at least it is 
much less difficult.  I've been looking at OpenSSI lately, as I have 
wanted to give my Linux Xen customers better availability.  From what I 
understand, I can run existing apps in a "fork and forget" manner with 
OpenSSI, and that greatly reduces management complexity.

Say I'm managing a webapp (something I do from time to time).  With 
application-level clustering, I set up a centralized redundant DB (lately 
I've been using MySQL Cluster for webapps).  Then I set up all the 
webservers such that a user can connect to an arbitrary webserver and 
continue his session (thus the central DB).  Finally I put a couple of 
squid proxies in front of the whole mess, and use VRRP or CARP so I can 
survive the failure of a squid box.  Lots of application-specific thinking 
is required, and often some re-coding.  That, and I've got to manage a 
minimum of 7 boxes now, which usually requires a Kerberos/NIS setup to 
manage the users, on top of the rest of the complexity of managing a 
large number of servers.
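
To make the "continue his session on an arbitrary webserver" part 
concrete, here is a minimal sketch in Python.  Everything in it is an 
assumption for illustration: the SessionStore class and the table layout 
are made up, and an in-memory SQLite database stands in for the central 
redundant DB (a real setup would point at the shared database server 
instead).

```python
import json
import sqlite3


class SessionStore:
    """Session data kept in a shared database instead of webserver memory."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (sid TEXT PRIMARY KEY, data TEXT)"
        )

    def save(self, sid, data):
        # INSERT OR REPLACE so a later request overwrites the earlier state.
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (sid, data) VALUES (?, ?)",
            (sid, json.dumps(data)),
        )
        self.conn.commit()

    def load(self, sid):
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE sid = ?", (sid,)
        ).fetchone()
        return json.loads(row[0]) if row else None


# One shared database stands in for the central redundant DB (assumption).
shared_db = sqlite3.connect(":memory:")

# Two "webservers", each with its own store object but the same backend.
web_a = SessionStore(shared_db)
web_b = SessionStore(shared_db)

# The first request lands on webserver A, the next one on webserver B.
web_a.save("abc123", {"user": "alice", "cart": [42]})
session_seen_by_b = web_b.load("abc123")
```

The point is only that no webserver holds session state of its own, so 
the proxies in front can send a request to any of them.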

Now, something that I've spent a lot of time doing is scaling out badly 
written (from a performance/scaling perspective) webapps.  Sometimes they 
keep session data in shared memory or on a filesystem.  With something 
like an OpenSSI cluster, you can just keep adding boxes and the thing 
will scale (unless it uses a one-process, multiple-thread model; in that 
case you are screwed).  From what I understand, shared memory works with 
OpenSSI, but you take a performance hit.  It seems a whole lot easier 
than rewriting the thing.

(Of course, usually by that time the thing is making money, so you might 
as well pay someone to rewrite it decently.)

When a node fails, all processes running on that node die, but the 
cluster stays up (assuming you have your root node set up in redundant 
mode).  So it seems that the only cluster-specific thing you would need 
is a 'nanny' process that restarts important stuff after a node failure.
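
A rough sketch of what such a nanny could look like, in Python.  This is 
not anything that ships with OpenSSI; the Nanny class and its method 
names are invented here, and it only watches child processes it started 
itself.  It polls each process and starts a fresh copy of any that has 
exited.

```python
import subprocess
import sys
import time


class Nanny:
    """Keep a set of named commands running, restarting any that exit."""

    def __init__(self, commands):
        self.commands = commands          # name -> argv list
        self.procs = {}                   # name -> subprocess.Popen
        self.restarts = {name: 0 for name in commands}

    def start_all(self):
        for name, argv in self.commands.items():
            self.procs[name] = subprocess.Popen(argv)

    def check_once(self):
        """Restart every process that has exited; return their names."""
        restarted = []
        for name, proc in self.procs.items():
            if proc.poll() is not None:   # poll() returns None while running
                self.procs[name] = subprocess.Popen(self.commands[name])
                self.restarts[name] += 1
                restarted.append(name)
        return restarted

    def run(self, interval=5.0):
        """Start everything, then check forever at a fixed interval."""
        self.start_all()
        while True:
            self.check_once()
            time.sleep(interval)

    def stop_all(self):
        for proc in self.procs.values():
            if proc.poll() is None:
                proc.terminate()
            proc.wait()
```

You would hand it the commands you care about, for example 
Nanny({"worker": [sys.executable, "worker.py"]}).run(), where worker.py 
is a hypothetical script.  The nanny itself has to live somewhere that 
survives the node failure, which this sketch does not address.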

(And with most of these webapps, you often need that anyhow.  I had a 
couple of clients who were really, really happy with me after I set them 
up with Nagios monitoring that automatically sshed in and restarted httpd 
every time it stopped responding.  I agree that the right thing to do 
would have been to fix the memory leak, but this is what the customer 
wanted.)
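
The probe-then-restart logic behind that kind of setup is simple enough 
to sketch in Python.  This is not Nagios configuration, just the idea; 
the function names, the retry count, and the host and command in the 
usage note are all assumptions.

```python
import http.client
import subprocess
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if the URL answers with a non-error status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Connection errors, timeouts and HTTP error statuses end up here.
        return False


def check_and_restart(probe, restart, retries=2):
    """Probe up to `retries` times; call restart() only if every probe fails.

    Returns True if a restart was triggered.
    """
    for _ in range(retries):
        if probe():
            return False
    restart()
    return True


def restart_over_ssh(host, command):
    # Hypothetical host and command; assumes key-based ssh login works.
    subprocess.run(["ssh", host, command], check=True, timeout=60)
```

A monitoring job might then run something like 
check_and_restart(lambda: http_ok("http://www.example.com/"), 
lambda: restart_over_ssh("web1.example.com", "apachectl restart")) 
every minute or so.  Probing more than once before restarting avoids 
bouncing httpd over a single slow response.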

Also note, I haven't actually used OpenSSI.  I've just been reading up on 
it and I thought I'd jump in and say something.