OS support for fault tolerance (re-send)

Rayson Ho raysonlogin at gmail.com
Tue Feb 14 17:49:41 UTC 2012


(The email below did not show up on the online archive - resending...)

---------- Forwarded message ----------
From: Rayson Ho <raysonlogin at gmail.com>
Date: Tue, Feb 14, 2012 at 12:27 PM
Subject: Re: OS support for fault tolerance


On Tue, Feb 14, 2012 at 11:57 AM, Julian Elischer <julian at freebsd.org> wrote:
> but I'm interested in any answers people may have

The way other OSes handle this is by detecting an abnormal number of
faults (sometimes it's not the fault of the hardware - e.g. when a
particle from outer space hits a core and flips a bit), and then
disabling the affected core(s).
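
To make the idea concrete, here is a rough sketch of "count faults per
core, retire the core when there are too many in a short time". This is
only an illustration, not how Solaris or z/OS actually implement it: the
FaultMonitor class, the threshold and the window length are all made up
for the example.

```python
from collections import defaultdict, deque


class FaultMonitor:
    """Hypothetical policy: retire a core after too many faults in a time window."""

    def __init__(self, threshold=3, window_s=60.0):
        self.threshold = threshold        # assumed: faults needed to retire a core
        self.window_s = window_s          # assumed: sliding window length, seconds
        self.events = defaultdict(deque)  # core id -> timestamps of recent faults
        self.offline = set()              # cores taken out of service

    def report_fault(self, core, now):
        """Record one fault on `core` at time `now`; return True if the core is offline."""
        if core in self.offline:
            return True
        q = self.events[core]
        q.append(now)
        # Drop faults that fell out of the window, so isolated events age out.
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.threshold:
            self.offline.add(core)
        return core in self.offline


monitor = FaultMonitor(threshold=3, window_s=60.0)
# One isolated fault (e.g. a stray bit flip) does not retire core 0.
monitor.report_fault(0, now=0.0)
# Three faults within a minute on core 1 do.
for t in (100.0, 110.0, 120.0):
    monitor.report_fault(1, now=t)
print(sorted(monitor.offline))  # prints [1]
```

The sliding window is what separates a one-off transient error from a
core that keeps failing: old faults age out, so only a burst of faults
gets the core disabled.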

Solaris & mainframe (z/OS) handle it this way, but you should google
and find more info since I don't remember all the details.

Also, see this presentation: "Getting to know the Solaris Fault
Management Architecture (FMA)":
http://www.prefetch.net/presentations/SolarisFaultManagement_Presentation.pdf

Rayson

=================================
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/

>> _______________________________________________
>> freebsd-hackers at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"

