drive failure during rebuild causes page fault

Søren Schmidt sos at DeepCore.dk
Mon Dec 13 22:59:26 PST 2004


Doug White wrote:
> On Mon, 13 Dec 2004, Joe Rhett wrote:
> 
> 
>>>This is why I don't trust ATA RAID for fault tolerance -- it'll save your
>>>data, but the system will tank.  Since the disk state is maintained by
>>>the OS and not abstracted by a separate processor, if a disk dies in a
>>>particularly bad way the system may not be able to cope.
>>
>>Yes, but SATA isn't limited by this problem.  It does have a processor per
>>disk. (this is all SATA, if I didn't make that clear)
> 
> Actually on SATA it's worse -- the disk just stops responding to everything
> and hangs.  If you don't detect this condition then you go into an
> infinite wait.
> 
> In any case, yes the ATA RAID code could use a massive robustness pass. So
> could the core ATA code.  Patches accepted :)

Actually I'm in the process of rewriting the ATA RAID code, so things 
are rolling, albeit slowly; time is a precious resource. I believe that 
it can be made pretty robust, but the rest of the kernel still has 
issues with disappearing devices etc. that are outside ATA's realm.

Anyhow, I can only test with the HW I have here in the lab, which by far 
does not cover all possible permutations, so testing by the community is 
very much needed here to get things sorted out...

-- 

-Søren




More information about the freebsd-stable mailing list