drive failure during rebuild causes page fault

Joe Rhett jrhett at meer.net
Wed May 18 17:20:35 PDT 2005


Soren, I've just retested all of this with 5.4-REL and most of the problems
listed below are solved.  The only remaining problems appear to be related to
the ghost arrays that show up when the system finds a drive that was taken
offline earlier.  For example, pull a drive and then reboot the system.

1. If you reboot the system you can delete the array cleanly, but it returns
next time.  I can't figure out how to make this information go away, and
I've tried low-level formatting the disks :-(

2. Removing the array using "atacontrol delete" after an "atacontrol reinit
channel" always produces a page fault.  For example, if you have only a
single array in a system, lose a drive, and the drive later returns:

	# atacontrol status 1
	atacontrol: ioctl(ATARAIDSTATUS): Device not configured
	# atacontrol reinit 5
		...finds disk
	# atacontrol status 1
	ar1: ATA RAID1 subdisks: DOWN DOWN status: DEGRADED
	# atacontrol delete 1  
		*Page Fault*
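
For anyone who wants to spot these ghost arrays from a script before touching
them, here is a minimal sketch.  It assumes the status line always has the
shape shown in the transcript above.  The name looks_like_ghost and the
"every subdisk is DOWN" heuristic are assumptions for illustration only;
atacontrol itself provides nothing of the kind.

```python
import re

# Assumed shape of one "atacontrol status" line, taken from the transcript:
#   ar1: ATA RAID1 subdisks: DOWN DOWN status: DEGRADED
STATUS_RE = re.compile(
    r"^(?P<array>ar\d+): ATA (?P<level>\S+) "
    r"subdisks: (?P<subdisks>.*?) status: (?P<status>\S+)\s*$"
)


def looks_like_ghost(status_line):
    """Return True if the line parses and every subdisk is reported DOWN.

    Hypothetical heuristic: an array with no live subdisks is treated as
    a ghost. Lines that do not match the assumed shape return False.
    """
    match = STATUS_RE.match(status_line.strip())
    if match is None:
        return False
    subdisks = match.group("subdisks").split()
    return bool(subdisks) and all(disk == "DOWN" for disk in subdisks)


if __name__ == "__main__":
    line = "ar1: ATA RAID1 subdisks: DOWN DOWN status: DEGRADED"
    print(looks_like_ghost(line))
```

This only reads status output; it does not call "atacontrol delete", which is
the step that page faults here.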
	
We can't run -current, so I'm hoping to find options to work with this as
is.  If you know for a fact that this has changed in the mkIII patches then
I'd be willing to investigate, but I will need to be certain.

I know that you have no desire to work on this older code, but could you at
least clue me in on how to get atacontrol to drop these ghost arrays?

On Tue, Dec 14, 2004 at 04:53:59PM -0800, Joe Rhett wrote:
> Soren, do you have any thoughts on what I could do to alleviate or better
> debug this page fault?  I've found three ways to cause this.  In all cases,
> "pull" means either a physical pull or "atacontrol detach <channel>".
> 
> 1. Pull a drive and rebuild onto hot spare. Pull hot spare *boom*
> 
> 2. Pull a drive and rebuild onto hot spare. Pull good disk *boom*
> ...this should cause a filesystem failure, but not a page fault, when the
> array isn't /
> 
> 3. Pull a drive and then put it back.  The system suddenly has a new array
> with just that drive in it. "atacontrol delete <new-array>" *boom*
> 
> In particular, what's the story with the new array appearing when you
> insert a drive with array meta-data on it?  That array appears to be
> half-there (no devices, etc) which is probably what causes #2...
> 
> On Tue, Dec 14, 2004 at 07:58:53AM +0100, Søren Schmidt wrote:
> > Actually I'm in the process of rewriting the ATA RAID code, so things 
> > are rolling, albeit slowly; time is a precious resource. I believe that 
> > it can be made pretty robust, but the rest of the kernel still has 
> > issues with disappearing devices etc. that are outside ATA's realm.
> > 
> > Anyhow, I can only test with the HW I have here in the lab, which by no 
> > means covers all possible permutations, so testing etc. by the community 
> > is very much needed here to get things sorted out...
> 
> -- 
> Joe Rhett
> Senior Geek
> Meer.net
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"

-- 
Joe Rhett
senior geek
meer.net

