kern/95459: Rebooting the system while rebuilding RAID (Intel MatrixRAID) results in data loss

oleg dashevskii be9-ml at be9.ru
Fri Apr 7 05:40:14 UTC 2006


>Number:         95459
>Category:       kern
>Synopsis:       Rebooting the system while rebuilding RAID (Intel MatrixRAID) results in data loss
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 07 05:40:12 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     oleg dashevskii
>Release:        6.1-BETA4
>Organization:
IAE SB RAS
>Environment:
FreeBSD mx2.iae.nsk.su 6.1-BETA4 FreeBSD 6.1-BETA4 #0: Tue Mar 14 13:59:38 UTC 2006    root at wv1u.samsco.home:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
I have a motherboard with an ICH7 chipset, which supports RAID. Using the BIOS utility, I created a RAID1 array from two SATA disks (150 Gbytes each) and then installed FreeBSD 6.1-BETA4. No problems: ar0 was detected and everything worked.

After installation, I wanted to check that the RAID1 was working and pulled the power cord from one of the disks. This was detected immediately and the array went into a DEGRADED state. I reconnected the power (the disk was detected) and used "atacontrol addspare" and then "atacontrol rebuild" to rebuild the array.

The rebuild was nearly complete when I decided to reboot the box. To my surprise, the BIOS no longer detected the RAID. The first disk was labeled "Single" or "Separate" or something similar (I don't remember exactly), the second "Spare". No RAID volumes were detected (as shown on the screen) and FreeBSD would not boot. So I had to "un-RAID" both disks, recreate the array and reinstall FreeBSD.

I then decided to see what happens if I wait for the rebuild to complete. Just after it completed, the ATA driver hung for nearly 10 seconds. It recovered with the following messages:
ad6: WARNING - WRITE_DMA taskqueue timeout - completing request directly
ad6: WARNING - WRITE_DMA48 freeing taskqueue zombie request

This is bad news - you get RAID1 for redundancy, but if you happen to reboot during a rebuild, you lose ALL your data.

In a similar situation, M$ Windows XP is able to resume the rebuild from the point where the reboot interrupted it.
>How-To-Repeat:
1. Get a working RAID1 of two disks on an Intel MatrixRAID (ICH7 chipset).

2. Put the array in a DEGRADED state by removing the power cord from one of the disks.

3. Restore power, add the disk back to the array with e.g. "atacontrol addspare ar0 ad4", and start the rebuild with "atacontrol rebuild ar0".

4. Reboot the system while the rebuild is in progress.

5. Result: the system doesn't boot and the data are LOST.
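
Steps 3 and 4 above can be sketched as a small script. This is only an illustration, not a tested reproducer: the array and disk names (ar0, ad4) are the ones quoted in this report, while the DRY_RUN switch, the run helper and the repro_steps function are hypothetical names introduced here. The "atacontrol status" call is an added sanity check, and the sketch assumes that "atacontrol rebuild" returns while the rebuild continues in the background. By default the script only prints the commands, because running them for real on an affected system is expected to destroy the array.

```shell
#!/bin/sh
# Dry-run sketch of How-To-Repeat steps 3 and 4.
# ARRAY and DISK default to the names used in the report; DRY_RUN, run and
# repro_steps are hypothetical helpers added for this example.
ARRAY="${ARRAY:-ar0}"
DISK="${DISK:-ad4}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_steps() {
    run atacontrol addspare "$ARRAY" "$DISK"   # step 3: add the disk back
    run atacontrol rebuild "$ARRAY"            # step 3: start the rebuild
    run atacontrol status "$ARRAY"             # added check: array state
    run shutdown -r now                        # step 4: reboot mid-rebuild
}

repro_steps
```

Setting DRY_RUN=0 executes the commands instead of printing them; only do that on a machine whose data you can afford to lose.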
>Fix:
none
>Release-Note:
>Audit-Trail:
>Unformatted:

