kern/121899: Drive detached from Intel Matrix RAID and returned comes up as entirely new ataraid

Stef Walter stef at memberwebs.com
Thu Mar 20 06:30:02 UTC 2008


>Number:         121899
>Category:       kern
>Synopsis:       Drive detached from Intel Matrix RAID and returned comes up as entirely new ataraid
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 20 06:30:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Stef Walter
>Release:        FreeBSD 6.3 and FreeBSD 7.0
>Organization:
None
>Environment:
FreeBSD new1.web.ws.local 6.3-RELEASE-p1 FreeBSD 6.3-RELEASE-p1 #10: Wed Mar 19 17:05:41 UTC 2008     root at new1.web.local:/usr/obj/usr/src/sys/RACK1  i386
>Description:
Note: This pertains to ataraid RAID devices using the Intel MatrixRAID hardware.

A drive that was once part of an ataraid device shows up as a new ataraid device when it is added back to the machine it came from. This new ataraid device tries to claim all the drives that were originally in the RAID. Results range from confusion to a real mess.

>How-To-Repeat:
 1. Build a RAID1 device with two drives on Intel MatrixRAID hardware. This creates 'ar0':
    # atacontrol create RAID1 ad4 ad6
    ar0 created

 2. Shut down the machine and remove ad6. Or, if you really want to simulate a failure,
    jerk it from its socket :)

 3. When the machine restarts (it'll panic unless you apply the patch from pr/102211) you'll
    see the RAID is degraded:
    # atacontrol status ar0
    ar0: ATA RAID1 status: DEGRADED
      subdisks:
        0 ad4   DOWN
        1 ----- MISSING

 4. Reattach the removed drive (ad6) and a new RAID 'ar1' will appear containing ad6. It
    tries to use ad4 as well, but ad4 is already in use by 'ar0'.

>Fix:
Don't rewrite the config_id of the RAID every time something changes. That's what the generation is for. The config_id should remain the same for the lifetime of the RAID. We need to be diligent about incrementing the generation whenever the RAID status changes, including on boot in case of a DEGRADED array.

With this fix, ad6 (in the example above) is correctly recognized as an out-of-date member of an already present RAID.
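
The idea can be illustrated with a small standalone sketch. This is not the ata-raid.c code: the struct, the enum, and the function names meta_update() and classify_member() are hypothetical and exist only to show how a stable config_id plus an increasing generation lets a returning disk be matched to its array and flagged as stale.

```c
#include <stdint.h>

/* Hypothetical, simplified metadata; NOT the real Intel MatrixRAID layout. */
struct raid_meta {
    uint32_t config_id;   /* identity of the array, fixed for its lifetime */
    uint32_t generation;  /* incremented on every config write */
};

enum member_state { MEMBER_FOREIGN, MEMBER_STALE, MEMBER_CURRENT };

/*
 * Prepare the metadata for a config write.  The config_id is assigned
 * only once, when none exists yet; the generation always moves forward.
 * 'seed' stands in for whatever value is used to create a new id.
 */
static void
meta_update(struct raid_meta *array, uint32_t seed)
{
    if (array->config_id == 0)
        array->config_id = seed;
    array->generation++;
}

/*
 * Decide how a disk's on-disk metadata relates to a live array.
 * Same config_id with an older generation means an out-of-date member;
 * a different config_id means the disk belongs to some other array.
 */
static enum member_state
classify_member(const struct raid_meta *array, const struct raid_meta *disk)
{
    if (disk->config_id != array->config_id)
        return MEMBER_FOREIGN;
    if (disk->generation < array->generation)
        return MEMBER_STALE;
    return MEMBER_CURRENT;
}
```

In this sketch, a disk removed before a degraded-mode config write keeps the old generation but the same config_id, so classify_member() reports it as MEMBER_STALE rather than MEMBER_FOREIGN. If the config_id were regenerated on every write, the same disk would look foreign, which corresponds to the 'ar1' behaviour described above.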



Patch attached with submission follows:

--- sys/dev/ata/ata-raid.c.orig	2008-03-19 11:20:15.000000000 +0000
+++ sys/dev/ata/ata-raid.c	2008-03-19 21:53:37.000000000 +0000
@@ -848,10 +848,17 @@
 	rdp->status &= ~AR_S_READY;
     }
 
+    /* 
+     * Note that when the array breaks or comes up broken we 
+     * force a write of the array config to the remaining 
+     * drives so that the generation will be incremented past 
+     * those of the missing or failed drives (in all cases).
+     */
     if (rdp->status != status) {
 	if (!(rdp->status & AR_S_READY)) {
 	    printf("ar%d: FAILURE - %s array broken\n",
 		   rdp->lun, ata_raid_type(rdp));
+            writeback = 1;
 	}
 	else if (rdp->status & AR_S_DEGRADED) {
 	    if (rdp->type & (AR_T_RAID1 | AR_T_RAID01))
@@ -860,6 +867,7 @@
 		printf("ar%d: WARNING - parity", rdp->lun);
 	    printf(" protection lost. %s array in DEGRADED mode\n",
 		   ata_raid_type(rdp));
+            writeback = 1;
 	}
     }
     mtx_unlock(&rdp->lock);
@@ -2233,11 +2242,16 @@
     }
 
     rdp->generation++;
-    microtime(&timestamp);
+
+    /* Generate a new config_id if none exists */
+    if (!rdp->magic_0) {
+        microtime(&timestamp);
+	rdp->magic_0 = timestamp.tv_sec ^ timestamp.tv_usec;
+    } 
 
     bcopy(INTEL_MAGIC, meta->intel_id, sizeof(meta->intel_id));
     bcopy(INTEL_VERSION_1100, meta->version, sizeof(meta->version));
-    meta->config_id = timestamp.tv_sec;
+    meta->config_id = rdp->magic_0;
     meta->generation = rdp->generation;
     meta->total_disks = rdp->total_disks;
     meta->total_volumes = 1;                                    /* XXX SOS */


>Release-Note:
>Audit-Trail:
>Unformatted:

