kern/125413: Panic when doing zfs raidz with gmirror and ggate
Javier Martín Rueda
jmrueda at diatel.upm.es
Tue Jul 8 19:40:12 UTC 2008
>Number: 125413
>Category: kern
>Synopsis: Panic when doing zfs raidz with gmirror and ggate
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Jul 08 19:40:06 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Javier Martín Rueda
>Release: FreeBSD 7.0-STABLE
>Organization:
DIATEL - UPM
>Environment:
FreeBSD fuego2.pruebas.local 7.0-STABLE FreeBSD 7.0-STABLE #0: Thu Jul 3 17:21:29 CEST 2008 root at fuego2.pruebas.local:/usr/src/sys/i386/compile/REPLICACION i386
>Description:
I have two FreeBSD machines with 8 disks each. I am trying to create a replicated raidz ZFS pool using gmirror and ggate. I export the disks on one of the machines with ggate, and then create 8 gmirrors on the other one, each with two providers (the local disk and the corresponding remote ggate disk). To clarify, this is the output of gmirror status:
# gmirror status
      Name    Status  Components
mirror/gm0  COMPLETE  ggate0
                      da0
mirror/gm1  COMPLETE  ggate1
                      da1
mirror/gm2  COMPLETE  ggate2
                      da2
mirror/gm3  COMPLETE  ggate3
                      da3
mirror/gm4  COMPLETE  ggate4
                      da4
mirror/gm5  COMPLETE  ggate5
                      da5
mirror/gm6  COMPLETE  ggate6
                      da6
mirror/gm7  COMPLETE  ggate7
                      da7
Now, if I create a non-raidz zpool, everything is fine:
# zpool create z1 mirror/gm0 mirror/gm1 mirror/gm2 mirror/gm3 mirror/gm4 mirror/gm5 mirror/gm6 mirror/gm7
# zpool status
  pool: z1
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        z1            ONLINE       0     0     0
          mirror/gm0  ONLINE       0     0     0
          mirror/gm1  ONLINE       0     0     0
          mirror/gm2  ONLINE       0     0     0
          mirror/gm3  ONLINE       0     0     0
          mirror/gm4  ONLINE       0     0     0
          mirror/gm5  ONLINE       0     0     0
          mirror/gm6  ONLINE       0     0     0
          mirror/gm7  ONLINE       0     0     0

errors: No known data errors
However, if I try to create a pool with raidz or raidz2, I get a panic. The statement that causes the page fault is at vdev_geom.c:420, where it dereferences a null pointer.
If I create the gmirrors with just the local disk as a provider, there is no panic in either case (raidz or raidz2). So it seems that ggate has something to do with it or, more likely, it exposes a problem somewhere else.
All of this also happened with 7.0-RELEASE.
I have been looking at the code for a while, and the sequence of function calls that triggers the panic is this:
1) For some reason, zio_vdev_io_assess() tells the SPA to reopen the vdev.
2) vdev_reopen() calls vdev_close(), and then calls vdev_open().
2.1) vdev_close() queues several events to close the 8 devices, but returns before they have completely closed. The subsequent call to vdev_open() finds the devices still there and reuses them. However, the events queued by vdev_close() eventually detach them, and that is when the problem appears, because a provider that was there suddenly vanishes. It looks like a race condition.
>How-To-Repeat:
It is explained in detail above. Summarizing:
The servers are fuego1 and fuego2. Both have 8 data disks (da0 - da7), apart from the system disks. Execute the following on fuego1:
ggatec create -u 0 fuego2 /dev/da0
ggatec create -u 1 fuego2 /dev/da1
ggatec create -u 2 fuego2 /dev/da2
ggatec create -u 3 fuego2 /dev/da3
ggatec create -u 4 fuego2 /dev/da4
ggatec create -u 5 fuego2 /dev/da5
ggatec create -u 6 fuego2 /dev/da6
ggatec create -u 7 fuego2 /dev/da7
gmirror label -h -b prefer gm0 da0 ggate0
gmirror label -h -b prefer gm1 da1 ggate1
gmirror label -h -b prefer gm2 da2 ggate2
gmirror label -h -b prefer gm3 da3 ggate3
gmirror label -h -b prefer gm4 da4 ggate4
gmirror label -h -b prefer gm5 da5 ggate5
gmirror label -h -b prefer gm6 da6 ggate6
gmirror label -h -b prefer gm7 da7 ggate7
zpool create z1 raidz2 mirror/gm0 mirror/gm1 mirror/gm2 mirror/gm3 mirror/gm4 mirror/gm5 mirror/gm6 mirror/gm7
And you'll get a panic.
>Fix:
I don't know a good fix, but I attach a shoddy patch that seems to work (and reinforces my belief that it is a race condition). Basically, I insert a delay between vdev_close() and vdev_open() in vdev_reopen(), so that by the time vdev_open() gets called all the closes have completely finished.
Patch attached with submission follows:
--- vdev.c	2008-04-17 03:23:33.000000000 +0200
+++ /tmp/vdev.c	2008-07-08 21:27:35.000000000 +0200
@@ -1023,6 +1023,7 @@
 	ASSERT(spa_config_held(spa, RW_WRITER));
 
 	vdev_close(vd);
+	pause("chapuza", 2000);
 	(void) vdev_open(vd);
 
 	/*
>Release-Note:
>Audit-Trail:
>Unformatted: