kern/125413: Panic when doing zfs raidz with gmirror and ggate
Javier Martín Rueda
jmrueda at diatel.upm.es
Tue Jul 8 19:40:12 UTC 2008
>Number: 125413
>Category: kern
>Synopsis: Panic when doing zfs raidz with gmirror and ggate
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Jul 08 19:40:06 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Javier Martín Rueda
>Release: FreeBSD 7.0-STABLE
>Organization:
DIATEL - UPM
>Environment:
FreeBSD fuego2.pruebas.local 7.0-STABLE FreeBSD 7.0-STABLE #0: Thu Jul 3 17:21:29 CEST 2008 root at fuego2.pruebas.local:/usr/src/sys/i386/compile/REPLICACION i386
>Description:
I have two FreeBSD machines with 8 disks each. I am trying to create a replicated raidz ZFS pool using gmirror and ggate. I export the disks on one of the machines with ggate, and then create 8 gmirrors on the other one, each with two providers (the local disk and the corresponding remote ggate disk). To clarify, this is the output of gmirror status:
# gmirror status
      Name    Status  Components
mirror/gm0  COMPLETE  ggate0
                      da0
mirror/gm1  COMPLETE  ggate1
                      da1
mirror/gm2  COMPLETE  ggate2
                      da2
mirror/gm3  COMPLETE  ggate3
                      da3
mirror/gm4  COMPLETE  ggate4
                      da4
mirror/gm5  COMPLETE  ggate5
                      da5
mirror/gm6  COMPLETE  ggate6
                      da6
mirror/gm7  COMPLETE  ggate7
                      da7
Now, if I create a non-raidz zpool, everything is fine:
# zpool create z1 mirror/gm0 mirror/gm1 mirror/gm2 mirror/gm3 mirror/gm4 mirror/gm5 mirror/gm6 mirror/gm7
# zpool status
  pool: z1
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        z1            ONLINE       0     0     0
          mirror/gm0  ONLINE       0     0     0
          mirror/gm1  ONLINE       0     0     0
          mirror/gm2  ONLINE       0     0     0
          mirror/gm3  ONLINE       0     0     0
          mirror/gm4  ONLINE       0     0     0
          mirror/gm5  ONLINE       0     0     0
          mirror/gm6  ONLINE       0     0     0
          mirror/gm7  ONLINE       0     0     0

errors: No known data errors
However, if I try to create a pool with raidz or raidz2, I get a panic. The statement that causes the page fault is at vdev_geom.c:420, where it dereferences a null pointer.
If I create the gmirrors with just the local disk as a provider, there is no panic in either case (raidz or raidz2). So it seems that ggate has something to do with it or, more likely, it exposes a problem somewhere else.
All of this also happened with 7.0-RELEASE.
I have been looking at the code for a while, and the sequence of function calls that triggers the panic is this:
1) For some reason, zio_vdev_io_assess() tells the SPA to reopen the vdev.
2) vdev_reopen() calls vdev_close(), and then calls vdev_open().
2.1) vdev_close() queues several events to close the 8 devices, but returns before they have completely closed. The subsequent call to vdev_open() finds the devices still there and reuses them. However, the events queued by vdev_close() eventually detach them, and that is when the problem appears, because a provider that was there suddenly vanishes. It looks like a race condition.
>How-To-Repeat:
It is explained in detail above. Summarizing:
The servers are fuego1 and fuego2. Both have 8 data disks (da0 - da7), apart from the system disks. Execute the following on fuego1:
ggatec create -u 0 fuego2 /dev/da0
ggatec create -u 1 fuego2 /dev/da1
ggatec create -u 2 fuego2 /dev/da2
ggatec create -u 3 fuego2 /dev/da3
ggatec create -u 4 fuego2 /dev/da4
ggatec create -u 5 fuego2 /dev/da5
ggatec create -u 6 fuego2 /dev/da6
ggatec create -u 7 fuego2 /dev/da7
gmirror label -h -b prefer gm0 da0 ggate0
gmirror label -h -b prefer gm1 da1 ggate1
gmirror label -h -b prefer gm2 da2 ggate2
gmirror label -h -b prefer gm3 da3 ggate3
gmirror label -h -b prefer gm4 da4 ggate4
gmirror label -h -b prefer gm5 da5 ggate5
gmirror label -h -b prefer gm6 da6 ggate6
gmirror label -h -b prefer gm7 da7 ggate7
zpool create z1 raidz2 mirror/gm0 mirror/gm1 mirror/gm2 mirror/gm3 mirror/gm4 mirror/gm5 mirror/gm6 mirror/gm7
And you'll get a panic.
>Fix:
I don't know a good fix, but I attach a shoddy patch that seems to work (and reinforces my belief that it is a race condition). Basically, I insert a delay between vdev_close() and vdev_open() in vdev_reopen(), so that by the time vdev_open() gets called all the closes have completely finished.
Patch attached with submission follows:
--- vdev.c	2008-04-17 03:23:33.000000000 +0200
+++ /tmp/vdev.c	2008-07-08 21:27:35.000000000 +0200
@@ -1023,6 +1023,7 @@
 	ASSERT(spa_config_held(spa, RW_WRITER));
 
 	vdev_close(vd);
+	pause("chapuza", 2000);
 	(void) vdev_open(vd);
 
 	/*
>Release-Note:
>Audit-Trail:
>Unformatted: