FreeBSD 7.3-Stable / GEOM issue with ZFS attach/replace & zvol's...

Karl Pielorz kpielorz_lst@tdx.co.uk
Thu Jul 8 19:10:16 UTC 2010


Hi All,

I posted a few days ago in -fs and -hackers, but never got any reply. I've 
since done some digging, and have been able to reproduce the problem below 
on another machine (by sending my ZFS zvols & snapshots to it).

I'm running 7.3-STABLE on amd64, with 10GB of RAM and two dual-core 
Opteron 285s.

In a nutshell: a zpool attach/replace (or similar) on my system results in 
GEOM iterating through all the 'drives' on the system (which is apparently 
normal). When it encounters some of my ZFS volume snapshots (which are GELI 
encrypted), it appears to 'hang', and the attach/replace never completes.

Remove the snapshot it hangs on, and it hangs on another. Remove all the 
snapshots/volumes, and the ZFS command completes without issue.

At the moment this is stopping me from replacing a failing drive which is 
part of a zpool mirror set :(

e.g. With GEOM debugging turned on, I get:

host# zpool attach vol ad34 ad40

"
[GEOM complains the guid for ad40 doesn't match what it wants - and then 
starts iterating through all the disk devices one after another... The guid 
mismatch appears 'normal' - i.e. it always happens - even on working 
systems]

Jul  5 19:42:50 host kernel: 
g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), 1, 0, 0)
Jul  5 19:42:50 host kernel: open delta:[r1w0e0] old:[r0w0e0] 
provider:[r0w0e0] 0xffffff000e1fd000(zvol/vol2/zfs_backups/scanned)
Jul  5 19:42:50 host kernel: 
g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), -1, 0, 0)
Jul  5 19:42:50 host kernel: open delta:[r-1w0e0] old:[r1w0e0] 
provider:[r1w0e0] 0xffffff000e1fd000(zvol/vol2/zfs_backups/scanned)
Jul  5 19:42:50 host kernel: g_detach(0xffffff0035015380)
Jul  5 19:42:50 host kernel: 
g_access(0xffffff0035015380(zvol/vol/scanned@1237495449), 1, 0, 0)
Jul  5 19:42:50 host kernel: open delta:[r1w0e0] old:[r0w0e0] 
provider:[r0w0e0] 0xffffff000e60b300(zvol/vol/scanned@1237495449)
**** ZFS [hangs here] - as does anything that subsequently touches ZFS ***
"

ps axl at that point shows:

"
 0  2250  2004   0  -8  0 14460  2044 g_wait D+    p0    0:00.01 zpool 
attach vol ad34 ad40
"

So it appears to be hung in 'g_wait'.
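
If it helps anyone reproduce this: the kernel stack of the stuck process 
should show the exact GEOM sleep point. A rough sketch of how to get it 
(procstat only exists from 8.0 onwards; on 7.3 you need a kernel built 
with 'options KDB' and 'options DDB'):

```shell
# On 8.0 or later: dump the kernel stack of the hung zpool process
# (PID 2250 in the ps output above):
procstat -kk 2250

# On 7.3 there is no procstat - drop into the kernel debugger instead:
sysctl debug.kdb.enter=1
# then at the db> prompt:
#   ps              <- find the thread sleeping in g_wait
#   trace <pid>     <- print its kernel backtrace
#   show geom       <- dump the current GEOM topology
```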


If I then reboot, and do:

"zfs destroy vol/scanned at 1237495449"

Then try the attach again - it hangs on another snapshot of 'vol/scanned' 
(e.g. 'vol/scanned@1274617895') next time round.

If I destroy all of them:

"zfs destroy -r vol/scanned"

The attach completes without issue. All of those snapshots can be dd'd from 
without issue (or mounted when attached via GELI, etc.) - none of the 
snapshots or GELI volumes are mounted when I do the attach/replace.
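
(To double-check that, this is roughly what I run beforehand - both 
commands should come back without any of the snapshot devices if nothing 
holds them open:)

```shell
# Check whether any process has a zvol device open:
fstat | grep zvol

# Check that no GELI providers are still attached on top of them:
geli status
```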

zpool status, and an ls of '/dev/zvol/vol' are below.

It *looks* like GEOM is seeing something it doesn't like, and hanging?

The system has worked fine for coming up on a year with ZFS - I have 
replaced/attached drives in the past, but that was under 7.2-STABLE.

Is there any additional GEOM debugging I can enable? (Or are there any 
possible workarounds - i.e. something I can do to get GEOM to ignore the 
zvols?)
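
For reference, the trace above came from kern.geom.debugflags - my 
understanding of the bits (from sys/geom/geom.h, so treat this as a 
sketch) is:

```shell
# Bits of kern.geom.debugflags:
#   1 = G_T_TOPOLOGY  - trace topology changes (attach/detach)
#   2 = G_T_BIO       - trace individual I/O requests (very noisy)
#   4 = G_T_ACCESS    - trace g_access() open/close deltas
sysctl kern.geom.debugflags=5    # topology + access tracing

# ...and back to normal afterwards:
sysctl kern.geom.debugflags=0
```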

-Karl


zpool status:

  pool: vol
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        vol         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad28    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad30    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad32    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad18    ONLINE       0     0     0
            ad34    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad20    ONLINE       0     0     0
            ad36    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad22    ONLINE       0     0     0
            ad38    ONLINE       0     0     0
        spares
          ad42      AVAIL
(ad40 is also a spare - but not attached to any pool)


ls /dev/zvol/vol

crw-r-----  1 root  operator    0, 162 Jul  5 19:55 scanned
crw-r-----  1 root  operator    0, 172 Jul  5 19:55 scanned@1237495449
crw-r-----  1 root  operator    0, 164 Jul  5 19:55 scanned@1238970339
crw-r-----  1 root  operator    0, 167 Jul  5 19:55 scanned@1239143782
crw-r-----  1 root  operator    0, 165 Jul  5 19:55 scanned@1244575946
crw-r-----  1 root  operator    0, 163 Jul  5 19:55 scanned@1247670305
crw-r-----  1 root  operator    0, 168 Jul  5 19:55 scanned@1251063149
crw-r-----  1 root  operator    0, 166 Jul  5 19:55 scanned@1256072040
crw-r-----  1 root  operator    0, 169 Jul  5 19:55 scanned@1259364830
crw-r-----  1 root  operator    0, 170 Jul  5 19:55 scanned@1267226353
crw-r-----  1 root  operator    0, 171 Jul  5 19:55 scanned@1274617895
crw-r-----  1 root  operator    0, 195 Jul  5 19:55 scanned@1278362753



