FreeBSD 7.3-Stable / GEOM issue with ZFS attach/replace & zvol's...
Karl Pielorz
kpielorz_lst at tdx.co.uk
Thu Jul 8 19:10:16 UTC 2010
Hi All,
I posted a few days ago in -fs and -hackers - but never got any reply. I've
done some digging around now, and been able to reproduce the problem below
on another machine (by sending my ZFS zvol's & snapshots to it).
I'm running 7.3-STABLE on an amd64, w/10GB of RAM, and 2 * dual core
Opteron 285s.
In a nutshell: a zpool attach/replace (or similar) on my system results in
GEOM iterating through all the 'drives' on the system (which is apparently
normal). When it encounters some of my ZFS volume snapshots (which are GELI
encrypted) it appears to 'hang' and the attach/replace never completes.
Remove the snapshot it hangs on - and it hangs on another. Remove all the
snapshots/volumes - and the ZFS command completes without issue.
At the moment this is stopping me from replacing a failing drive which is
part of a zpool mirror set :(
e.g. With GEOM debugging turned on, I get:
host# zpool attach vol ad34 ad40
"
[GEOM complains the guid for ad40 doesn't match what it wants - and then
starts iterating through all the disk devices one after another... The guid
mismatch appears 'normal' - i.e. it always happens - even on working
systems]
Jul 5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), 1, 0, 0)
Jul 5 19:42:50 host kernel: open delta:[r1w0e0] old:[r0w0e0] provider:[r0w0e0] 0xffffff000e1fd000(zvol/vol2/zfs_backups/scanned)
Jul 5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), -1, 0, 0)
Jul 5 19:42:50 host kernel: open delta:[r-1w0e0] old:[r1w0e0] provider:[r1w0e0] 0xffffff000e1fd000(zvol/vol2/zfs_backups/scanned)
Jul 5 19:42:50 host kernel: g_detach(0xffffff0035015380)
Jul 5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol/scanned@1237495449), 1, 0, 0)
Jul 5 19:42:50 host kernel: open delta:[r1w0e0] old:[r0w0e0] provider:[r0w0e0] 0xffffff000e60b300(zvol/vol/scanned@1237495449)
**** ZFS [hangs here] - as does anything that subsequently touches ZFS ***
"
ps axl at that point shows:
"
0 2250 2004 0 -8 0 14460 2044 g_wait D+ p0 0:00.01 zpool attach vol ad34 ad40
"
So it appears to be hung in 'g_wait'.
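The GEOM trace follows a pattern: each provider is opened (read count +1), then closed (-1) and detached, so the provider where things stall is the one with an open and no matching close. A minimal sketch of a filter that picks this out of a longer log, assuming the g_access lines have the format shown in the trace; the function name geom_unclosed is hypothetical, and the result is only a hint about where the scan stopped, not proof of the cause:

```shell
# Hypothetical helper: print providers whose g_access read count
# was raised but never dropped back to zero in a GEOM debug trace.
# Reads the named log files, or stdin if none are given.
geom_unclosed() {
    # Pull "provider|read-delta" out of each g_access(...) line;
    # the "open delta:" lines do not match and are skipped.
    sed -n 's/.*g_access(0x[0-9a-f]*(\(.*\)), \(-*[0-9][0-9]*\), \(-*[0-9][0-9]*\), \(-*[0-9][0-9]*\)).*/\1|\2/p' "$@" |
    # Sum the read deltas per provider and report any left above zero.
    awk -F'|' '{ cnt[$1] += $2 }
               END { for (p in cnt) if (cnt[p] > 0) print p }' |
    sort
}
```

For example, `geom_unclosed /var/log/messages` (assuming kernel messages are logged there) should list only the snapshot the attach stopped on.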
If I then reboot, and do:
"zfs destroy vol/scanned@1237495449"
Then try the attach again - it hangs on another snapshot of 'vol/scanned'
(e.g. 'vol/scanned@1274617895') next time round.
If I destroy all of them:
"zfs destroy -r vol/scanned"
The attach completes without issue. All those snapshots can be dd'd from
without issue (or mounted when attached via GELI etc.) - none of the
snapshots or GELI volumes are mounted when I do the attach/replace.
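The dd read check mentioned above can be scripted. A rough sketch, assuming each snapshot's device node sits under /dev/zvol/ at a path equal to its dataset name (as in the ls listing of /dev/zvol/vol further down); snap_devs is a hypothetical helper name:

```shell
# Hypothetical helper: map ZFS snapshot names (one per line on stdin)
# to their zvol device nodes under /dev/zvol.
snap_devs() {
    while IFS= read -r name; do
        printf '/dev/zvol/%s\n' "$name"
    done
}

# Usage sketch (FreeBSD dd syntax): read the first 16 MB of each
# snapshot device and report any that fail.
# zfs list -H -r -t snapshot -o name vol/scanned | snap_devs |
#     while IFS= read -r dev; do
#         dd if="$dev" of=/dev/null bs=1m count=16 || echo "read failed: $dev"
#     done
```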
zpool status, and an ls of '/dev/zvol/vol' are below.
It *looks* like GEOM is seeing something it doesn't like, and hanging?
The system has worked fine for coming up to a year with ZFS - I have
replaced/attached drives in the past - but that was under 7.2-Stable.
Is there any additional GEOM debugging I can enable? (or any possible
workarounds - i.e. something I can do to get GEOM to ignore the ZVol's?)
-Karl
zpool status:
  pool: vol
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        vol         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad28    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad30    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad32    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad18    ONLINE       0     0     0
            ad34    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad20    ONLINE       0     0     0
            ad36    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad22    ONLINE       0     0     0
            ad38    ONLINE       0     0     0
        spares
          ad42      AVAIL
(ad40 is also spare - but not linked to any pools)
ls /dev/zvol/vol
crw-r----- 1 root operator 0, 162 Jul 5 19:55 scanned
crw-r----- 1 root operator 0, 172 Jul 5 19:55 scanned@1237495449
crw-r----- 1 root operator 0, 164 Jul 5 19:55 scanned@1238970339
crw-r----- 1 root operator 0, 167 Jul 5 19:55 scanned@1239143782
crw-r----- 1 root operator 0, 165 Jul 5 19:55 scanned@1244575946
crw-r----- 1 root operator 0, 163 Jul 5 19:55 scanned@1247670305
crw-r----- 1 root operator 0, 168 Jul 5 19:55 scanned@1251063149
crw-r----- 1 root operator 0, 166 Jul 5 19:55 scanned@1256072040
crw-r----- 1 root operator 0, 169 Jul 5 19:55 scanned@1259364830
crw-r----- 1 root operator 0, 170 Jul 5 19:55 scanned@1267226353
crw-r----- 1 root operator 0, 171 Jul 5 19:55 scanned@1274617895
crw-r----- 1 root operator 0, 195 Jul 5 19:55 scanned@1278362753