ZFS Write Lockup
Dave Cundiff
syshackmin at gmail.com
Tue Oct 4 07:06:44 UTC 2011
Hi,
I'm running 8.2-RELEASE and running into an IO lockup on ZFS that is
happening pretty regularly. The system is stock except for the
following set in loader.conf
vm.kmem_size="30G"
vfs.zfs.arc_max="22G"
kern.hz=100
I know the kmem settings aren't SUPPOSED to be necessary anymore, but my
ZFS boxes were crashing until I added them. The machine has 24 GB of
RAM. The kern.hz=100 setting was to stretch out the L2ARC bug that pops
up at 28 days with it set to 1000.
[root@san2 ~]# zpool status
  pool: san
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        san          ONLINE       0     0     0
          da1        ONLINE       0     0     0
        logs
          mirror     ONLINE       0     0     0
            ad6s1b   ONLINE       0     0     0
            ad14s1b  ONLINE       0     0     0
        cache
          ad6s1d     ONLINE       0     0     0
          ad14s1d    ONLINE       0     0     0

errors: No known data errors
Here's a zpool iostat from a machine in trouble:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
san         9.08T  3.55T      0      0      0  7.92K
san         9.08T  3.55T      0    447      0  5.77M
san         9.08T  3.55T      0    309      0  2.83M
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T     62      0  2.22M      0
san         9.08T  3.55T      0      2      0  23.5K
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0    254      0  6.62M
san         9.08T  3.55T      0    249      0  3.16M
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T     34      0   491K      0
san         9.08T  3.55T      0      6      0  62.7K
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0     85      0  6.59M
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0    452      0  4.88M
san         9.08T  3.55T    109      0  3.12M      0
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0      0      0  7.84K
san         9.08T  3.55T      0    434      0  6.41M
san         9.08T  3.55T      0      0      0      0
san         9.08T  3.55T      0    304      0  2.90M
san         9.08T  3.55T     37      0   628K      0
It's supposed to look like this:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
san         9.07T  3.56T    162    167  3.75M  6.09M
san         9.07T  3.56T      5      0  47.4K      0
san         9.07T  3.56T     19      0   213K      0
san         9.07T  3.56T    120      0  3.26M      0
san         9.07T  3.56T     92      0   741K      0
san         9.07T  3.56T    114      0  2.86M      0
san         9.07T  3.56T     72      0   579K      0
san         9.07T  3.56T     14      0   118K      0
san         9.07T  3.56T     24      0   213K      0
san         9.07T  3.56T     25      0   324K      0
san         9.07T  3.56T      8      0   126K      0
san         9.07T  3.56T     28      0   505K      0
san         9.07T  3.56T     15      0   126K      0
san         9.07T  3.56T     11      0   158K      0
san         9.07T  3.56T     19      0   356K      0
san         9.07T  3.56T    198      0  3.55M      0
san         9.07T  3.56T     21      0   173K      0
san         9.07T  3.56T     18      0   150K      0
san         9.07T  3.56T     23      0   260K      0
san         9.07T  3.56T      9      0  78.3K      0
san         9.07T  3.56T     21      0   173K      0
san         9.07T  3.56T      2  4.59K  16.8K   142M
san         9.07T  3.56T     12      0   103K      0
san         9.07T  3.56T     26    454   312K  4.35M
san         9.07T  3.56T    111      0  3.34M      0
san         9.07T  3.56T     28      0   870K      0
san         9.07T  3.56T     75      0  3.88M      0
san         9.07T  3.56T     43      0  1.22M      0
san         9.07T  3.56T     26      0   270K      0
I don't know what triggers the problem, but I know how to fix it: if I
perform a couple of snapshot deletes, the IO will come back in line
every single time. Fortunately, I have LOTS of snapshots to delete.
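For what it's worth, here is a rough sketch of how I select snapshots to
delete. The snapshot names and date scheme below are made up for
illustration; on the real box the list would come from
`zfs list -H -r -t snapshot -o name san`. The sketch only prints the
destroy commands so the selection can be reviewed before running anything.

```shell
# Hypothetical snapshot names; real input would come from
# `zfs list -H -r -t snapshot -o name san`.
snapshots="san/vol1@auto-2011-08-01
san/vol1@auto-2011-09-01
san/vol1@auto-2011-10-01"

# Select snapshots dated before a cutoff and print the destroy
# commands instead of executing them, so they can be reviewed first.
cutoff="2011-09-01"
to_destroy=$(printf '%s\n' "$snapshots" |
    awk -F'@auto-' -v c="$cutoff" '$2 < c { print "zfs destroy " $0 }')
echo "$to_destroy"
```

Piping the reviewed output to `sh` would then actually run the deletes.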
[root@san2 ~]# zfs list -r -t snapshot | wc -l
    5236
[root@san2 ~]# zfs list -r -t volume | wc -l
      17
Being fairly new to FreeBSD and ZFS, I'm pretty clueless about where to
begin tracking this down. I've been staring at gstat, trying to see if
a zvol is getting a big burst of writes that may be flooding the drive
controller, but I haven't caught anything yet. top -S -H shows the
zio_write_issue threads consuming massive amounts of CPU during the
lockup; normally they sit around 5-10%. Any suggestions on where I
could start to track this down would be greatly appreciated.
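In case it helps anyone reproduce what I'm seeing, here's a rough sketch
of how the spike could be flagged automatically instead of watching top
by hand. The sample lines below are made up; real input would be
batch-mode top output (flag spelling varies by system), with the WCPU
column in field 3.

```shell
# Hypothetical top -S -H lines (PID, user, WCPU, thread name); real
# input would be non-interactive top output from the affected box.
sample='   12 root  99.0% {zio_write_issue_0}
   13 root   4.5% {zio_write_issue_1}
 1042 root   1.0% {sshd}'

# Flag any thread burning more than half a CPU; $3+0 coerces the
# "99.0%" string to a number for the comparison.
hot=$(printf '%s\n' "$sample" | awk '$3+0 > 50 { print $4 }')
echo "$hot"
```

Running that from cron during the lockup window would at least timestamp
when the zio_write_issue threads go hot.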
Thanks,
--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
More information about the freebsd-questions mailing list