Recurring ZFS performance problems
Adrian Gschwend
ml-ktk at netlabs.org
Mon Mar 10 17:12:18 UTC 2014
Hi group,
(I have a lot of pasted output in here; see http://pastebin.com/yjQnLryP for
this email in case the mail wraps long lines)
On a regular basis I run into some very weird ZFS performance issues on
my server. When it happens, file IO is terribly slow and even a simple ls
can take a long time (worst case up to minutes). Everything that relies
on file IO is basically dead in this mode, so even starting top or other
tools is a PITA. This state can last from minutes to hours and goes back
to normal after some random time. A reboot does not necessarily fix it;
often I'm back in exactly this state after the reboot.
I do not see any pattern in my monitoring (munin) for when it happens:
by the time munin starts to time out because of it, none of my system
graphs show any peaks leading up to it.
When I run 'top' in this mode I see many processes in one of these states:
tx->tx
zfs
umtxn (mainly on mysql, which is unkillable in this mode)
uwait
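Next time it hits I can also grab the wait channels and kernel stacks of
the stuck processes; something along these lines (untested sketch, the
PID is just a placeholder):

# wait channel and state of every process, to see what they block on
ps -axo pid,state,wchan,%cpu,command | sort -k2

# kernel stack of one stuck process, e.g. the unkillable mysqld
# (replace 1234 with the real PID)
procstat -kk 1234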
Setup:
* FreeBSD 9.2-RELEASE-p3 in a KVM (SmartOS Solaris host, running ZFS itself)
* I used mfsbsd to set up a ZFS-only system
* No specific ZFS changes (I did play with some, see last part of email)
* There are 5 jails running
* I run various Apache instances (PHP/SVN/Trac etc.), MySQL, an LDAP
daemon, a JVM and some SIP servers (FreeSWITCH)
* Normal load is around 20-30% (2 cores)
* Swap usage is currently at 1% (4 GB available)
* I have 16 GB of memory available; munin still shows around 1-2 GB as
free. It may be that the issue happens faster with less memory, but I
cannot prove that either (see the ARC check right after this list)
* DTrace is currently not enabled, so I can't get much further than the
standard tools shipped with FreeBSD
* zpool status does not report any failures
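To see whether memory pressure plays a role I can at least compare the
ARC size against free memory the next time it happens, e.g.:

# current ARC size, target and maximum, in bytes
sysctl kstat.zfs.misc.arcstats.size \
       kstat.zfs.misc.arcstats.c \
       kstat.zfs.misc.arcstats.c_max

# free pages and page size, to compare against what munin reports
sysctl vm.stats.vm.v_free_count hw.pagesize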
The issues are not new; they first appeared while the system was still
running on real hardware (FreeBSD 8.x) and not within a KVM. Back then I
assumed I had a hardware problem, but the problem reappeared on the
virtualized install. That install was basically a 1:1 zfs send copy of
the old system plus some bootloader hacking, so exactly the same software
levels. I then switched to a fresh install on 9.x and had the issue on
every single release there as well. I have not tried 10.x yet.
fstat | wc -l:
7068
(it took forever to finish)
gstat gives me:
dT: 1.010s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   10    575      1      0    0.2    574    653   17.8  100.4| vtbd0
    0      0      0      0    0.0      0      0    0.0    0.0| PART/vtbd0/vtbd0
    0      0      0      0    0.0      0      0    0.0    0.0| vtbd0p1
    0      0      0      0    0.0      0      0    0.0    0.0| vtbd0p2
   10    575      1      0    0.3    574    653   18.0  100.4| vtbd0p3
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/vtbd0/vtbd0
    0      0      0      0    0.0      0      0    0.0    0.0| cd0
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/vtbd0p1/vtbd0p1
    0      0      0      0    0.0      0      0    0.0    0.0| LABEL/vtbd0p1/vtbd0p1
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/e402ecce-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/vtbd0p2/vtbd0p2
    0      0      0      0    0.0      0      0    0.0    0.0| LABEL/vtbd0p2/vtbd0p2
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/e4112d88-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/vtbd0p3/vtbd0p3
    0      0      0      0    0.0      0      0    0.0    0.0| SWAP/swap/gptid/e4112d88-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/cd0/cd0
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/gptid/e402ecce-89ca-11e2-a867-3264262b9894/gptid/e402ecce-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/gptid/e4112d88-89ca-11e2-a867-3264262b9894/gptid/e4112d88-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| ZFS::VDEV/zfs::vdev/vtbd0p3
ms/w varies a lot; the highest I've seen just now was around 70.
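I could also log IO statistics with timestamps so I can line them up
with the munin graphs when the next stall hits; roughly this (untested
sketch, the log path is just an example):

# one loop iteration per second: timestamp plus an extended iostat
# sample (the first of the two samples is the since-boot average and
# can be ignored)
while :; do
    date '+%Y-%m-%d %H:%M:%S'
    iostat -x -w 1 -c 2
done >> /var/log/io-trace.log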
zpool iostat 2:
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
tank 173G 21.5G 3 56 30.9K 535K
tank 173G 21.5G 62 376 49.0K 1.22M
tank 173G 21.5G 46 340 84.3K 565K
tank 173G 21.5G 45 566 74.6K 800K
tank 173G 21.5G 32 222 92.0K 958K
tank 173G 21.5G 63 392 120K 1.10M
tank 173G 21.5G 16 286 14.2K 338K
tank 173G 21.5G 29 313 24.6K 831K
tank 173G 21.5G 0 289 0 445K
tank 173G 21.5G 27 244 32.6K 293K
tank 173G 21.5G 43 385 42.8K 477K
tank 173G 21.5G 31 329 15.7K 710K
tank 173G 21.5G 65 394 46.8K 1.50M
tank 173G 21.5G 80 320 127K 754K
tank 173G 21.5G 30 425 144K 1.09M
tank 173G 21.5G 13 399 25.9K 379K
tank 173G 21.5G 10 194 5.22K 162K
tank 173G 21.5G 18 311 45.5K 1.02M
tank 173G 21.5G 29 202 58.5K 344K
tank 173G 21.5G 32 375 108K 926K
On the host OS (SmartOS), zpool iostat 2 shows me the following (there is
one other FreeBSD box on the same host, but there is almost no IO on that one):
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
zones 602G 230G 5 278 44.0K 2.42M
zones 602G 230G 2 154 19.9K 2.18M
zones 602G 230G 0 1.47K 0 10.3M
zones 602G 230G 0 128 0 1.44M
zones 602G 230G 0 270 0 2.61M
zones 602G 230G 0 1.39K 0 10.2M
zones 602G 230G 0 114 7.96K 2.10M
zones 602G 230G 0 979 7.96K 7.84M
When the guest system is not in this state, the writes are lower and I
don't see stretches where reads stay at 0.
I have been googling this topic forever, and I do find people who report
similar issues, but no one I've contacted has found a real explanation
for it. Based on various tuning guides I started adapting the basic config:
cat /boot/loader.conf:
vfs.zfs.zfetch.block_cap=64
# this one was horrible, bootup alone was dogslow
#vfs.zfs.write_limit_override=1048576
#vfs.zfs.txg.timeout="5"
# so far good results?
vfs.zfs.prefetch_disable="1"
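For reference, I check what is actually in effect at runtime like this,
and the next experiment I'm tempted to try is capping the ARC (the 8G
below is just an example value, not something I have set):

# values currently in effect
sysctl vfs.zfs.prefetch_disable vfs.zfs.txg.timeout vfs.zfs.arc_max

# possible next experiment in /boot/loader.conf: cap the ARC so the
# jails, MySQL and the JVM always keep some headroom (example value)
#vfs.zfs.arc_max="8G"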
At first I thought disabling prefetch had solved the issue, but it looks
like I was too optimistic about that. However, ls feels *much* faster
when the system is happy since I disabled prefetch.
I'm really, totally lost on this one, so I would appreciate hints on how
to debug it.
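If it helps, I can load the DTrace modules and sample kernel stacks
during the next stall; roughly this (a sketch, untested on this box):

# FreeBSD 9.x ships the DTrace modules as dtraceall
kldload dtraceall

# sample on-CPU kernel stacks for ~30 seconds to see what the kernel is
# busy with; for the sleeping tx->tx processes, procstat -kk is probably
# the better tool
dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }'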
I'm willing to test whatever it takes to figure out where this issue is.
thanks
Adrian