zfs & waiting on zio->io_cv
Dan Nelson
dnelson at allantgroup.com
Fri Oct 24 15:09:19 UTC 2008
In the last episode (Oct 24), Danny Braniss said:
> there is a big delay (probably more than 1 sec.) when doing simple tasks
> on this zfs, like ls(1), or 'zfs list', long enough to hit ^T
> and get the same [zio->io_cv)], any hints?
>
> store-01# zfs list
> (hitting ^T)load: 0.00 cmd: zfs 88376 [zio->io_cv)] 0.00u 0.00s 0% 1672k
> (hitting ^T)load: 0.00 cmd: zfs 88376 [zio->io_cv)] 0.00u 0.00s 0% 1684k
> NAME             USED  AVAIL  REFER  MOUNTPOINT
> h                472G  11.2T    23K  /h
> h/home           466G  11.2T   466G  /h/home
> h/home@23-10-08   54K      -   466G  -
> h/root            18K  11.2T    18K  /h/root
> h/src             18K  11.2T    18K  /h/src
> h/system        5.64G  11.2T  5.64G  /h/system
That's roughly the equivalent of waiting in "biord" on a UFS
filesystem, I think. ZFS is just waiting for the disk to return a
block. If you happen to do something during the window where ZFS is
committing its transaction group, it has to wait until the sync
finishes. If some other process is doing a lot of writes, or you only
have one disk in your zpool, or your pool is close to full, the sync
may take a couple of seconds.
There are a few things you can try to improve interactive
performance. Raising ZFS's arc_max is the easiest, and will let
ZFS cache more data, increasing the likelihood that an "ls" will be
able to read from cache instead of having to go to disk. Setting it to
1/4 of your physical RAM is probably as high as you can go without
causing panics.
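
A minimal sketch of the 1/4-of-RAM rule above, assuming arc_max is set
through the vfs.zfs.arc_max loader tunable in /boot/loader.conf. The
helper name arc_max_line is hypothetical; on a FreeBSD box the byte
count would come from "sysctl -n hw.physmem".

```shell
#!/bin/sh
# Print a loader.conf line that caps the ARC at 1/4 of the given RAM size.
# $1: physical memory in bytes (on FreeBSD: sysctl -n hw.physmem)
# arc_max_line is a hypothetical helper name, not part of any tool.
arc_max_line() {
    physmem_bytes=$1
    # Integer arithmetic: bytes -> MiB, then take one quarter.
    quarter_mb=$(( physmem_bytes / 1024 / 1024 / 4 ))
    printf 'vfs.zfs.arc_max="%sM"\n' "$quarter_mb"
}

# Example: a machine with 4 GiB of RAM.
arc_max_line 4294967296
```

The printed line would go into /boot/loader.conf and take effect at the
next boot.
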
Raising txg_time (in /sys/cddl/.../zfs/txg.c) from 5 to, say, 30
will tell ZFS to sync less often, which can be a win if you
don't actually do that much writing. With a single spindle, it may
take a substantial fraction of a second just to sync a tiny txg, because
of the number of copies of metadata ZFS writes for redundancy.
If you do a lot of writing, lowering zfs_vdev_max_pending (in
/sys/cddl/.../zfs/vdev_queue.c) from 35 down to 16 or less will reduce
the number of simultaneous I/Os ZFS will try to send to each disk,
which will let your reads compete a little better with other I/O. On
ATA or SATA disks, you might want to set it to 2.
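
Both txg_time and zfs_vdev_max_pending are changed by editing a default
in the source and rebuilding. A rough sketch of that edit, assuming each
variable is declared on one line in the form "int NAME = <number>;"
(check your tree first). The helper name set_int_default is
hypothetical, and the demo uses a temporary file rather than the real
sources.

```shell
#!/bin/sh
# Rewrite "int NAME = <number>" to a new value and print the patched
# source on stdout; the input file itself is left untouched.
# $1: variable name, $2: new value, $3: path to the C source file
# set_int_default is a hypothetical helper name.
set_int_default() {
    sed "s/^\(int[[:space:]]\{1,\}$1[[:space:]]*=[[:space:]]*\)[0-9]\{1,\}/\1$2/" "$3"
}

# Demo on a scratch file that mimics the assumed declaration.
demo=$(mktemp)
printf 'int txg_time = 5;\n' > "$demo"
set_int_default txg_time 30 "$demo"
rm -f "$demo"
```

Against a real tree you would point the third argument at txg.c or
vdev_queue.c, review the output, replace the file, and then rebuild.
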
--
Dan Nelson
dnelson at allantgroup.com