zfs & waiting on zio->io_cv
Danny Braniss
danny at cs.huji.ac.il
Sat Oct 25 07:48:17 UTC 2008
> In the last episode (Oct 24), Danny Braniss said:
> > there is a big delay (probably more than 1 sec.) when doing simple tasks
> > on this zfs, like ls(1), or 'zfs list', long enough to hit ^T
> > and get the same [zio->io_cv)], any hints?
> >
> > store-01# zfs list
> > (hitting ^T)load: 0.00 cmd: zfs 88376 [zio->io_cv)] 0.00u 0.00s 0% 1672k
> > (hitting ^T)load: 0.00 cmd: zfs 88376 [zio->io_cv)] 0.00u 0.00s 0% 1684k
> > NAME             USED  AVAIL  REFER  MOUNTPOINT
> > h                472G  11.2T    23K  /h
> > h/home           466G  11.2T   466G  /h/home
> > h/home@23-10-08   54K      -   466G  -
> > h/root            18K  11.2T    18K  /h/root
> > h/src             18K  11.2T    18K  /h/src
> > h/system        5.64G  11.2T  5.64G  /h/system
>
> That's sort of the equivalent to waiting in "biord" on a UFS
> filesystem, I think. ZFS is just waiting for the disk to return a
> block. If you happen to do something during the window where ZFS is
> committing its transaction group, it has to wait until the sync
> finishes. If some other process is doing a lot of writes, or you only
> have one disk in your zpool, or your pool is close to full, it may take
> a couple seconds to sync.
>
> There are a couple of things you can try to improve interactive
> performance. Raising zfs's arc_max is the easiest to do, and will let
> ZFS cache more stuff, increasing the likelihood that an "ls" will be
> able to read from cache instead of having to go to disk. Setting it at
> 1/4 your physical RAM is probably as high as you can go without causing
> panics.
>
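
To put a number on the 1/4 rule for this box (8 GB of RAM, see below), here is a minimal sketch. It assumes the cap is set through the vfs.zfs.arc_max loader tunable in /boot/loader.conf; the function names are made up for illustration.

```python
def arc_max_bytes(physmem_bytes: int, divisor: int = 4) -> int:
    """Return an ARC size cap of physical RAM / divisor, in bytes."""
    return physmem_bytes // divisor


def loader_conf_line(physmem_bytes: int) -> str:
    """Format the cap as a /boot/loader.conf entry (assumed tunable name)."""
    return f'vfs.zfs.arc_max="{arc_max_bytes(physmem_bytes)}"'


if __name__ == "__main__":
    # 8 GB of physical memory, as on the machine discussed in this thread.
    print(loader_conf_line(8 * 1024**3))
```

On a live system the input would come from `sysctl -n hw.physmem`, which usually reports a little less than the installed RAM, so the real figure would land slightly under 2 GB.
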
> Raising txg_time (in /sys/cddl/.../zfs/txg.c) from 5 to,
> say, 30 will tell zfs to sync less often, which can be a win if you
> don't actually do that much writing. With a single spindle, it may
> take a substantial fraction of a second just to sync a tiny txg due to
> the number of copies of metadata ZFS writes for redundancy.
>
> If you do a lot of writing, lowering zfs_vdev_max_pending (in
> /sys/cddl/.../zfs/vdev_queue.c) from 35 down to 16 or less will reduce
> the number of simultaneous I/Os ZFS will try to send to each disk,
> which will let your reads compete a little better with other I/O. On
> ATA or SATA disks, you might want to set it to 2.
>
ok, forgot to mention a small detail: the machine is a quad core with 8 GB
of main memory, and the disks are 14 x 1 TB connected via a PERC/RAID5.
tests show that disk access is quite fast, over 200 MB/s.
the 'delays' are seen when the machine is totally idle (it's not in
production yet)
and has been up for some time. btw, I can't reproduce the 'delay', so I think
it has to do with caching.
I guess this beast needs some tuning. are there any tools out there
to monitor/tune ZFS?
thanks,
danny
> --
> Dan Nelson
> dnelson at allantgroup.com