kern/169480: [zfs] ZFS stalls on heavy I/O
Jeremy Chadwick
jdc at koitsu.org
Sat Jan 26 04:00:01 UTC 2013
The following reply was made to PR kern/169480; it has been noted by GNATS.
From: Jeremy Chadwick <jdc at koitsu.org>
To: Harry Coin <hgcoin at gmail.com>
Cc: bug-followup at FreeBSD.org, levent.serinol at mynet.com
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Date: Fri, 25 Jan 2013 19:55:26 -0800
Recommendations:
1. Instead of /dev/random use /dev/zero. /dev/random is not blazing
fast, since it has to harvest entropy from many sources. If you're
doing I/O speed testing, just use /dev/zero; the speed difference is
quite big.
2. For dd, instead of bs=512 use bs=64k. bs=512 isn't ideal; these
are direct I/O writes of 512 bytes each, which is dog slow. I repeat:
dog slow. Linux's dd behaves differently here.
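Putting (1) and (2) together, a throwaway write test might look like
the following (the file path and count are placeholders; point the
file at the pool you're actually testing):

```shell
# Placeholder path: put this on the ZFS pool under test.
TESTFILE=/tmp/ddtest.bin

# /dev/zero avoids /dev/random's entropy-harvesting overhead, and
# bs=64k avoids issuing tiny 512-byte writes. count=1024 writes 64MB.
dd if=/dev/zero of="$TESTFILE" bs=64k count=1024
```

One caveat: a stream of zeros is trivially compressible, so if the
dataset has compression enabled the apparent throughput will overstate
what the disks are really doing.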
3. During the dd, in another VTY or window, run "gstat -I500ms" and
watch the I/O speeds for your ada[2345] disks during the dd. They
should be hitting peaks between 60-150MBytes/sec under the right-hand
"kBps" column (the left set of columns is reads, the right set writes).
The large potential speed variance has to do with how much data you
already have on the pool: mechanical HDDs get slower as the actuator
arms move inward towards the spindle. That's why you might see, for
example, 150MBytes/sec when reading/writing low-numbered LBAs but
slower speeds when writing to high-numbered LBAs.
This speed will be "bursty" and "sporadic" due to how the ZFS ARC
works. The interval at which things are flushed to disk is governed by
the vfs.zfs.txg.timeout sysctl, which on FreeBSD 9.1-RELEASE should
default to 5 (5 seconds).
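As a concrete sketch of (3), this is what I'd run in the second window
(the ada[2-5] pattern matches the disks named above; adjust for your
system):

```shell
# Refresh every 500ms; -f filters the view to the pool's member disks.
gstat -I 500ms -f 'ada[2-5]'

# The flush cadence behind the burstiness; should print 5 on 9.1:
sysctl vfs.zfs.txg.timeout
```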
4. "zpool iostat -v {pool}" without an interval does not provide
accurate speed indications, for the same reason a bare "iostat"
doesn't: it reports averages accumulated over time rather than current
rates, which is not what most people hope for, while "iostat 1" shows
per-second samples. You need to run it with an interval, i.e. "zpool
iostat -v {pool} 1", and let it run for a while while doing I/O. But I
recommend using gstat like I said, simply because the interval can be
set to 500ms (0.5s) and you get a better idea of what your peak I/O
speed is.
If you find a single disk that is **always** performing badly, then that
disk is your bottleneck and I can help you with analysis of its problem.
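For (4), assuming a pool named "tank" (substitute your own pool name):

```shell
# Per-vdev throughput sampled every 1 second; leave it running
# during the dd and watch each member disk's column.
zpool iostat -v tank 1
```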
5. Your "zpool scrub" speed of 14MBytes/second indicates you are
nowhere close to your ideal I/O speed. It should not be that slow unless
you're doing tons of I/O at the same time as the scrub. Also, scrubs
take longer now due to the disabling of the vdev cache (and that's not a
FreeBSD thing, it's that way in Illumos too, and it's a sensitive topic
to discuss).
6. On FreeBSD 9.1-RELEASE generally speaking you should not have to tune
any sysctls. The situation was different in 8.x and 9.0. Your system
only has 4GB of RAM so prefetching automatically gets disabled, by the
way, just in case you were wondering about that (there were problems
with prefetch in older releases).
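If you want to confirm the prefetch state on your box, a quick check
(this is the sysctl name as I know it on 9.x):

```shell
# 1 means prefetch is disabled (automatic with roughly 4GB RAM or less).
sysctl vfs.zfs.prefetch_disable
```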
7. You should probably keep "top -s 1" running, and you might even
consider using "top -S -s 1" to see system/kernel threads (they're in
brackets). This isn't going to tell you outright what's making things
slow, though. "vmstat -i" during heavy I/O would be useful too, just in
case somehow you have a shared interrupt that's being pegged hard (for
example I've seen SATA controllers and USB controllers sharing an
interrupt, even with APICs, where the USB layer is busted churning out
1000 ints/sec and thus affecting SATA I/O speed).
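A quick crib sheet for (7); run each in its own window during the
heavy I/O:

```shell
# -S shows system/kernel threads (they appear in brackets);
# -s 1 refreshes every second.
top -S -s 1

# Interrupt counters and rates; look for a single line pegged at
# ~1000/sec, e.g. a SATA controller sharing an IRQ with a busted
# USB controller.
vmstat -i
```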
8. If you want to compare systems I'm happy to do so, although I have
fewer disks than you do (3 in raidz1, WD Red 1TB drives). However my
system is not a Pentium D-class processor; it's a Core 2 Quad Q9500.
The D-class stuff is fairly old.
I have some other theories as well but one thing at a time.
--
| Jeremy Chadwick jdc at koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |