kern/169480: [zfs] ZFS stalls on heavy I/O

Jeremy Chadwick jdc at koitsu.org
Sat Jan 26 04:00:01 UTC 2013


The following reply was made to PR kern/169480; it has been noted by GNATS.

From: Jeremy Chadwick <jdc at koitsu.org>
To: Harry Coin <hgcoin at gmail.com>
Cc: bug-followup at FreeBSD.org, levent.serinol at mynet.com
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Date: Fri, 25 Jan 2013 19:55:26 -0800

 Recommendations:
 
 1. Instead of /dev/random use /dev/zero.  /dev/random is not blazing
 fast given it has to generate random data from harvested entropy.  If
 you're doing I/O speed testing just use /dev/zero.  The speed
 difference is quite big.
 
 2. For dd, instead of bs=512 use bs=64k.  bs=512 is far from ideal: each
 block is issued as its own 512-byte write, which is dog slow.  I repeat:
 dog slow.  Linux does this differently.
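 
 As a rough sketch of points 1 and 2, the helper below wraps a dd write
 test that reads from /dev/zero with bs=64k.  The function name
 write_test and its arguments are made up for illustration; point the
 target at a file on the pool being tested.

```shell
# Hypothetical helper: sequential write test with a 64 KiB block size.
#   $1 = target file (assumed to live on the pool under test)
#   $2 = number of 64 KiB blocks to write (16384 blocks = 1 GiB)
write_test() {
    # /dev/zero costs almost nothing to read, so the data source is not
    # the bottleneck; bs=64k makes each write 64 KiB instead of 512 bytes.
    dd if=/dev/zero of="$1" bs=64k count="$2"
}

# Example (the path is an assumption): write_test /tank/ddtest.bin 16384
```

 For the same 1 GiB of data, bs=512 means 2097152 separate writes while
 bs=64k means 16384.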
 
 3. During the dd, in another VTY or window, use "gstat -I500ms" and
 watch the I/O speeds for your ada[2345] disks during the dd.  They
 should be hitting peaks between 60-150MBytes/sec in the right-hand
 "kBps" column (gstat has two: the left one is read, the right one is
 write).
 
 The large potential speed variance has to do with how much data you
 already have on the pool, i.e. MHDDs get slower as the actuator arms
 move inward towards the spindle motor.  That's why you might see, for
 example, 150MBytes/sec when reading/writing to low-numbered LBAs but
 slower speeds when writing to high-numbered LBAs.
 
 This speed will be "bursty" and "sporadic" due to how the ZFS ARC
 works.  The interval at which "things are flushed to disk" is based on
 the vfs.zfs.txg.timeout sysctl, which on FreeBSD 9.1-RELEASE should
 default to 5 (5 seconds).
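 
 A minimal sketch of the two checks in point 3, wrapped as shell
 functions.  The function names are hypothetical, and the -f device
 filter is an addition on top of the plain "gstat -I500ms" given above.

```shell
# Hypothetical helpers; the names are made up for illustration.

# Print the txg flush interval in seconds (-n prints the bare value).
txg_timeout() {
    sysctl -n vfs.zfs.txg.timeout
}

# Watch per-disk throughput, refreshing every 500 ms.
#   $1 = extended regex selecting the devices to show, e.g. '^ada[2345]$'
watch_disks() {
    gstat -I500ms -f "$1"
}
```

 With a 5 second timeout, expect the write columns to sit near idle and
 then spike roughly every 5 seconds, rather than showing a steady rate.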
 
 4. "zpool iostat -v {pool}" does not provide accurate speed indications
 for the same reason plain "iostat" doesn't show "valid" information (it
 does, but not what most people would hope for) while "iostat 1" would:
 without an interval you only get averages since boot, not current rates.
 You need to run it with an interval, i.e. "zpool iostat -v {pool} 1", and
 let it run for a while during I/O.  But I recommend using gstat
 like I said, simply because the interval can be set at 500ms (0.5s) and
 you get a better idea of what your peak I/O speed is.
 
 If you find a single disk that is **always** performing badly, then that
 disk is your bottleneck and I can help you with analysis of its problem.
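 
 The interval form of "zpool iostat" from point 4 can be sketched as a
 small wrapper.  The function name pool_iostat and the pool name in the
 example are assumptions, not anything from the PR.

```shell
# Hypothetical helper: per-vdev statistics at a fixed interval.
#   $1 = pool name
#   $2 = interval in seconds (defaults to 1)
pool_iostat() {
    zpool iostat -v "$1" "${2:-1}"
}

# Example (pool name is an assumption): pool_iostat tank
```

 The first report printed is still the since-boot average, so skip it
 and read the ones that follow.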
 
 5. Your "zpool scrub" speed being 14MBytes/second indicates you are
 nowhere close to your ideal I/O speed.  It should not be that slow unless
 you're doing tons of I/O at the same time as the scrub.  Also, scrubs
 take longer now due to the disabling of the vdev cache (and that's not a
 FreeBSD thing, it's that way in Illumos too, and it's a sensitive topic
 to discuss).
 
 6. On FreeBSD 9.1-RELEASE generally speaking you should not have to tune
 any sysctls.  The situation was different in 8.x and 9.0.  Your system
 only has 4GB of RAM so prefetching automatically gets disabled, by the
 way, just in case you were wondering about that (there were problems
 with prefetch in older releases).
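 
 To confirm what point 6 describes on a given box, something like the
 following should do.  The function name is hypothetical; the two sysctl
 names are the stock FreeBSD ones.

```shell
# Hypothetical helper: show installed RAM and the ZFS prefetch setting.
# vfs.zfs.prefetch_disable=1 means prefetch is off.
prefetch_state() {
    sysctl hw.physmem vfs.zfs.prefetch_disable
}
```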
 
 7. You should probably keep "top -s 1" running, and you might even
 consider using "top -S -s 1" to see system/kernel threads (they're in
 brackets).  This won't tell you outright what's making things slow,
 though.  "vmstat -i" during heavy I/O would be useful too, just in
 case somehow you have a shared interrupt that's being pegged hard (for
 example I've seen SATA controllers and USB controllers sharing an
 interrupt, even with APICs, where the USB layer is busted churning out
 1000 ints/sec and thus affecting SATA I/O speed).
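 
 One way to use "vmstat -i" for the shared-interrupt check in point 7 is
 to take several samples while the dd is running and compare them.  This
 is only a sketch; the function name and its defaults are made up.

```shell
# Hypothetical helper: print "vmstat -i" repeatedly so the totals can be
# compared between samples while the I/O load is running.
#   $1 = number of samples (defaults to 5)
#   $2 = seconds to sleep between samples (defaults to 1)
sample_interrupts() {
    n="${1:-5}"
    pause="${2:-1}"
    i=0
    while [ "$i" -lt "$n" ]; do
        date
        vmstat -i
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$i" -lt "$n" ]; then
            sleep "$pause"
        fi
    done
    return 0
}
```

 As far as I know the "rate" column is averaged over uptime, so the
 change in "total" between two samples is the number to look at.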
 
 8. If you want to compare systems I'm happy to do so, although I have
 fewer disks than you do (3 in raidz1, WD Red 1TB drives).  However, my
 system does not have a Pentium D-class processor; it has a Core 2 Quad
 Q9500.  The D-class stuff is fairly old.
 
 I have some other theories as well but one thing at a time.
 
 -- 
 | Jeremy Chadwick                                   jdc at koitsu.org |
 | UNIX Systems Administrator                http://jdc.koitsu.org/ |
 | Mountain View, CA, US                                            |
 | Making life hard for others since 1977.             PGP 4BD6C0CB |
 

