kern/169480: [zfs] ZFS stalls on heavy I/O

Jeremy Chadwick jdc at koitsu.org
Sat Jan 26 02:10:01 UTC 2013


The following reply was made to PR kern/169480; it has been noted by GNATS.

From: Jeremy Chadwick <jdc at koitsu.org>
To: bug-followup at FreeBSD.org
Cc:  
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Date: Fri, 25 Jan 2013 18:08:59 -0800

 Harry, things that come to mind immediately:
 
 1. http://www.quietfountain.com/fs1pool1.txt is when your pool contained
 both L2ARC and ZIL devices on SSDs.  Please remove the SSDs from the
 picture entirely and use raidz1 disks ada[2345] only at this point.
 
 I do not want to discuss ada[01] at this time, because they're SSDs.
 There are quite literally 4 or 5 "catches" to using these devices on
 FreeBSD ZFS, but the biggest problem -- and this WILL hurt you, no
 arguments about it -- is lack of TRIM support.  You will hurt your SSDs
 over time doing this.  If you want TRIM support on ZFS you will need to
 run -CURRENT.
 
 We can talk more about the SSDs later.  As said, please remove them from
 the picture for starters, as all they do is make troubleshooting much,
 much harder.
 
 2. I do see some raw I/O benchmarks but only for ada2.  This is
 insufficient.  A single disk performing like crap in a pool can slow
 down the entire response time for everything.  I can do a closer
 analysis of any of your disks once the issue is narrowed down to one of
 them.
 
 "gstat -I500ms" is a good way to watch I/O speeds in real-time.  I find
 this more effective than "zpool iostat -v 1" for per-device info.
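 
 A rough sketch of a sequential raw read test across all four pool
 members, so that ada2 is not the only disk measured.  The 4 GiB read
 size and the DISKS/DD variables are assumptions for illustration, not
 something taken from the PR.  The test only reads from the devices.

```shell
#!/bin/sh
# Disks to test; adjust to match the pool members.
DISKS=${DISKS:-"ada2 ada3 ada4 ada5"}

raw_read_test() {
    for d in $DISKS; do
        echo "== $d =="
        # Read 4 GiB sequentially from the start of the raw device.
        # dd prints the achieved throughput when it finishes.
        ${DD:-dd} if="/dev/$d" of=/dev/null bs=1m count=4096
    done
}
```

 Call raw_read_test as root and keep "gstat -I500ms" running in a second
 terminal; a disk that is much slower than its siblings should stand out.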
 
 3. The ada[2345] disks involved are Hitachi HDS723015BLA642 (7K3000,
 1.5TB), and there is sparse info on the web as to whether these are
 512-byte physical sector disks or 4096-byte.  smartmontools 6.0 or
 newer will tell you.
 
 All disks regardless advertise 512 bytes as the logical sector size to
 remain fully compatible with legacy systems, but the performance hit on
 I/O is major if the disks are 4096-byte physical and the pool ashift
 isn't 12.  So please check this with smartmontools 6.0 or newer.
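 
 A minimal sketch of that check, assuming the "Sector Size:" and
 "Sector Sizes:" lines that smartctl -i prints in smartmontools 6.0 and
 newer.  The helper names physical_sector_size and ashift_for are made up
 for this example.

```shell
#!/bin/sh
# Read "smartctl -i" output on stdin and print the physical sector size.
physical_sector_size() {
    awk '
        /^Sector Sizes?:/ {
            if ($0 ~ /logical\/physical/) {
                # "Sector Size:      512 bytes logical/physical"
                print $3
            } else {
                # "Sector Sizes:     512 bytes logical, 4096 bytes physical"
                print $6
            }
            exit
        }'
}

# Print the ashift (log2 of the sector size) for a power-of-two size.
ashift_for() {
    size=$1
    bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        bits=$((bits + 1))
    done
    echo "$bits"
}

# Example use, as root:
#   for d in ada2 ada3 ada4 ada5; do
#       ashift_for "$(smartctl -i /dev/$d | physical_sector_size)"
#   done
```

 A result of 9 means 512-byte physical sectors and 12 means 4096-byte
 physical sectors.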
 
 If the disks use physically 4096-byte sectors, you need to use gnop(8)
 to align them and create the pool off of that.  Ivan Voras wrote a
 wonderful guide on how to do this, and it's very simple:
 
 http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html
 
 It wouldn't hurt you to do this regardless, as there's no performance
 hit using the gnop(8) method on 512-byte sector drives; this would
 "future-proof" you if upgrading to newer disks too.  You want ashift=12.
 
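 The gnop(8) method can be sketched roughly as follows.  This is an
 outline under assumptions, not a copy of the guide: the pool name "tank"
 is a placeholder, and the run/DRY_RUN wrapper exists only so the
 commands can be reviewed before anything is executed.  Creating the pool
 destroys whatever is on the disks.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
POOL=${POOL:-tank}    # placeholder pool name

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_aligned_pool() {
    # Present ada2 through a gnop provider with 4096-byte sectors.
    run gnop create -S 4096 ada2
    # Build the raidz1 vdev; the one 4K member makes ZFS pick ashift=12.
    run zpool create "$POOL" raidz1 ada2.nop ada3 ada4 ada5
    # Drop the gnop layer and re-import the pool on the raw disks.
    run zpool export "$POOL"
    run gnop destroy ada2.nop
    run zpool import "$POOL"
}
```

 Review the printed commands first, then run with DRY_RUN=0 as root.
 Afterwards "zdb | grep ashift" should show ashift: 12 for the pool.
 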
 4. Why are all of your drives partitioned?  In other words, why are you
 using adaXpX rather than just adaX for your raidz1 pool?  "gpart show"
 output was not provided, and I can only speculate as to what's going on
 under the hood there.
 
 Please use raw disks when recreating your pool, i.e. ada2, ada3, ada4,
 etc...
 
 I know for your cache/logs this is a different situation but again,
 please remove those from the picture.
 
 5. Please keep your Hitachi disks on the Intel ICH7 controller for the
 time being.  It's SATA300 but that isn't going to hurt these disks.
 Don't bring the Marvell into the picture yet.  Don't change around
 cabling or anything else.
 
 6. For any process that takes a long while, you're going to need to do
 "procstat -kk" (yes -k twice) against it.
 
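 One way to capture this is to sample the kernel stacks a few times
 while the process is stuck, so it is visible whether it sits in the same
 place every time.  This is only a sketch: the sample count and interval
 are arbitrary, and the sample_kstacks name and PROCSTAT variable are
 made up for this example.

```shell
#!/bin/sh
PROCSTAT=${PROCSTAT:-procstat}

# Usage: sample_kstacks PID [COUNT] [INTERVAL_SECONDS]
sample_kstacks() {
    pid=$1
    count=${2:-3}
    interval=${3:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Timestamp each sample so it can be matched against gstat output.
        date
        $PROCSTAT -kk "$pid"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

 For example, "sample_kstacks 1234 > kstacks.txt" for a stalled process
 with PID 1234, run as root.
 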
 7. I do not think your issue is related to this PR.  I would suggest
 discussing it on freebsd-fs first.  Of course, you're also using
 something called "nas4free" which may or may not be *true, unaltered*
 FreeBSD -- I have no idea.  I often find it frustrating when, say, the
 FreeNAS folks or other "FreeBSD fork projects" appear on the FreeBSD
 mailing lists "just because it uses FreeBSD".  You always have to go
 with the vendor for support (like you did on their forum), but if you
 really think this is a FreeBSD "kernel thing" freebsd-fs is fine.
 
 Start with what I described above and go from there.
 
 -- 
 | Jeremy Chadwick                                   jdc at koitsu.org |
 | UNIX Systems Administrator                http://jdc.koitsu.org/ |
 | Mountain View, CA, US                                            |
 | Making life hard for others since 1977.             PGP 4BD6C0CB |
 

