kern/169480: [zfs] ZFS stalls on heavy I/O
Jeremy Chadwick
jdc at koitsu.org
Sat Jan 26 02:10:01 UTC 2013
The following reply was made to PR kern/169480; it has been noted by GNATS.
From: Jeremy Chadwick <jdc at koitsu.org>
To: bug-followup at FreeBSD.org
Cc:
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Date: Fri, 25 Jan 2013 18:08:59 -0800
Harry, things that come to mind immediately:
1. http://www.quietfountain.com/fs1pool1.txt is when your pool contained
both L2ARC and ZIL devices on SSDs. Please remove the SSDs from the
picture entirely and use raidz1 disks ada[2345] only at this point.
I do not want to discuss ada[01] at this time, because they're SSDs.
There are quite literally 4 or 5 "catches" to using these devices on
FreeBSD ZFS, but the biggest problem -- and this WILL hurt you, no
arguments about it -- is lack of TRIM support. You will hurt your SSDs
over time doing this. If you want TRIM support on ZFS you will need to
run -CURRENT.
We can talk more about the SSDs later. As said, please remove them from
the picture for starters, as all they do is make troubleshooting much,
much harder.
2. I do see some raw I/O benchmarks but only for ada2. This is
insufficient. A single disk performing like crap in a pool can slow
down the entire response time for everything. I can analyze your disks
if the issue is narrowed down to one of them.
"gstat -I500ms" is a good way to watch I/O speeds in real-time. I find
this more effective than "zpool iostat -v 1" for per-device info.
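A minimal sketch of watching only the four pool disks, assuming the
device names ada2 through ada5 from above. The function name
watch_pool_disks is made up for illustration; gstat's -f flag takes a
regular expression that limits the display to matching providers.

```shell
# Hypothetical helper: show live I/O stats for ada2..ada5 only,
# refreshing every 500 ms. Look for one disk whose ms/r, ms/w or
# %busy stays far above the other three.
watch_pool_disks() {
    gstat -I500ms -f '^ada[2-5]$'
}
```

Run watch_pool_disks in one terminal while reproducing the stall in
another.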
3. The ada[2345] disks involved are Hitachi HDS723015BLA642 (7K3000,
1.5TB), and there is sparse info on the web as to whether these are
512-byte or 4096-byte physical sector disks. smartmontools 6.0 or
newer will tell you.
Regardless, all disks advertise 512 bytes as the logical sector size to
remain fully compatible with legacy systems, but the performance hit on
I/O is major if the disk physically uses 4096-byte sectors and the pool
ashift isn't 12. So please check this with smartmontools 6.0 or newer.
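One way to do that check, as a rough sketch: smartctl -i from
smartmontools 6.0 or newer should print either a "Sector Sizes:" line
(logical and physical differ) or a "Sector Size:" line (both the same).
The helper name physical_sector_size is hypothetical, and the parsing
assumes those two line formats.

```shell
# Hypothetical helper: read "smartctl -i" output on stdin and print
# the physical sector size in bytes.
physical_sector_size() {
    awk '
        # e.g. "Sector Sizes: 512 bytes logical, 4096 bytes physical"
        /^Sector Sizes:/ {
            for (i = 1; i <= NF; i++)
                if ($i == "physical") print $(i - 2)
        }
        # e.g. "Sector Size: 512 bytes logical/physical"
        /^Sector Size:/ { print $3 }
    '
}

# Usage, assuming the device names from this thread:
# for d in ada2 ada3 ada4 ada5; do
#     printf '%s: ' "$d"
#     smartctl -i "/dev/$d" | physical_sector_size
# done
```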
If the disks use physically 4096-byte sectors, you need to use gnop(8)
to align them and create the pool off of that. Ivan Voras wrote a
wonderful guide on how to do this, and it's very simple:
http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html
It wouldn't hurt you to do this regardless, as there's no performance
hit using the gnop(8) method on 512-byte sector drives; this would
"future-proof" you if upgrading to newer disks too. You want ashift=12.
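The gnop(8) method is usually described as the sequence below. This is
a sketch of that common recipe, not a copy of the guide linked above.
The pool name "tank" and the function name create_aligned_pool are
placeholders, and note that zpool create wipes whatever pool was on the
disks before.

```shell
# Hypothetical helper: create a raidz1 pool with ashift=12 by putting
# a temporary 4096-byte-sector gnop provider on the first disk.
# Usage: create_aligned_pool POOL DISK1 DISK2 ...
create_aligned_pool() {
    pool=$1; shift
    first=$1; shift
    gnop create -S 4096 "$first" &&                       # makes ${first}.nop
    zpool create "$pool" raidz1 "${first}.nop" "$@" &&    # pool picks up 4K sectors
    zpool export "$pool" &&
    gnop destroy "${first}.nop" &&                        # drop the temporary provider
    zpool import "$pool"                                  # pool now sits on the raw disk
}

# Example, assuming the disks from this thread:
# create_aligned_pool tank ada2 ada3 ada4 ada5
# zdb -C tank | grep ashift    # should report ashift: 12
```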
4. Why are all of your drives partitioned? In other words, why are you
using adaXpX rather than just adaX for your raidz1 pool? "gpart show"
output was not provided, and I can only speculate as to what's going on
under the hood there.
Please use raw disks when recreating your pool, i.e. ada2, ada3, ada4,
etc...
I know for your cache/logs this is a different situation but again,
please remove those from the picture.
5. Please keep your Hitachi disks on the Intel ICH7 controller for the
time being. It's SATA300 but that isn't going to hurt these disks.
Don't bring the Marvell into the picture yet. Don't change around
cabling or anything else.
6. For any process that takes a long while, you're going to need to run
"procstat -kk" (yes, -k twice) against it.
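A small sketch of doing that by process name rather than by PID. The
helper name kstacks and the example process name are hypothetical;
pgrep -x is used to match the name exactly.

```shell
# Hypothetical helper: dump kernel stacks for every process whose
# name exactly matches $1.
kstacks() {
    for pid in $(pgrep -x "$1"); do
        procstat -kk "$pid"
    done
}

# Usage (process name is only an example):
# kstacks rsync
```

Capture the output while the process is actually stalled; stacks taken
after it recovers are of little use.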
7. I do not think your issue is related to this PR. I would suggest
discussing it on freebsd-fs first. Of course, you're also using
something called "nas4free" which may or may not be *true, unaltered*
FreeBSD -- I have no idea. I often find it frustrating when, say, the
FreeNAS folks or other "FreeBSD fork projects" appear on the FreeBSD
mailing lists "just because it uses FreeBSD". You always have to go
with the vendor for support (like you did on their forum), but if you
really think this is a FreeBSD "kernel thing" freebsd-fs is fine.
Start with what I described above and go from there.
--
| Jeremy Chadwick jdc at koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |