ZFS performance of various vdevs (long post)

Jeremy Chadwick freebsd at jdc.parodius.com
Tue Jun 8 04:47:09 UTC 2010


On Mon, Jun 07, 2010 at 05:32:18PM -0700, Bradley W. Dutton wrote:
> Quoting Bob Friesenhahn <bfriesen at simple.dallas.tx.us>:
> 
> >On Mon, 7 Jun 2010, Bradley W. Dutton wrote:
> >>So the normal vdev performs closest to raw drive speeds. Raidz1
> >>is slower and raidz2 even more so. This is observable in the dd
> >>tests and viewing in gstat. Any ideas why the raid numbers are
> >>slower? I've tried to account for the fact that the raid vdevs
> >>have fewer data disks. Would a faster CPU help here?
> >
> >The sequential throughput on your new drives is faster than the
> >old drives, but it is likely that the seek and rotational
> >latencies are longer.  ZFS is transaction-oriented and must tell
> >all the drives to sync their write cache before proceeding to the
> >next transaction group.  Drives with more latency will slow down
> >this step.  Likewise, ZFS always reads and writes full filesystem
> >blocks (default 128K) and this may cause more overhead when using
> >raidz.
> 
> The details are a little lacking on the Hitachi site, but the
> HDS722020ALA330 says 8.2 ms seek time.
> http://www.hitachigst.com/tech/techlib.nsf/techdocs/5F2DC3B35EA0311386257634000284AD/$file/USA7K2000_DS7K2000_OEMSpec_r1.2.pdf
> 
> The WDC drives say 8.9 ms, so we should be in the same ballpark on
> seek times.
> http://www.wdc.com/en/products/products.asp?driveid=399
> 
> I thought the NCQ vs no NCQ might tip the scales in favor of the
> Hitachi array as well.

I'm not sure you understand NCQ.  What you're doing in your dd test is
individual dd's on each disk.  NCQ is a per-disk thing.  What you need
to test is multiple concurrent transactions *per disk*.  What I'm trying
to say is that NCQ vs. no-NCQ isn't the culprit here, because your
testbench model isn't making use of it.
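
If you want to see whether NCQ matters, you need several requests
outstanding against the *same* drive at once.  A rough sketch (ada0 is
just a placeholder for whatever device node your disks attach as):

  # Two readers hitting the same disk concurrently at different
  # offsets, so the drive has multiple outstanding requests it can
  # reorder.  A single sequential dd never queues more than one.
  dd if=/dev/ada0 of=/dev/null bs=64k count=100000 &
  dd if=/dev/ada0 of=/dev/null bs=64k skip=2000000 count=100000 &
  wait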

> I know it's pretty simple but for checking throughput I thought it
> would be ok. I don't have compression on and based on the drive
> lights and gstat, the drives definitely aren't idle.

Try disabling prefetch (you currently have it enabled) and try setting
vfs.zfs.txg.timeout="5".  Some people have reported a "sweet spot" with
the latter parameter (it may need further tuning if your disks are
extremely fast, etc.); without it, ZFS can be extremely "bursty" in its
I/O, stalling or even deadlocking the system at regular intervals.
Decreasing the value makes ZFS commit writes to disk more often, with
less data per commit, and depending on the load and the controller this
can even out performance.
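
For reference, the /boot/loader.conf lines would look roughly like this
(the prefetch tunable name is from memory, so double-check it against
your sysctl output before relying on it):

  # /boot/loader.conf
  # turn off ZFS file-level prefetch
  vfs.zfs.prefetch_disable="1"
  # seconds between transaction-group commits
  vfs.zfs.txg.timeout="5"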

> >The higher CPU usage might be due to the device driver or the
> >interface card being used.
> 
> Definitely a plausible explanation. If this were the case, would the 8
> parallel dd processes exhibit the same behavior? Or is the type of
> I/O affecting how much CPU the driver is using?

It would be the latter.

Also, I believe this Supermicro controller has been discussed in the
past.  I can't remember if people had outright failures/issues with it
or if people were complaining about sub-par performance.  I could also
be remembering a different Supermicro controller.

If I had to make a recommendation, it would be to reproduce the same
setup on a system using an Intel ICH9/ICH9R or ICH10/ICH10R controller
in AHCI mode (with ahci.ko loaded, not ataahci.ko) and see if things
improve.  But start with the loader.conf tunables I mentioned above,
and keep each test separate so you can tell which change actually
helps.
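
Getting ahci(4) instead of the old ata(4) AHCI support is just another
loader.conf line; note that the disks will then attach as adaN rather
than adN:

  # /boot/loader.conf
  ahci_load="YES"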

I would also recommend you re-run your tests with different block sizes
for dd.  I don't know why people keep using 1m (Linux websites?).  Test
the following increments: 4k, 8k, 16k, 32k, 64k, 128k, 256k.  That's
about where you should stop.
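
Something along these lines would cover those increments (treat it as a
sketch: /dev/ada0 is a placeholder, and count=65536 at bs=4k only reads
256MB, so scale the count per block size if you want every pass to move
the same total amount of data):

  for bs in 4k 8k 16k 32k 64k 128k 256k; do
      echo "=== bs=${bs}"
      # dd prints its throughput summary on stderr; keep just that line
      dd if=/dev/ada0 of=/dev/null bs=${bs} count=65536 2>&1 | tail -1
  done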

Otherwise, consider installing ports/benchmarks/bonnie++ and try that.
That will also get you concurrent I/O tests, I believe.
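
Something like the following (a sketch; /tank is a placeholder for your
pool's mountpoint, -s should be well above your RAM size so the ARC
doesn't skew the numbers, and -u root is needed because bonnie++ won't
run as root without it):

  cd /usr/ports/benchmarks/bonnie++ && make install clean
  bonnie++ -d /tank -s 16g -u root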

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


