Very bad ZFS performance on fresh FreeBSD 8 installation

Clément Moulin cmoulin at simplerezo.com
Tue May 11 21:52:54 UTC 2010


Hi

We have recently added 3 new disks to one of our storage servers (Opteron
DualCore 2.2 GHz, 2048 MB RAM).
We made a ZFS pool out of these 3 disks (2 TB each), attached to a PCI-X
SATA RAID controller (3ware 9550SX-16ML), and made a fresh installation of
FreeBSD 8/amd64.
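
For reference, the layout shown in "zpool status" below corresponds to a
creation command along these lines (a sketch only; the disks were first
GPT-partitioned, and the pool lives on the second partition of each):

~$ zpool create tank raidz da0p2 da1p2 da2p2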

Disk performance on the ZFS storage is very bad (about 8-9 MB/s read or
write...).
And when the machine has little FREE memory (a lot of INACTIVE memory in
use), it's even worse (about 2 MB/s)...
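
(A quick way to watch the memory state while testing, using the stock
vm.stats counters -- values are in pages, page size via hw.pagesize:)

~$ sysctl hw.pagesize vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count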

~$ uname -a
FreeBSD ---.simplerezo.com 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Wed Apr
28 23:07:37 CEST 2010
root at ---.simplerezo.com:/usr/obj/usr/src/sys/KERNEL  amd64
# NOTE: tried with -RELEASE, now running -STABLE (same results)

~$ zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
tank  5,44T  1,26T  4,18T    23%  ONLINE  -
~$ zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0p2   ONLINE       0     0     0
            da1p2   ONLINE       0     0     0
            da2p2   ONLINE       0     0     0

errors: No known data errors

~$ tw_cli /c0 show all
/c0 Driver Version = 3.70.05.001
/c0 Model = 9550SX-16ML
/c0 Available Memory = 224MB
/c0 Firmware Version = FE9X 3.08.00.029
/c0 Bios Version = BE9X 3.10.00.003
/c0 Boot Loader Version = BL9X 3.01.00.006
/c0 Serial Number = L021603A6120017
/c0 PCB Version = Rev 032
/c0 PCHIP Version = 1.60
/c0 ACHIP Version = 1.70
/c0 Number of Ports = 16
/c0 Number of Drives = 13
/c0 Number of Units = 7
/c0 Total Optimal Units = 7
/c0 Not Optimal Units = 0
/c0 JBOD Export Policy = off
/c0 Disk Spinup Policy = 1
/c0 Spinup Stagger Time Policy (sec) = 1
/c0 Auto-Carving Policy = off
/c0 Auto-Carving Size = 2048 GB
/c0 Auto-Rebuild Policy = on
/c0 Rebuild Rate = 1
/c0 Verify Rate = 1
/c0 Controller Bus Type = PCIX
/c0 Controller Bus Width = 64 bits
/c0 Controller Bus Speed = 133 Mhz

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    SINGLE    OK             -       -       -       1862.63   ON     OFF
u1    SINGLE    OK             -       -       -       1862.63   ON     OFF
u2    SINGLE    OK             -       -       -       1862.63   ON     OFF
u3    RAID-5    OK             -       -       64K     931.303   ON     OFF
u4    RAID-5    OK             -       -       64K     931.303   ON     OFF
u5    RAID-5    OK             -       -       64K     1396.96   ON     OFF
u6    SINGLE    OK             -       -       -       1862.63   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     1.82 TB     3907029168    S1UYJ1BZ102991
p1     OK               u1     1.82 TB     3907029168    JK1170YAHU3KVP
p2     OK               u2     1.82 TB     3907029168    S1UYJ1LZ110057
p3     OK               u4     465.76 GB   976773168     KRVN33ZAHEKHXD
p4     OK               u4     465.76 GB   976773168     KRVN33ZAHE54XD
p5     OK               u4     465.76 GB   976773168     KRVN33ZAHDZPMD
p6     OK               u5     698.63 GB   1465149168    3QD08DVE
p7     OK               u5     698.63 GB   1465149168    3QD09JMS
p8     OK               u5     698.63 GB   1465149168    3QD09758
p9     NOT-PRESENT      -      -           -             -
p10    NOT-PRESENT      -      -           -             -
p11    OK               u3     465.76 GB   976773168     KRVN33ZAHG0DED
p12    OK               u3     465.76 GB   976773168     KRVN33ZAHAJ4ZD
p13    OK               u3     465.76 GB   976773168     KRVN33ZAH80K6D
p14    OK               u6     1.82 TB     3907029168    JK1120YAG2K1DP
p15    NOT-PRESENT      -      -           -             -

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    03-Oct-2009

~$ diskinfo -c -t /dev/da0
/dev/da0
        512             # sectorsize
        1999988850688   # mediasize in bytes (1.8T)
        3906228224      # mediasize in sectors
        243151          # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        BZ102991D7FFC800F1DE    # Disk ident.

I/O command overhead:
        time to read 10MB block      0.109465 sec       =    0.005 msec/sector
        time to read 20480 sectors   3.105226 sec       =    0.152 msec/sector
        calculated command overhead                     =    0.146 msec/sector

Seek times:
        Full stroke:      250 iter in   5.754512 sec =   23.018 msec
        Half stroke:      250 iter in   4.829476 sec =   19.318 msec
        Quarter stroke:   500 iter in   9.630453 sec =   19.261 msec
        Short forward:    400 iter in   5.849457 sec =   14.624 msec
        Short backward:   400 iter in   4.123671 sec =   10.309 msec
        Seq outer:       2048 iter in   0.222291 sec =    0.109 msec
        Seq inner:       2048 iter in   0.229290 sec =    0.112 msec
Transfer rates:
        outside:       102400 kbytes in   1.024872 sec =    99915 kbytes/sec
        middle:        102400 kbytes in   1.220958 sec =    83869 kbytes/sec
        inside:        102400 kbytes in   2.315101 sec =    44231 kbytes/sec


~$ diskinfo -c -t /dev/da1
/dev/da1
        512             # sectorsize
        1999988850688   # mediasize in bytes (1.8T)
        3906228224      # mediasize in sectors
        243151          # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        YAHU3KVPD7FFCD0054FA    # Disk ident.

I/O command overhead:
        time to read 10MB block      0.111094 sec       =    0.005 msec/sector
        time to read 20480 sectors   4.501331 sec       =    0.220 msec/sector
        calculated command overhead                     =    0.214 msec/sector

Seek times:
        Full stroke:      250 iter in   5.741269 sec =   22.965 msec
        Half stroke:      250 iter in   3.777884 sec =   15.112 msec
        Quarter stroke:   500 iter in   6.487416 sec =   12.975 msec
        Short forward:    400 iter in   3.129467 sec =    7.824 msec
        Short backward:   400 iter in   2.411493 sec =    6.029 msec
        Seq outer:       2048 iter in   0.419405 sec =    0.205 msec
        Seq inner:       2048 iter in   0.425003 sec =    0.208 msec
Transfer rates:
        outside:       102400 kbytes in   0.808851 sec =   126599 kbytes/sec
        middle:        102400 kbytes in   1.716277 sec =    59664 kbytes/sec
        inside:        102400 kbytes in   1.985879 sec =    51564 kbytes/sec

# NOTE: da2 not shown (same h/w as da1)

~$ dd if=/dev/zero of=test1G bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 133.956339 secs (8015610 bytes/sec)

# gstat and zpool iostat report the same values
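# For reference, monitored from a second terminal during the dd run,
# along these lines (1-second intervals):
~$ zpool iostat -v tank 1
~$ gstat -f 'da[0-2]'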

# The same test done on the same machine (da6 is a disk similar to da1),
# on a UFS partition (result is about 80 MB/s):
~$ dd if=/dev/zero of=test1G bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 12.543317 secs (85602701 bytes/sec)


~$ cat /boot/loader.conf
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
vfs.zfs.arc_max="64M"
vm.kmem_size="1024M"
vm.swap_enabled="0"
~$ sysctl vfs.zfs kstat.zfs
vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 0
vfs.zfs.mfu_ghost_metadata_lsize: 0
vfs.zfs.mfu_ghost_size: 0
vfs.zfs.mfu_data_lsize: 0
vfs.zfs.mfu_metadata_lsize: 0
vfs.zfs.mfu_size: 114688
vfs.zfs.mru_ghost_data_lsize: 0
vfs.zfs.mru_ghost_metadata_lsize: 0
vfs.zfs.mru_ghost_size: 0
vfs.zfs.mru_data_lsize: 512
vfs.zfs.mru_metadata_lsize: 0
vfs.zfs.mru_size: 59429376
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 1656320
vfs.zfs.l2arc_noprefetch: 0
vfs.zfs.l2arc_feed_secs_shift: 1
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 128
vfs.zfs.l2arc_write_boost: 67108864
vfs.zfs.l2arc_write_max: 67108864
vfs.zfs.arc_meta_limit: 16777216
vfs.zfs.arc_meta_used: 107541232
vfs.zfs.mdcomp_disable: 0
vfs.zfs.arc_min: 33554432
vfs.zfs.arc_max: 67108864
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.recover: 0
vfs.zfs.txg.synctime: 5
vfs.zfs.txg.timeout: 30
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 35
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_disable: 0
vfs.zfs.version.zpl: 3
vfs.zfs.version.vdev_boot: 1
vfs.zfs.version.spa: 13
vfs.zfs.version.dmu_backup_stream: 1
vfs.zfs.version.dmu_backup_header: 2
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.arcstats.hits: 677531
kstat.zfs.misc.arcstats.misses: 805411
kstat.zfs.misc.arcstats.demand_data_hits: 524276
kstat.zfs.misc.arcstats.demand_data_misses: 570927
kstat.zfs.misc.arcstats.demand_metadata_hits: 153255
kstat.zfs.misc.arcstats.demand_metadata_misses: 234484
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 0
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 0
kstat.zfs.misc.arcstats.mru_hits: 634920
kstat.zfs.misc.arcstats.mru_ghost_hits: 4103
kstat.zfs.misc.arcstats.mfu_hits: 42611
kstat.zfs.misc.arcstats.mfu_ghost_hits: 5985
kstat.zfs.misc.arcstats.allocated: 1132742
kstat.zfs.misc.arcstats.deleted: 839033
kstat.zfs.misc.arcstats.stolen: 378817
kstat.zfs.misc.arcstats.recycle_miss: 751590
kstat.zfs.misc.arcstats.mutex_miss: 457
kstat.zfs.misc.arcstats.evict_skip: 713728
kstat.zfs.misc.arcstats.hash_elements: 3639
kstat.zfs.misc.arcstats.hash_elements_max: 6958
kstat.zfs.misc.arcstats.hash_collisions: 146864
kstat.zfs.misc.arcstats.hash_chains: 195
kstat.zfs.misc.arcstats.hash_chain_max: 4
kstat.zfs.misc.arcstats.p: 33554432
kstat.zfs.misc.arcstats.c: 33554432
kstat.zfs.misc.arcstats.c_min: 33554432
kstat.zfs.misc.arcstats.c_max: 67108864
kstat.zfs.misc.arcstats.size: 107823344
kstat.zfs.misc.arcstats.hdr_size: 764608
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 0
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 0
kstat.zfs.misc.arcstats.l2_write_full: 0
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
kstat.zfs.misc.arcstats.l2_write_pios: 0
kstat.zfs.misc.arcstats.l2_write_bytes_written: 0
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0
kstat.zfs.misc.vdev_cache_stats.delegations: 3113
kstat.zfs.misc.vdev_cache_stats.hits: 342284
kstat.zfs.misc.vdev_cache_stats.misses: 50877

...

--
Clément Moulin       SimpleRezo
www.simplerezo.com




