ZFS read performance disparity between clone and parent
Nathan Weeks
weeks at iastate.edu
Wed May 13 18:54:23 UTC 2015
While troubleshooting performance disparities between development and
production jails hosting PostgreSQL instances, I noticed (with the help of
dtruss) that the 8k read() performance in the production jail was an order of
magnitude worse than the read() performance in the development jail. As the
ZFS file system hosting the production jail was cloned from a snapshot of the
development jail's file system and had not been modified since, this didn't make sense to me.
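For anyone who wants to reproduce the per-call measurement, a DTrace one-liner
along the following lines (not the exact dtruss invocation I used; "postgres" is
just the backend process name) prints a read() latency histogram when
interrupted with Ctrl-C:

# dtrace -n 'syscall::read:entry /execname == "postgres"/ { self->ts = timestamp; }
  syscall::read:return /self->ts/ { @["read() latency (ns)"] = quantize(timestamp - self->ts); self->ts = 0; }'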
Using "dd" command with an 8k block size to emulate the PostgreSQL read()
size, I observed a large performance difference between reading one of the
large (1G) underlying postgres database files in the development jail's file
system vs. the corresponding file in the cloned file system:
# dd if=/jails/dev/usr/local/pgsql/data/base/16399/16436 of=/dev/null bs=8192
131072+0 records in
131072+0 records out
1073741824 bytes transferred in 4.198993 secs (255714128 bytes/sec)
# dd if=/jails/prod/usr/local/pgsql/data/base/16399/16436 of=/dev/null bs=8192
131072+0 records in
131072+0 records out
1073741824 bytes transferred in 17.314135 secs (62015331 bytes/sec)
# ls -l /jails/dev/usr/local/pgsql/data/base/16399/16436 \
       /jails/prod/usr/local/pgsql/data/base/16399/16436
-rw-------  1 70  70  1073741824 Feb  5 16:41 /jails/dev/usr/local/pgsql/data/base/16399/16436
-rw-------  1 70  70  1073741824 Feb  5 16:41 /jails/prod/usr/local/pgsql/data/base/16399/16436
I repeated this exercise several times to verify the read performance
difference. Interestingly, prefixing the "dd" command with "/usr/bin/time -l"
revealed that in both cases, "block input operations" was 0, apparently
indicating that both files were being read from cache. In neither case did
"zpool iostat 1" show significant I/O being performed during the execution of
the "dd" command.
Has anyone else encountered a similar issue, or does anyone know of an
explanation, solution, or workaround? I had previously assumed that there
would be no performance difference between reading a file on a ZFS file system
and the corresponding file on a cloned file system when none of the data
blocks have changed (this is FreeBSD 9.3, so the "Single Copy ARC" feature
should apply). Dedup isn't being used on any file system.
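To double-check that the clone really does still share all of its data blocks
with the parent file system, something like the following should work (the
"tank/jails/..." dataset names and the object number are placeholders for the
real ones):

# zfs get origin,written,referenced tank/jails/prod
# ls -i /jails/dev/usr/local/pgsql/data/base/16399/16436
# zdb -ddddd tank/jails/dev <object#>
# zdb -ddddd tank/jails/prod <object#>

A "written" value at or near zero on the clone, and identical block pointers
(DVAs) in the two zdb dumps for that object number, would confirm that both
file systems reference the same on-disk blocks.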
The output of zfs-stats follows; I can provide any additional information that
might help in identifying the cause of this issue.
------------------------------------------------------------------------
ZFS Subsystem Report Wed May 13 12:22:00 2015
------------------------------------------------------------------------
System Information:
Kernel Version: 903000 (osreldate)
Hardware Platform: amd64
Processor Architecture: amd64
ZFS Storage pool Version: 5000
ZFS Filesystem Version: 5
FreeBSD 9.3-RELEASE-p5 #0: Mon Nov 3 22:38:58 UTC 2014 root
12:22PM up 166 days, 3:36, 7 users, load averages: 2.34, 2.31, 2.17
------------------------------------------------------------------------
System Memory:
8.83% 21.95 GiB Active, 1.67% 4.14 GiB Inact
68.99% 171.40 GiB Wired, 0.40% 1.00 GiB Cache
20.10% 49.93 GiB Free, 0.01% 16.12 MiB Gap
Real Installed: 256.00 GiB
Real Available: 99.99% 255.97 GiB
Real Managed: 97.06% 248.43 GiB
Logical Total: 256.00 GiB
Logical Used: 78.49% 200.92 GiB
Logical Free: 21.51% 55.08 GiB
Kernel Memory: 117.28 GiB
Data: 99.98% 117.25 GiB
Text: 0.02% 26.07 MiB
Kernel Memory Map: 241.10 GiB
Size: 43.83% 105.67 GiB
Free: 56.17% 135.43 GiB
------------------------------------------------------------------------
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 143.56m
Recycle Misses: 275.73m
Mutex Misses: 1.50m
Evict Skips: 20.24b
ARC Size: 99.77% 127.71 GiB
Target Size: (Adaptive) 100.00% 128.00 GiB
Min Size (Hard Limit): 12.50% 16.00 GiB
Max Size (High Water): 8:1 128.00 GiB
ARC Size Breakdown:
Recently Used Cache Size: 68.86% 88.15 GiB
Frequently Used Cache Size: 31.14% 39.85 GiB
ARC Hash Breakdown:
Elements Max: 27.87m
Elements Current: 40.13% 11.18m
Collisions: 1.95b
Chain Max: 26
Chains: 2.44m
------------------------------------------------------------------------
ARC Efficiency: 88.77b
Cache Hit Ratio: 99.52% 88.34b
Cache Miss Ratio: 0.48% 426.00m
Actual Hit Ratio: 98.86% 87.76b
Data Demand Efficiency: 99.99% 58.75b
Data Prefetch Efficiency: 98.47% 1.08b
CACHE HITS BY CACHE LIST:
Anonymously Used: 0.21% 187.51m
Most Recently Used: 1.93% 1.71b
Most Frequently Used: 97.41% 86.05b
Most Recently Used Ghost: 0.04% 39.14m
Most Frequently Used Ghost: 0.41% 358.78m
CACHE HITS BY DATA TYPE:
Demand Data: 66.49% 58.74b
Prefetch Data: 1.21% 1.07b
Demand Metadata: 31.74% 28.04b
Prefetch Metadata: 0.56% 491.01m
CACHE MISSES BY DATA TYPE:
Demand Data: 1.70% 7.26m
Prefetch Data: 3.89% 16.56m
Demand Metadata: 83.84% 357.15m
Prefetch Metadata: 10.57% 45.03m
------------------------------------------------------------------------
L2ARC is disabled
------------------------------------------------------------------------
File-Level Prefetch: (HEALTHY)
DMU Efficiency: 187.26b
Hit Ratio: 82.21% 153.94b
Miss Ratio: 17.79% 33.32b
Colinear: 33.32b
Hit Ratio: 0.01% 3.35m
Miss Ratio: 99.99% 33.32b
Stride: 150.63b
Hit Ratio: 100.00% 150.63b
Miss Ratio: 0.00% 453.04k
DMU Misc:
Reclaim: 33.32b
Successes: 0.36% 118.64m
Failures: 99.64% 33.20b
Streams: 3.31b
+Resets: 0.00% 20.36k
-Resets: 100.00% 3.31b
Bogus: 0
------------------------------------------------------------------------
VDEV cache is disabled
------------------------------------------------------------------------
ZFS Tunables (sysctl):
kern.maxusers 16718
vm.kmem_size 266754412544
vm.kmem_size_scale 1
vm.kmem_size_min 0
vm.kmem_size_max 329853485875
vfs.zfs.l2c_only_size 0
vfs.zfs.mfu_ghost_data_lsize 63695688192
vfs.zfs.mfu_ghost_metadata_lsize 8300248064
vfs.zfs.mfu_ghost_size 71995936256
vfs.zfs.mfu_data_lsize 34951425024
vfs.zfs.mfu_metadata_lsize 4976638976
vfs.zfs.mfu_size 41843978240
vfs.zfs.mru_ghost_data_lsize 41844330496
vfs.zfs.mru_ghost_metadata_lsize 23598693888
vfs.zfs.mru_ghost_size 65443024384
vfs.zfs.mru_data_lsize 67918019072
vfs.zfs.mru_metadata_lsize 411918848
vfs.zfs.mru_size 71823354880
vfs.zfs.anon_data_lsize 0
vfs.zfs.anon_metadata_lsize 0
vfs.zfs.anon_size 29893120
vfs.zfs.l2arc_norw 1
vfs.zfs.l2arc_feed_again 1
vfs.zfs.l2arc_noprefetch 1
vfs.zfs.l2arc_feed_min_ms 200
vfs.zfs.l2arc_feed_secs 1
vfs.zfs.l2arc_headroom 2
vfs.zfs.l2arc_write_boost 8388608
vfs.zfs.l2arc_write_max 8388608
vfs.zfs.arc_meta_limit 34359738368
vfs.zfs.arc_meta_used 34250008792
vfs.zfs.arc_min 17179869184
vfs.zfs.arc_max 137438953472
vfs.zfs.dedup.prefetch 1
vfs.zfs.mdcomp_disable 0
vfs.zfs.nopwrite_enabled 1
vfs.zfs.zfetch.array_rd_sz 1048576
vfs.zfs.zfetch.block_cap 256
vfs.zfs.zfetch.min_sec_reap 2
vfs.zfs.zfetch.max_streams 8
vfs.zfs.prefetch_disable 0
vfs.zfs.no_scrub_prefetch 0
vfs.zfs.no_scrub_io 0
vfs.zfs.resilver_min_time_ms 3000
vfs.zfs.free_min_time_ms 1000
vfs.zfs.scan_min_time_ms 1000
vfs.zfs.scan_idle 50
vfs.zfs.scrub_delay 4
vfs.zfs.resilver_delay 2
vfs.zfs.top_maxinflight 32
vfs.zfs.write_to_degraded 0
vfs.zfs.mg_noalloc_threshold 0
vfs.zfs.condense_pct 200
vfs.zfs.metaslab.weight_factor_enable 0
vfs.zfs.metaslab.preload_enabled 1
vfs.zfs.metaslab.preload_limit 3
vfs.zfs.metaslab.unload_delay 8
vfs.zfs.metaslab.load_pct 50
vfs.zfs.metaslab.min_alloc_size 10485760
vfs.zfs.metaslab.df_free_pct 4
vfs.zfs.metaslab.df_alloc_threshold 131072
vfs.zfs.metaslab.debug_unload 0
vfs.zfs.metaslab.debug_load 0
vfs.zfs.metaslab.gang_bang 131073
vfs.zfs.check_hostid 1
vfs.zfs.spa_asize_inflation 24
vfs.zfs.deadman_enabled 1
vfs.zfs.deadman_checktime_ms 5000
vfs.zfs.deadman_synctime_ms 1000000
vfs.zfs.recover 0
vfs.zfs.txg.timeout 5
vfs.zfs.min_auto_ashift 9
vfs.zfs.max_auto_ashift 13
vfs.zfs.vdev.cache.bshift 16
vfs.zfs.vdev.cache.size 0
vfs.zfs.vdev.cache.max 16384
vfs.zfs.vdev.trim_on_init 1
vfs.zfs.vdev.write_gap_limit 4096
vfs.zfs.vdev.read_gap_limit 32768
vfs.zfs.vdev.aggregation_limit 131072
vfs.zfs.vdev.scrub_max_active 2
vfs.zfs.vdev.scrub_min_active 1
vfs.zfs.vdev.async_write_max_active 10
vfs.zfs.vdev.async_write_min_active 1
vfs.zfs.vdev.async_read_max_active 3
vfs.zfs.vdev.async_read_min_active 1
vfs.zfs.vdev.sync_write_max_active 10
vfs.zfs.vdev.sync_write_min_active 10
vfs.zfs.vdev.sync_read_max_active 10
vfs.zfs.vdev.sync_read_min_active 10
vfs.zfs.vdev.max_active 1000
vfs.zfs.vdev.bio_delete_disable 0
vfs.zfs.vdev.bio_flush_disable 0
vfs.zfs.vdev.trim_max_pending 64
vfs.zfs.vdev.trim_max_bytes 2147483648
vfs.zfs.cache_flush_disable 0
vfs.zfs.zil_replay_disable 0
vfs.zfs.sync_pass_rewrite 2
vfs.zfs.sync_pass_dont_compress 5
vfs.zfs.sync_pass_deferred_free 2
vfs.zfs.zio.use_uma 0
vfs.zfs.snapshot_list_prefetch 0
vfs.zfs.version.ioctl 3
vfs.zfs.version.zpl 5
vfs.zfs.version.spa 5000
vfs.zfs.version.acl 1
vfs.zfs.debug 0
vfs.zfs.super_owner 0
vfs.zfs.trim.enabled 1
vfs.zfs.trim.max_interval 1
vfs.zfs.trim.timeout 30
vfs.zfs.trim.txg_delay 32
------------------------------------------------------------------------
--
Nathan Weeks
USDA-ARS Corn Insects and Crop Genetics Research Unit
Crop Genome Informatics Laboratory
Iowa State University
http://weeks.public.iastate.edu/