ZFS read performance disparity between clone and parent
Matthew Ahrens
mahrens at delphix.com
Sat Jun 13 03:50:45 UTC 2015
On Wed, May 13, 2015 at 11:54 AM, Nathan Weeks <weeks at iastate.edu> wrote:
> While troubleshooting performance disparities between development and
> production jails hosting PostgreSQL instances, I noticed (with the help of
> dtruss) that the 8k read() performance in the production jail was an order
> of magnitude worse than the read() performance in the development jail. As
> the ZFS file system hosting the production jail was cloned from a snapshot
> of the development jail, and had not been modified, this didn't make sense
> to me.
>
> Using the "dd" command with an 8k block size to emulate the PostgreSQL
> read() size, I observed a large performance difference between reading one
> of the large (1G) underlying postgres database files in the development
> jail's file system vs. the corresponding file in the cloned file system:
>
> # dd if=/jails/dev/usr/local/pgsql/data/base/16399/16436 of=/dev/null bs=8192
> 131072+0 records in
> 131072+0 records out
> 1073741824 bytes transferred in 4.198993 secs (255714128 bytes/sec)
> # dd if=/jails/prod/usr/local/pgsql/data/base/16399/16436 of=/dev/null bs=8192
> 131072+0 records in
> 131072+0 records out
> 1073741824 bytes transferred in 17.314135 secs (62015331 bytes/sec)
> # ls -l /jails/dev/usr/local/pgsql/data/base/16399/16436
> /jails/prod/usr/local/pgsql/data/base/16399/16436
> -rw------- 1 70 70 1073741824 Feb 5 16:41
> /jails/dev/usr/local/pgsql/data/base/16399/16436
> -rw------- 1 70 70 1073741824 Feb 5 16:41
> /jails/prod/usr/local/pgsql/data/base/16399/16436
>
> I repeated this exercise several times to verify the read performance
> difference. Interestingly, prefixing the "dd" command with "/usr/bin/time
> -l" revealed that in both cases, "block input operations" was 0, apparently
> indicating that both files were being read from cache. In neither case did
> "zpool iostat 1" show significant I/O being performed during the execution
> of the "dd" command.
>
> Has anyone else encountered a similar issue, or does anyone know of an
> explanation/solution/better workaround? I had previously assumed that
> there would be no performance difference between reading a file on a ZFS
> file system and the corresponding file on a cloned file system when none
> of the data blocks have changed (this is FreeBSD 9.3, so the "Single Copy
> ARC" feature should apply). Dedup isn't being used on any file system.
>
An unfortunate byproduct of the "single copy ARC" is that the first dataset
to read a block performs better than subsequent readers, which have to do
an extra bcopy() of the block. You should be able to alleviate this by
evicting the buffers, either by unmounting the first filesystem or by
running "zinject -a". We are working on a fix for this as part of the
"compressed ARC" feature that will be coming soon.
You can verify this by looking at the flame graphs of CPU usage in both
cases (http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html).
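The per-block copy cost described above is easy to illustrate outside ZFS.
A minimal Python sketch (the buffer size and variable names are purely
illustrative, not from this thread) compares handing out a cached buffer by
reference, as the first reader effectively gets, with the extra full copy a
subsequent reader pays:

```python
import time

# A 64 MiB buffer standing in for a block cached in the ARC.
cached = bytearray(64 * 1024 * 1024)

# First reader: the cached buffer is handed out directly (zero-copy).
start = time.perf_counter()
view = memoryview(cached)          # O(1) reference, no data movement
zero_copy_s = time.perf_counter() - start

# Subsequent reader: the block must first be duplicated (the bcopy() step).
start = time.perf_counter()
private = bytes(cached)            # full 64 MiB copy
copy_s = time.perf_counter() - start

print(f"zero-copy: {zero_copy_s:.6f}s  with copy: {copy_s:.6f}s")
```

Since the clone pays this copy on every cached block it reads, a sustained
throughput gap like the one in the dd comparison above is plausible even
with both files fully in the ARC.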
--matt
>
> The output of zfs-stats follows; I can provide any additional info that
> might be of use in identifying the cause of this issue.
>
> ------------------------------------------------------------------------
> ZFS Subsystem Report Wed May 13 12:22:00 2015
> ------------------------------------------------------------------------
>
> System Information:
>
> Kernel Version: 903000 (osreldate)
> Hardware Platform: amd64
> Processor Architecture: amd64
>
> ZFS Storage pool Version: 5000
> ZFS Filesystem Version: 5
>
> FreeBSD 9.3-RELEASE-p5 #0: Mon Nov 3 22:38:58 UTC 2014 root
> 12:22PM up 166 days, 3:36, 7 users, load averages: 2.34, 2.31, 2.17
>
> ------------------------------------------------------------------------
>
> System Memory:
>
> 8.83% 21.95 GiB Active, 1.67% 4.14 GiB Inact
> 68.99% 171.40 GiB Wired, 0.40% 1.00 GiB Cache
> 20.10% 49.93 GiB Free, 0.01% 16.12 MiB Gap
>
> Real Installed: 256.00 GiB
> Real Available: 99.99% 255.97 GiB
> Real Managed: 97.06% 248.43 GiB
>
> Logical Total: 256.00 GiB
> Logical Used: 78.49% 200.92 GiB
> Logical Free: 21.51% 55.08 GiB
>
> Kernel Memory: 117.28 GiB
> Data: 99.98% 117.25 GiB
> Text: 0.02% 26.07 MiB
>
> Kernel Memory Map: 241.10 GiB
> Size: 43.83% 105.67 GiB
> Free: 56.17% 135.43 GiB
>
> ------------------------------------------------------------------------
>
> ARC Summary: (HEALTHY)
> Memory Throttle Count: 0
>
> ARC Misc:
> Deleted: 143.56m
> Recycle Misses: 275.73m
> Mutex Misses: 1.50m
> Evict Skips: 20.24b
>
> ARC Size: 99.77% 127.71 GiB
> Target Size: (Adaptive) 100.00% 128.00 GiB
> Min Size (Hard Limit): 12.50% 16.00 GiB
> Max Size (High Water): 8:1 128.00 GiB
>
> ARC Size Breakdown:
> Recently Used Cache Size: 68.86% 88.15 GiB
> Frequently Used Cache Size: 31.14% 39.85 GiB
>
> ARC Hash Breakdown:
> Elements Max: 27.87m
> Elements Current: 40.13% 11.18m
> Collisions: 1.95b
> Chain Max: 26
> Chains: 2.44m
>
> ------------------------------------------------------------------------
>
> ARC Efficiency: 88.77b
> Cache Hit Ratio: 99.52% 88.34b
> Cache Miss Ratio: 0.48% 426.00m
> Actual Hit Ratio: 98.86% 87.76b
>
> Data Demand Efficiency: 99.99% 58.75b
> Data Prefetch Efficiency: 98.47% 1.08b
>
> CACHE HITS BY CACHE LIST:
> Anonymously Used: 0.21% 187.51m
> Most Recently Used: 1.93% 1.71b
> Most Frequently Used: 97.41% 86.05b
> Most Recently Used Ghost: 0.04% 39.14m
> Most Frequently Used Ghost: 0.41% 358.78m
>
> CACHE HITS BY DATA TYPE:
> Demand Data: 66.49% 58.74b
> Prefetch Data: 1.21% 1.07b
> Demand Metadata: 31.74% 28.04b
> Prefetch Metadata: 0.56% 491.01m
>
> CACHE MISSES BY DATA TYPE:
> Demand Data: 1.70% 7.26m
> Prefetch Data: 3.89% 16.56m
> Demand Metadata: 83.84% 357.15m
> Prefetch Metadata: 10.57% 45.03m
>
> ------------------------------------------------------------------------
>
> L2ARC is disabled
>
> ------------------------------------------------------------------------
>
> File-Level Prefetch: (HEALTHY)
>
> DMU Efficiency: 187.26b
> Hit Ratio: 82.21% 153.94b
> Miss Ratio: 17.79% 33.32b
>
> Colinear: 33.32b
> Hit Ratio: 0.01% 3.35m
> Miss Ratio: 99.99% 33.32b
>
> Stride: 150.63b
> Hit Ratio: 100.00% 150.63b
> Miss Ratio: 0.00% 453.04k
>
> DMU Misc:
> Reclaim: 33.32b
> Successes: 0.36% 118.64m
> Failures: 99.64% 33.20b
>
> Streams: 3.31b
> +Resets: 0.00% 20.36k
> -Resets: 100.00% 3.31b
> Bogus: 0
>
> ------------------------------------------------------------------------
>
> VDEV cache is disabled
>
> ------------------------------------------------------------------------
>
> ZFS Tunables (sysctl):
> kern.maxusers 16718
> vm.kmem_size 266754412544
> vm.kmem_size_scale 1
> vm.kmem_size_min 0
> vm.kmem_size_max 329853485875
> vfs.zfs.l2c_only_size 0
> vfs.zfs.mfu_ghost_data_lsize 63695688192
> vfs.zfs.mfu_ghost_metadata_lsize 8300248064
> vfs.zfs.mfu_ghost_size 71995936256
> vfs.zfs.mfu_data_lsize 34951425024
> vfs.zfs.mfu_metadata_lsize 4976638976
> vfs.zfs.mfu_size 41843978240
> vfs.zfs.mru_ghost_data_lsize 41844330496
> vfs.zfs.mru_ghost_metadata_lsize 23598693888
> vfs.zfs.mru_ghost_size 65443024384
> vfs.zfs.mru_data_lsize 67918019072
> vfs.zfs.mru_metadata_lsize 411918848
> vfs.zfs.mru_size 71823354880
> vfs.zfs.anon_data_lsize 0
> vfs.zfs.anon_metadata_lsize 0
> vfs.zfs.anon_size 29893120
> vfs.zfs.l2arc_norw 1
> vfs.zfs.l2arc_feed_again 1
> vfs.zfs.l2arc_noprefetch 1
> vfs.zfs.l2arc_feed_min_ms 200
> vfs.zfs.l2arc_feed_secs 1
> vfs.zfs.l2arc_headroom 2
> vfs.zfs.l2arc_write_boost 8388608
> vfs.zfs.l2arc_write_max 8388608
> vfs.zfs.arc_meta_limit 34359738368
> vfs.zfs.arc_meta_used 34250008792
> vfs.zfs.arc_min 17179869184
> vfs.zfs.arc_max 137438953472
> vfs.zfs.dedup.prefetch 1
> vfs.zfs.mdcomp_disable 0
> vfs.zfs.nopwrite_enabled 1
> vfs.zfs.zfetch.array_rd_sz 1048576
> vfs.zfs.zfetch.block_cap 256
> vfs.zfs.zfetch.min_sec_reap 2
> vfs.zfs.zfetch.max_streams 8
> vfs.zfs.prefetch_disable 0
> vfs.zfs.no_scrub_prefetch 0
> vfs.zfs.no_scrub_io 0
> vfs.zfs.resilver_min_time_ms 3000
> vfs.zfs.free_min_time_ms 1000
> vfs.zfs.scan_min_time_ms 1000
> vfs.zfs.scan_idle 50
> vfs.zfs.scrub_delay 4
> vfs.zfs.resilver_delay 2
> vfs.zfs.top_maxinflight 32
> vfs.zfs.write_to_degraded 0
> vfs.zfs.mg_noalloc_threshold 0
> vfs.zfs.condense_pct 200
> vfs.zfs.metaslab.weight_factor_enable 0
> vfs.zfs.metaslab.preload_enabled 1
> vfs.zfs.metaslab.preload_limit 3
> vfs.zfs.metaslab.unload_delay 8
> vfs.zfs.metaslab.load_pct 50
> vfs.zfs.metaslab.min_alloc_size 10485760
> vfs.zfs.metaslab.df_free_pct 4
> vfs.zfs.metaslab.df_alloc_threshold 131072
> vfs.zfs.metaslab.debug_unload 0
> vfs.zfs.metaslab.debug_load 0
> vfs.zfs.metaslab.gang_bang 131073
> vfs.zfs.check_hostid 1
> vfs.zfs.spa_asize_inflation 24
> vfs.zfs.deadman_enabled 1
> vfs.zfs.deadman_checktime_ms 5000
> vfs.zfs.deadman_synctime_ms 1000000
> vfs.zfs.recover 0
> vfs.zfs.txg.timeout 5
> vfs.zfs.min_auto_ashift 9
> vfs.zfs.max_auto_ashift 13
> vfs.zfs.vdev.cache.bshift 16
> vfs.zfs.vdev.cache.size 0
> vfs.zfs.vdev.cache.max 16384
> vfs.zfs.vdev.trim_on_init 1
> vfs.zfs.vdev.write_gap_limit 4096
> vfs.zfs.vdev.read_gap_limit 32768
> vfs.zfs.vdev.aggregation_limit 131072
> vfs.zfs.vdev.scrub_max_active 2
> vfs.zfs.vdev.scrub_min_active 1
> vfs.zfs.vdev.async_write_max_active 10
> vfs.zfs.vdev.async_write_min_active 1
> vfs.zfs.vdev.async_read_max_active 3
> vfs.zfs.vdev.async_read_min_active 1
> vfs.zfs.vdev.sync_write_max_active 10
> vfs.zfs.vdev.sync_write_min_active 10
> vfs.zfs.vdev.sync_read_max_active 10
> vfs.zfs.vdev.sync_read_min_active 10
> vfs.zfs.vdev.max_active 1000
> vfs.zfs.vdev.bio_delete_disable 0
> vfs.zfs.vdev.bio_flush_disable 0
> vfs.zfs.vdev.trim_max_pending 64
> vfs.zfs.vdev.trim_max_bytes 2147483648
> vfs.zfs.cache_flush_disable 0
> vfs.zfs.zil_replay_disable 0
> vfs.zfs.sync_pass_rewrite 2
> vfs.zfs.sync_pass_dont_compress 5
> vfs.zfs.sync_pass_deferred_free 2
> vfs.zfs.zio.use_uma 0
> vfs.zfs.snapshot_list_prefetch 0
> vfs.zfs.version.ioctl 3
> vfs.zfs.version.zpl 5
> vfs.zfs.version.spa 5000
> vfs.zfs.version.acl 1
> vfs.zfs.debug 0
> vfs.zfs.super_owner 0
> vfs.zfs.trim.enabled 1
> vfs.zfs.trim.max_interval 1
> vfs.zfs.trim.timeout 30
> vfs.zfs.trim.txg_delay 32
>
> ------------------------------------------------------------------------
>
> --
> Nathan Weeks
> USDA-ARS Corn Insects and Crop Genetics Research Unit
> Crop Genome Informatics Laboratory
> Iowa State University
> http://weeks.public.iastate.edu/
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs