ZFS read performance disparity between clone and parent

Nathan Weeks weeks at iastate.edu
Wed May 13 18:54:23 UTC 2015


While troubleshooting performance disparities between development and
production jails hosting PostgreSQL instances, I noticed (with the help of
dtruss) that the 8k read() performance in the production jail was an order of
magnitude worse than the read() performance in the development jail. As the
ZFS file system hosting the production jail was cloned from a snapshot of the
development jail, and had not been modified, this didn't make sense to me.
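
(For anyone wanting to reproduce the measurement: a DTrace one-liner along the
following lines summarizes per-call read() latency for a given backend. This is
a sketch rather than the exact invocation used here, and $PGPID is a placeholder
for the PID of the postgres backend being traced.)

# dtrace -p $PGPID -n '
    /* record the entry timestamp per thread */
    syscall::read:entry /pid == $target/ { self->ts = timestamp; }
    /* aggregate the elapsed time into a power-of-two histogram */
    syscall::read:return /self->ts/ {
        @["read(2) latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
    }'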

Using "dd" command with an 8k block size to emulate the PostgreSQL read()
size, I observed a large performance difference between reading one of the
large (1G) underlying postgres database files in the development jail's file
system vs. the corresponding file in the cloned file system:

# dd if=/jails/dev/usr/local/pgsql/data/base/16399/16436 of=/dev/null bs=8192
131072+0 records in
131072+0 records out
1073741824 bytes transferred in 4.198993 secs (255714128 bytes/sec)
# dd if=/jails/prod/usr/local/pgsql/data/base/16399/16436 of=/dev/null bs=8192
131072+0 records in
131072+0 records out
1073741824 bytes transferred in 17.314135 secs (62015331 bytes/sec)
# ls -l /jails/dev/usr/local/pgsql/data/base/16399/16436 \
    /jails/prod/usr/local/pgsql/data/base/16399/16436
-rw------- 1 70 70 1073741824 Feb 5 16:41 /jails/dev/usr/local/pgsql/data/base/16399/16436
-rw------- 1 70 70 1073741824 Feb 5 16:41 /jails/prod/usr/local/pgsql/data/base/16399/16436

I repeated this exercise several times to verify the read performance
difference. Interestingly, prefixing the "dd" command with "/usr/bin/time -l"
revealed that in both cases, "block input operations" was 0, apparently
indicating that both files were being read from cache. In neither case did
"zpool iostat 1" show significant I/O being performed during the execution of
the "dd" command.

Has anyone else encountered a similar issue, and does anyone know of an
explanation, solution, or better workaround? I had previously assumed that
there would be no performance difference between reading a file on a ZFS file
system and reading the corresponding file on a cloned file system when none of
the data blocks have changed (this is FreeBSD 9.3, so the "Single Copy ARC"
feature should apply). Dedup isn't being used on any file system.
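
(For completeness, the clone relationship and the properties that could
plausibly differ between the two datasets can be double-checked with something
like the following, where the dataset names are placeholders for the actual
dev and prod file systems:)

# zfs get origin,compression,recordsize,primarycache,dedup pool/jails/dev pool/jails/prod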

The output of zfs-stats follows; I can provide any additional info that might
be of use in identifying the cause of this issue.

------------------------------------------------------------------------
ZFS Subsystem Report                            Wed May 13 12:22:00 2015
------------------------------------------------------------------------

System Information:

        Kernel Version:                         903000 (osreldate)
        Hardware Platform:                      amd64
        Processor Architecture:                 amd64

        ZFS Storage pool Version:               5000
        ZFS Filesystem Version:                 5

FreeBSD 9.3-RELEASE-p5 #0: Mon Nov 3 22:38:58 UTC 2014 root
12:22PM  up 166 days,  3:36, 7 users, load averages: 2.34, 2.31, 2.17

------------------------------------------------------------------------

System Memory:

        8.83%   21.95   GiB Active,     1.67%   4.14    GiB Inact
        68.99%  171.40  GiB Wired,      0.40%   1.00    GiB Cache
        20.10%  49.93   GiB Free,       0.01%   16.12   MiB Gap

        Real Installed:                         256.00  GiB
        Real Available:                 99.99%  255.97  GiB
        Real Managed:                   97.06%  248.43  GiB

        Logical Total:                          256.00  GiB
        Logical Used:                   78.49%  200.92  GiB
        Logical Free:                   21.51%  55.08   GiB

Kernel Memory:                                  117.28  GiB
        Data:                           99.98%  117.25  GiB
        Text:                           0.02%   26.07   MiB

Kernel Memory Map:                              241.10  GiB
        Size:                           43.83%  105.67  GiB
        Free:                           56.17%  135.43  GiB

------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                143.56m
        Recycle Misses:                         275.73m
        Mutex Misses:                           1.50m
        Evict Skips:                            20.24b

ARC Size:                               99.77%  127.71  GiB
        Target Size: (Adaptive)         100.00% 128.00  GiB
        Min Size (Hard Limit):          12.50%  16.00   GiB
        Max Size (High Water):          8:1     128.00  GiB

ARC Size Breakdown:
        Recently Used Cache Size:       68.86%  88.15   GiB
        Frequently Used Cache Size:     31.14%  39.85   GiB

ARC Hash Breakdown:
        Elements Max:                           27.87m
        Elements Current:               40.13%  11.18m
        Collisions:                             1.95b
        Chain Max:                              26
        Chains:                                 2.44m

------------------------------------------------------------------------

ARC Efficiency:                                 88.77b
        Cache Hit Ratio:                99.52%  88.34b
        Cache Miss Ratio:               0.48%   426.00m
        Actual Hit Ratio:               98.86%  87.76b

        Data Demand Efficiency:         99.99%  58.75b
        Data Prefetch Efficiency:       98.47%  1.08b

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             0.21%   187.51m
          Most Recently Used:           1.93%   1.71b
          Most Frequently Used:         97.41%  86.05b
          Most Recently Used Ghost:     0.04%   39.14m
          Most Frequently Used Ghost:   0.41%   358.78m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  66.49%  58.74b
          Prefetch Data:                1.21%   1.07b
          Demand Metadata:              31.74%  28.04b
          Prefetch Metadata:            0.56%   491.01m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  1.70%   7.26m
          Prefetch Data:                3.89%   16.56m
          Demand Metadata:              83.84%  357.15m
          Prefetch Metadata:            10.57%  45.03m

------------------------------------------------------------------------

L2ARC is disabled

------------------------------------------------------------------------

File-Level Prefetch: (HEALTHY)

DMU Efficiency:                                 187.26b
        Hit Ratio:                      82.21%  153.94b
        Miss Ratio:                     17.79%  33.32b

        Colinear:                               33.32b
          Hit Ratio:                    0.01%   3.35m
          Miss Ratio:                   99.99%  33.32b

        Stride:                                 150.63b
          Hit Ratio:                    100.00% 150.63b
          Miss Ratio:                   0.00%   453.04k

DMU Misc:
        Reclaim:                                33.32b
          Successes:                    0.36%   118.64m
          Failures:                     99.64%  33.20b

        Streams:                                3.31b
          +Resets:                      0.00%   20.36k
          -Resets:                      100.00% 3.31b
          Bogus:                                0

------------------------------------------------------------------------

VDEV cache is disabled

------------------------------------------------------------------------

ZFS Tunables (sysctl):
        kern.maxusers                           16718
        vm.kmem_size                            266754412544
        vm.kmem_size_scale                      1
        vm.kmem_size_min                        0
        vm.kmem_size_max                        329853485875
        vfs.zfs.l2c_only_size                   0
        vfs.zfs.mfu_ghost_data_lsize            63695688192
        vfs.zfs.mfu_ghost_metadata_lsize        8300248064
        vfs.zfs.mfu_ghost_size                  71995936256
        vfs.zfs.mfu_data_lsize                  34951425024
        vfs.zfs.mfu_metadata_lsize              4976638976
        vfs.zfs.mfu_size                        41843978240
        vfs.zfs.mru_ghost_data_lsize            41844330496
        vfs.zfs.mru_ghost_metadata_lsize        23598693888
        vfs.zfs.mru_ghost_size                  65443024384
        vfs.zfs.mru_data_lsize                  67918019072
        vfs.zfs.mru_metadata_lsize              411918848
        vfs.zfs.mru_size                        71823354880
        vfs.zfs.anon_data_lsize                 0
        vfs.zfs.anon_metadata_lsize             0
        vfs.zfs.anon_size                       29893120
        vfs.zfs.l2arc_norw                      1
        vfs.zfs.l2arc_feed_again                1
        vfs.zfs.l2arc_noprefetch                1
        vfs.zfs.l2arc_feed_min_ms               200
        vfs.zfs.l2arc_feed_secs                 1
        vfs.zfs.l2arc_headroom                  2
        vfs.zfs.l2arc_write_boost               8388608
        vfs.zfs.l2arc_write_max                 8388608
        vfs.zfs.arc_meta_limit                  34359738368
        vfs.zfs.arc_meta_used                   34250008792
        vfs.zfs.arc_min                         17179869184
        vfs.zfs.arc_max                         137438953472
        vfs.zfs.dedup.prefetch                  1
        vfs.zfs.mdcomp_disable                  0
        vfs.zfs.nopwrite_enabled                1
        vfs.zfs.zfetch.array_rd_sz              1048576
        vfs.zfs.zfetch.block_cap                256
        vfs.zfs.zfetch.min_sec_reap             2
        vfs.zfs.zfetch.max_streams              8
        vfs.zfs.prefetch_disable                0
        vfs.zfs.no_scrub_prefetch               0
        vfs.zfs.no_scrub_io                     0
        vfs.zfs.resilver_min_time_ms            3000
        vfs.zfs.free_min_time_ms                1000
        vfs.zfs.scan_min_time_ms                1000
        vfs.zfs.scan_idle                       50
        vfs.zfs.scrub_delay                     4
        vfs.zfs.resilver_delay                  2
        vfs.zfs.top_maxinflight                 32
        vfs.zfs.write_to_degraded               0
        vfs.zfs.mg_noalloc_threshold            0
        vfs.zfs.condense_pct                    200
        vfs.zfs.metaslab.weight_factor_enable   0
        vfs.zfs.metaslab.preload_enabled        1
        vfs.zfs.metaslab.preload_limit          3
        vfs.zfs.metaslab.unload_delay           8
        vfs.zfs.metaslab.load_pct               50
        vfs.zfs.metaslab.min_alloc_size         10485760
        vfs.zfs.metaslab.df_free_pct            4
        vfs.zfs.metaslab.df_alloc_threshold     131072
        vfs.zfs.metaslab.debug_unload           0
        vfs.zfs.metaslab.debug_load             0
        vfs.zfs.metaslab.gang_bang              131073
        vfs.zfs.check_hostid                    1
        vfs.zfs.spa_asize_inflation             24
        vfs.zfs.deadman_enabled                 1
        vfs.zfs.deadman_checktime_ms            5000
        vfs.zfs.deadman_synctime_ms             1000000
        vfs.zfs.recover                         0
        vfs.zfs.txg.timeout                     5
        vfs.zfs.min_auto_ashift                 9
        vfs.zfs.max_auto_ashift                 13
        vfs.zfs.vdev.cache.bshift               16
        vfs.zfs.vdev.cache.size                 0
        vfs.zfs.vdev.cache.max                  16384
        vfs.zfs.vdev.trim_on_init               1
        vfs.zfs.vdev.write_gap_limit            4096
        vfs.zfs.vdev.read_gap_limit             32768
        vfs.zfs.vdev.aggregation_limit          131072
        vfs.zfs.vdev.scrub_max_active           2
        vfs.zfs.vdev.scrub_min_active           1
        vfs.zfs.vdev.async_write_max_active     10
        vfs.zfs.vdev.async_write_min_active     1
        vfs.zfs.vdev.async_read_max_active      3
        vfs.zfs.vdev.async_read_min_active      1
        vfs.zfs.vdev.sync_write_max_active      10
        vfs.zfs.vdev.sync_write_min_active      10
        vfs.zfs.vdev.sync_read_max_active       10
        vfs.zfs.vdev.sync_read_min_active       10
        vfs.zfs.vdev.max_active                 1000
        vfs.zfs.vdev.bio_delete_disable         0
        vfs.zfs.vdev.bio_flush_disable          0
        vfs.zfs.vdev.trim_max_pending           64
        vfs.zfs.vdev.trim_max_bytes             2147483648
        vfs.zfs.cache_flush_disable             0
        vfs.zfs.zil_replay_disable              0
        vfs.zfs.sync_pass_rewrite               2
        vfs.zfs.sync_pass_dont_compress         5
        vfs.zfs.sync_pass_deferred_free         2
        vfs.zfs.zio.use_uma                     0
        vfs.zfs.snapshot_list_prefetch          0
        vfs.zfs.version.ioctl                   3
        vfs.zfs.version.zpl                     5
        vfs.zfs.version.spa                     5000
        vfs.zfs.version.acl                     1
        vfs.zfs.debug                           0
        vfs.zfs.super_owner                     0
        vfs.zfs.trim.enabled                    1
        vfs.zfs.trim.max_interval               1
        vfs.zfs.trim.timeout                    30
        vfs.zfs.trim.txg_delay                  32

------------------------------------------------------------------------

--
Nathan Weeks
USDA-ARS Corn Insects and Crop Genetics Research Unit
Crop Genome Informatics Laboratory
Iowa State University
http://weeks.public.iastate.edu/

