ZFS Optimizing for large directories and MANY files
mike tancsa
mike at sentex.net
Tue Jun 25 18:19:07 UTC 2019
I have been trying once again to understand the various ZFS sysctl settings and how they relate to optimizing a file server that has very few big files but MANY small ones (RELENG_12). Some directories get upwards of 30,000+ files, and the odd time, when some outside user process breaks, 100,000+ files. Obviously, throwing a LOT of RAM at the problem helps. But are there any more tunings I can do? So far,
I have adjusted
vfs.zfs.arc_meta_strategy=1
vfs.zfs.arc_meta_limit to 65% of ARC memory, up from the default 25%

On the dataset in question, I have set primarycache=metadata.
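For reference, a sketch of how those knobs might be applied on a RELENG_12 box. The dataset name "tank/data" and the byte value for the limit are placeholders; the limit shown is roughly 65% of a 30 GiB ARC, and both sysctls should be persisted in /etc/sysctl.conf if they help:

```shell
# Metadata eviction strategy, as adjusted in the text
# (see the sysctl description on your release for the exact semantics):
sysctl vfs.zfs.arc_meta_strategy=1
# arc_meta_limit takes bytes; ~65% of a 30 GiB ARC (placeholder value):
sysctl vfs.zfs.arc_meta_limit=20937965568

# Cache only metadata in the ARC for the dataset in question
# ("tank/data" is a placeholder dataset name):
zfs set primarycache=metadata tank/data
```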
Is there anything else I can do to bias toward a file system with MANY files? Unfortunately, I can't easily stop the end users from dumping many files into their single directories. I think the hit happens when they log in, do a dir, see what files they need to download, download them, and log out. As long as that is cached, it's not so bad.
Doing some simple tests on an imported copy of the dataset (on slower spinning-rust drives), something as simple as

# time find . -type f -mtime -2d

takes 40 minutes after a cold boot.
Watching ZFS disk I/O, it is super slow in terms of bandwidth, but gstat shows the disks close to being pegged. I guess the heads are thrashing about inefficiently?
1{ryzenbsd12}# zpool iostat tmpdisk 1
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
tmpdisk 301G 1.46T 335 1 899K 33.0K
tmpdisk 301G 1.46T 402 0 1.02M 0
tmpdisk 301G 1.46T 265 0 559K 0
tmpdisk 301G 1.46T 331 0 715K 0
tmpdisk 301G 1.46T 276 0 650K 0
tmpdisk 301G 1.46T 293 0 718K 0
tmpdisk 301G 1.46T 432 0 1.11M 0
tmpdisk 301G 1.46T 435 0 1.03M 0
tmpdisk 301G 1.46T 412 0 1.01M 0
tmpdisk 301G 1.46T 315 0 717K 0
tmpdisk 301G 1.46T 417 0 1.04M 0
tmpdisk 301G 1.46T 457 0 1.13M 0
tmpdisk 301G 1.46T 448 0 1.05M 0
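The per-disk saturation described above can be watched alongside zpool iostat with a command sketch like the following; near-100 %busy combined with low KiB/s throughput is what a seek-bound (thrashing) disk looks like, as opposed to a bandwidth-bound one:

```shell
# Sample GEOM statistics once per second;
# -p restricts the output to physical providers (the disks themselves).
gstat -p -I 1s
```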
top shows the ARC steadily growing:
/|ARC: 5119M Total, 2128M MFU, 2361M MRU, 1608K Anon, 73M Header, 560M Other
606M Compressed, 3902M Uncompressed, 6.43:1 Ratio

The stats show:
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 238
Recycle Misses: 0
Mutex Misses: 0
Evict Skips: 1.04k
ARC Size: 17.28% 5.20 GiB
Target Size: (Adaptive) 100.00% 30.07 GiB
Min Size (Hard Limit): 12.50% 3.76 GiB
Max Size (High Water): 8:1 30.07 GiB
ARC Size Breakdown:
Recently Used Cache Size: 50.00% 15.03 GiB
Frequently Used Cache Size: 50.00% 15.03 GiB
ARC Hash Breakdown:
Elements Max: 247.31k
Elements Current: 100.00% 247.31k
Collisions: 7.20k
Chain Max: 3
Chains: 6.99k
------------------------------------------------------------------------
ARC Efficiency: 2.53m
Cache Hit Ratio: 88.31% 2.23m
Cache Miss Ratio: 11.69% 295.48k
Actual Hit Ratio: 88.24% 2.23m
Data Demand Efficiency: 87.76% 20.01k
CACHE HITS BY CACHE LIST:
Anonymously Used: 0.08% 1.69k
Most Recently Used: 21.64% 483.05k
Most Frequently Used: 78.28% 1.75m
Most Recently Used Ghost: 0.00% 0
Most Frequently Used Ghost: 0.00% 0
CACHE HITS BY DATA TYPE:
Demand Data: 0.79% 17.56k
Prefetch Data: 0.00% 0
Demand Metadata: 99.14% 2.21m
Prefetch Metadata: 0.08% 1.69k
CACHE MISSES BY DATA TYPE:
Demand Data: 0.83% 2.45k
Prefetch Data: 0.00% 0
Demand Metadata: 18.79% 55.52k
Prefetch Metadata: 80.38% 237.51k
Once a single trip through the file system via find is done, top shows:

ARC: 10G Total, 7161M MFU, 467M MRU, 1600K Anon, 191M Header, 2842M Other
     1647M Compressed, 11G Uncompressed, 7.12:1 Ratio
On the second iteration, find only takes:

0{ryzenbsd12}# time find . -type f -mtime -2d
./list.txt
./l
1.992u 69.557s 1:11.54 100.0% 35+177k 169144+0io 0pf+0w
0{ryzenbsd12}#
and the stats look appropriately better too:
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 238
Recycle Misses: 0
Mutex Misses: 0
Evict Skips: 1.04k
ARC Size: 34.11% 10.26 GiB
Target Size: (Adaptive) 100.00% 30.07 GiB
Min Size (Hard Limit): 12.50% 3.76 GiB
Max Size (High Water): 8:1 30.07 GiB
ARC Size Breakdown:
Recently Used Cache Size: 50.00% 15.03 GiB
Frequently Used Cache Size: 50.00% 15.03 GiB
ARC Hash Breakdown:
Elements Max: 688.43k
Elements Current: 100.00% 688.43k
Collisions: 53.65k
Chain Max: 4
Chains: 50.50k
------------------------------------------------------------------------
ARC Efficiency: 56.03m
Cache Hit Ratio: 98.07% 54.94m
Cache Miss Ratio: 1.93% 1.08m
Actual Hit Ratio: 97.64% 54.71m
Data Demand Efficiency: 86.21% 21.97k
CACHE HITS BY CACHE LIST:
Anonymously Used: 0.43% 237.54k
Most Recently Used: 12.19% 6.70m
Most Frequently Used: 87.37% 48.01m
Most Recently Used Ghost: 0.00% 0
Most Frequently Used Ghost: 0.00% 0
CACHE HITS BY DATA TYPE:
Demand Data: 0.03% 18.94k
Prefetch Data: 0.00% 0
Demand Metadata: 95.72% 52.59m
Prefetch Metadata: 4.24% 2.33m
CACHE MISSES BY DATA TYPE:
Demand Data: 0.28% 3.03k
Prefetch Data: 0.00% 0
Demand Metadata: 50.84% 550.75k
Prefetch Metadata: 48.88% 529.54k
------------------------------------------------------------------------
Anything else to adjust? I was going to use RAID 1+0 on SSDs for the dataset. Should I bother with an NVMe drive for L2ARC caching? On my test box I can roughly approximate how much RAM I need for metadata (11G, it seems); is there a better programmatic way to find that value?
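On the programmatic question, the ARC exports its current metadata footprint directly via kstats, which may be a cleaner measure than eyeballing top; and if an NVMe L2ARC is added, it can be restricted to metadata for this workload. A command sketch, where "tank", "tank/data", and "nvd0" are placeholder names and the exact kstat names should be checked on your release:

```shell
# Current ARC metadata usage and limit, in bytes:
sysctl kstat.zfs.misc.arcstats.arc_meta_used
sysctl kstat.zfs.misc.arcstats.arc_meta_limit

# If adding an NVMe cache device, bias the L2ARC toward metadata too:
zpool add tank cache nvd0
zfs set secondarycache=metadata tank/data
```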
        ---Mike
More information about the freebsd-questions
mailing list