Improving ZFS performance for large directories

Kevin Day toasty at dragondata.com
Wed Feb 20 16:07:16 UTC 2013


On Feb 20, 2013, at 2:28 AM, Peter Jeremy <peter at rulingia.com> wrote:
>> Thinking I'd make the primary cache metadata only, and the secondary
>> cache "all" would improve things,
> 
> This won't work as expected.  L2ARC only caches data coming out of ARC
> so by setting ARC to cache metadata only, there's never any "data" in
> ARC and hence never any evicted from ARC to L2ARC.
> 

That makes sense. I wasn't sure whether it was smart enough to notice this was happening, but I guess it won't work.


>> I wiped the device (SATA secure erase to make sure)
> 
> That's not necessary.  L2ARC doesn't survive reboots because all the
> L2ARC "metadata" is in ARC only.  This does mean that it takes quite
> a while for L2ARC to warm up following a reboot.
> 

I was more concerned with the SSD's performance than with ZFS caring what was there. In a few cases we completely filled the SSD, which can slow it down (the controller has no free blocks to work with). Secure Erase resets it so the drive's controller knows everything is really free. We have one model of SSD here that drops to about 5% of its original performance after every block on the drive has been written to once. We're not using that model anymore, but I still like to be sure. :)
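For anyone wanting to do the same, recent FreeBSD can issue the ATA Secure Erase through camcontrol. A sketch (the device name "ada1" is a placeholder, and this destroys all data on the drive):

```shell
# DESTRUCTIVE: erases the entire device. "ada1" is a placeholder device name.
camcontrol security ada1 -U user -s TempPass     # set a temporary user password
camcontrol security ada1 -U user -e TempPass -y  # issue ATA SECURITY ERASE UNIT
```

The erase clears the temporary password as a side effect, so the drive is left unlocked afterwards.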

>> There are roughly 29M files, growing at about 50k files/day. We
>> recently upgraded, and are now at 96 3TB drives in the pool. 
> 
> That number of files isn't really excessive but it sounds like your
> workload has very low locality.  At this stage, my suggestions are:
> 1) Disable atime if you don't need it & haven't already.
>   Otherwise file accesses are triggering metadata updates.
> 2) Increase vfs.zfs.arc_meta_limit
>   You're still getting more metadata misses than data misses
> 3) Increase your ARC size (more RAM)
>   Your pool is quite large compared to your RAM.
> 

Yeah, I think the locality is basically zero. It's multiple rsyncs running across the entire filesystem repeatedly. Each directory is only touched once per pass, so caching isn't really going to help unless we get lucky and two rsyncs run back-to-back, one chasing the other.

Atime is already off globally - nothing we use needs it. We're already at this motherboard's RAM limit, so any further increase would be quite expensive.
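For the record, suggestions (1) and (2) above translate to something like this on FreeBSD (the pool name "tank" and the 16 GiB value are just examples, not our actual settings):

```shell
# (1) Disable atime pool-wide; "tank" is a placeholder pool name.
zfs set atime=off tank

# (2) Allow more metadata in ARC, e.g. 16 GiB (example value, in bytes).
# Depending on the FreeBSD version this may be runtime-writable:
sysctl vfs.zfs.arc_meta_limit=17179869184
# ...or may need to be set as a boot-time tunable in /boot/loader.conf:
#   vfs.zfs.arc_meta_limit="17179869184"
```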

> 
>> Is there any way to tell why more metadata isn't
>> being pushed to the L2ARC?
> 
> ZFS treats writing to L2ARC very much as an afterthought.  L2ARC writes
> are rate limited by vfs.zfs.l2arc_write_{boost,max} and will be aborted
> if they might interfere with a read.  I'm not sure how to improve it.
> 

Right now there are zero L2ARC writes happening, so perhaps with this much pressure on ARC metadata nothing ever gets a chance to be pushed into the L2ARC. I'm going to try increasing the ARC meta limit, but beyond that there's not a great deal more I can do.
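As a sanity check on how much that rate limit matters, here's a rough estimate using the historical default of vfs.zfs.l2arc_write_max = 8 MB/s (the 100 GB cache-device size is hypothetical):

```shell
# Best case, L2ARC fills at l2arc_write_max once the boost period ends.
write_max_mb=8        # historical FreeBSD default, in MB/s
l2arc_size_gb=100     # hypothetical cache-device size
warmup_hours=$(( l2arc_size_gb * 1024 / write_max_mb / 3600 ))
echo "$warmup_hours"  # -> 3 (hours), and only if every write window is used
# Raising the limit (value in bytes) is one knob to experiment with:
#   sysctl vfs.zfs.l2arc_write_max=67108864
```

In practice it takes much longer than the best case, since writes are skipped whenever they might interfere with reads.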

> Since this is all generic ZFS, you might like to try asking on
> zfs at lists.illumos.org as well.  Some of the experts there might have
> some ideas.

I will try that, thanks!

-- Kevin
