ZFS and large directories - caveat report

Ivan Voras ivoras at freebsd.org
Thu Jul 21 19:29:49 UTC 2011


On 21 July 2011 20:15, Artem Belevich <art at freebsd.org> wrote:
> On Thu, Jul 21, 2011 at 9:38 AM, Ivan Voras <ivoras at freebsd.org> wrote:
>> On 21 July 2011 17:50, Freddie Cash <fjwcash at gmail.com> wrote:
>>> On Thu, Jul 21, 2011 at 8:45 AM, Ivan Voras <ivoras at freebsd.org> wrote:
>>>>
>>>> Is there an equivalent of UFS dirhash memory setting for ZFS? (i.e. the
>>>> size of the metadata cache)
>>>
>>> vfs.zfs.arc_meta_limit
>>>
>>> This sets the amount of ARC that can be used for metadata.  The default is
>>> 1/8th of ARC, I believe.  This setting lets you use "primarycache=all"
>>> (store metadata and file data in ARC) but then tune how much is used for
>>> each.
>>>
>>> Not sure if that will help in your case or not, but it's a sysctl you can
>>> play with.
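
(For reference, in case anyone wants to experiment with raising that limit:
it should be a loader tunable, so something like the following in
/boot/loader.conf ought to do it - the 1 GB value below is only an example,
not a recommendation:

# allow up to ~1 GB of ARC to be used for metadata (example value)
vfs.zfs.arc_meta_limit="1073741824"

...followed by a reboot.)
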
>>
>> I don't think that it works, or at least is not as efficient as dirhash:
>>
>> www:~> sysctl -a | grep meta
>> kern.metadelay: 28
>> vfs.zfs.mfu_ghost_metadata_lsize: 129082368
>> vfs.zfs.mfu_metadata_lsize: 116224
>> vfs.zfs.mru_ghost_metadata_lsize: 113958912
>> vfs.zfs.mru_metadata_lsize: 16384
>> vfs.zfs.anon_metadata_lsize: 0
>> vfs.zfs.arc_meta_limit: 322412800
>> vfs.zfs.arc_meta_used: 506907792
>> kstat.zfs.misc.arcstats.demand_metadata_hits: 4471705
>> kstat.zfs.misc.arcstats.demand_metadata_misses: 2110328
>> kstat.zfs.misc.arcstats.prefetch_metadata_hits: 27
>> kstat.zfs.misc.arcstats.prefetch_metadata_misses: 51
>>
>> arc_meta_used is nearly 500 MB, which should be enough even in this
>> case. With filenames of 32 characters, all the filenames alone for
>> 130,000 files in a directory take about 4 MB - I doubt ZFS
>> introduces so much extra metadata that it doesn't fit in 500 MB.
>
> For what it's worth, 500K files in one directory seems to work
> reasonably well on my box running a few-weeks-old 8-stable (quad-core,
> 8GB RAM, ~6GB ARC), ZFSv28 pool on a 2-drive mirror + 50GB L2ARC.
>
> $ time perl -e 'use Fcntl; for $f  (1..500000)
> {sysopen(FH,"f$f",O_CREAT); close(FH);}'
> perl -e  >| /dev/null  2.26s user 39.17s system 96% cpu 43.156 total
>
> $ time find . |wc -l
>  500001
> find .  0.16s user 0.33s system 99% cpu 0.494 total
>
> $ time find . -ls |wc -l
>  500001
> find . -ls  1.93s user 12.13s system 96% cpu 14.643 total
>
> time find . |xargs -n 100 rm
> find .  0.22s user 0.28s system 0% cpu 2:45.12 total
> xargs -n 100 rm  1.25s user 58.51s system 36% cpu 2:45.61 total
>
> Deleting files resulted in a constant stream of writes to hard drives.
> I guess file deletion may end up being a synchronous write
> committed to ZIL right away. If that's indeed the case, small slog on
> SSD could probably speed up file deletion a bit.

That's a very interesting find.
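
If the deletes really are hitting the ZIL, that should be easy to check
with a small dedicated log device and a re-run of the delete test -
roughly something like this, where the pool and device names are only
placeholders:

# zpool add tank log gpt/slog0

(On a v28 pool, "zpool remove" can take the log vdev out again
afterwards.)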

Or maybe the issue is fragmentation: could you modify the script
slightly to create the files round-robin across about 50 directories
(i.e. create in dir1, then dir2, then dir3, ... up to dir50, then
start over with dir1, dir2, ...)?
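
Something along these lines, untested and with placeholder directory
names, based on your one-liner:

use Fcntl;

my $ndirs = 50;
mkdir "dir$_" for 1..$ndirs;
for my $f (1..500000) {
    # spread the files round-robin over the directories
    my $d = ($f % $ndirs) + 1;
    sysopen(my $fh, "dir$d/f$f", O_CREAT) or die "dir$d/f$f: $!";
    close($fh);
}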

Could you also, for the sake of curiosity, upgrade this system to the
latest 8-stable and retry it?

