ZFS, Vnode cache, and poor directory listing performance via Samba
Dave Baukus
daveb at spectralogic.com
Thu Mar 29 14:38:44 UTC 2018
Thank you for the explanation and suggestions, Conrad.
Unfortunately, this absurd directory is at a customer site and was generated by some ill-designed application.
Dave Baukus
On 03/28/2018 08:35 PM, Conrad Meyer wrote:
> Hi Dave,
>
> Full scans are the worst case for an LRU cache. In particular, you
> are full-scanning an *extremely* large directory, which evicts your
> entire vnode cache. Then you suffer the (presumably) entirely
> serialized penalty of refetching every single inode from disk again
> after the first scan.
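A minimal sketch of the eviction pattern described above. It assumes the vnode cache behaves like a strict LRU list (real FreeBSD vnode recycling is more involved) and scales the numbers down 1000x: 1,000 files against 600 cache slots (standing in for kern.maxvnodes=600,000) and 2,000 slots (2,000,000). `LRUCache` and `scan_twice` are hypothetical names used only for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal strict-LRU cache: a hypothetical stand-in for the vnode cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # in the real system: refetch the inode from disk
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


def scan_twice(num_files, capacity):
    """Full-scan a directory twice; return (hits, misses) of the second scan."""
    cache = LRUCache(capacity)
    for name in range(num_files):
        cache.access(name)
    cache.hits = cache.misses = 0
    for name in range(num_files):
        cache.access(name)
    return cache.hits, cache.misses


if __name__ == "__main__":
    print(scan_twice(1000, 600))   # (0, 1000): every access misses
    print(scan_twice(1000, 2000))  # (1000, 0): every access hits
```

With 600 slots the first scan leaves only the last 600 names cached, and each miss in the second scan evicts a name the scan is about to reach, so every access misses. Once the capacity covers the whole directory, every access hits. This matches the direction of the kern.maxvnodes observation quoted below, though not its exact numbers.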
>
> Here are some solutions in order of preference:
> 1. Organize your files better. 1 million files in a single directory is
> absurd. Can Windows Explorer meaningfully navigate a 1-million-file
> directory? I doubt it.
> 2. Continue to bump maxvnodes to compensate for poor file organization
> + naive clients doing full scans.
> 3. Enhance samba to signal something like DONTNEED on
> "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *" requests to the OS.
> 3.a. Enhance samba to parallelize or otherwise asynchronously process
> the above requests on huge directories (improve the uncached case).
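Option 3.a can be illustrated outside Samba. The following Python sketch is not Samba code; `list_directory_parallel`, `stat_entry`, and the worker count of 16 are hypothetical. It reads the names serially and then keeps several stat calls in flight at once, so an uncached scan is not serialized behind one disk read at a time. Whether this helps on a given ZFS pool depends on how much concurrent I/O the hardware can absorb.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def stat_entry(path):
    """Fetch attributes for one entry (the expensive step when uncached)."""
    st = os.stat(path, follow_symlinks=False)
    return os.path.basename(path), st.st_size, st.st_mtime


def list_directory_parallel(directory, workers=16):
    """Read names serially, then issue the per-entry stat calls concurrently."""
    with os.scandir(directory) as it:
        paths = [entry.path for entry in it]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input paths
        return list(pool.map(stat_entry, paths))
```

For example, `list_directory_parallel("/tank/share/bigdir")` (a placeholder path) returns `(name, size, mtime)` tuples in readdir order. Python's `os.stat` releases the interpreter lock while it waits on the kernel, so the threads can overlap their lookups.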
>
> I don't think this has much to do with ZFS, other than that ZFS
> performance on your hardware appears to be quite bad without the VFS
> cache sitting in front to absorb most of the requests.
>
> Best,
> Conrad
>
>
> On Wed, Mar 28, 2018 at 7:07 PM, Dave Baukus <daveb at spectralogic.com> wrote:
>> Below is narrative angst and woe for which I have the following observations/questions:
>>
>> - Increasing kern.maxvnodes from 600,000 to 2,000,000 apparently solves the "problem"
>> - This decreases the number of lookups in the scenario below from 40719 (some of which take over a second) to 4
>> - 2,000,000 may be extreme, but I was hoping for an authoritative comment on why/how this improves the scenario and
>> then perhaps I can come up with some reasonable tuning options.
>> - Is this an artifact of the FreeBSD 11-ish refactoring of the ZFS/FreeBSD VNOP interface (?)
>>
>> -----------------------------------------------------
>> I have the following scenario on FreeBSD Stable 11.0:
>>
>> A ZFS dataset with a directory containing 1,000,000 files; the root of this dataset is
>> exported via Samba using the NFSv4 ACL plugin and DOS attributes with the (<get|set>extattr) implementation.
>>
>> A local full listing of this directory (ls -l > /dev/null) completes in about 40 seconds.
>> A full listing from a Samba client (ls -l) completes in about 3 minutes.
>>
>> Using Windows Explorer from a Win2008 client is where the strangeness begins; it
>> takes between 8 and 12 minutes before control is returned to win-explorer.
>>
>> Tracing this with wireshark I noticed that "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *"
>> requests from the Win2008 client start off functioning well (client requests
>> 64k of data and samba responds with 64k of directory data). After about 150 seconds of this
>> interaction the client makes a "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *" request that is not
>> responded to for over 60 seconds. The windows client closes the connection, starts a new
>> connection, and begins directory listing from ground zero. This pattern continues for
>> 6 to 10 minutes; I never see a final request/response where the server indicates that the
>> listing is complete; I believe win-explorer just gives up.
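To put the 64k requests in perspective, here is a rough count of round trips for a 1,000,000-entry listing. It assumes a 104-byte fixed part for each FileIdBothDirectoryInformation entry (per my reading of MS-FSCC), UTF-16 file names, 8-byte alignment between entries, 64 KiB per response, and a hypothetical average name length of 20 characters; the real names at the customer site are unknown.

```python
FIXED_ENTRY_BYTES = 104  # FileIdBothDirectoryInformation fields before FileName


def entry_size(name_chars):
    """Bytes one directory entry occupies in a response (assumed 8-byte aligned)."""
    raw = FIXED_ENTRY_BYTES + 2 * name_chars  # FileName is UTF-16LE
    return (raw + 7) // 8 * 8


def responses_needed(num_files, name_chars, response_bytes=64 * 1024):
    """Round trips needed to list num_files entries with equal-length names."""
    per_response = response_bytes // entry_size(name_chars)
    return -(-num_files // per_response)  # ceiling division
```

Under those assumptions each entry takes 144 bytes, 455 entries fit in one response, and `responses_needed(1_000_000, 20)` comes to 2,198 round trips, all of which start over from ground zero each time the client reconnects.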
>>
>> Meanwhile, back on FreeBSD/ZFS I'm running a dtrace script that times the following
>> ZFS VNOPs for the connected Samba server instance:
>>
>> - fbt:zfs:zfs_*extattr:entry and return (get|set|delete|list)extattr
>> - fbt:zfs:zfs_freebsd_lookup:entry and return
>> - fbt:zfs:zfs_freebsd_readdir:entry and return
>> - fbt:zfs:zfs_freebsd_getattr:entry and return
>>
>> This starts off looking like:
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 19931
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3975
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2662
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1711
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1768
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1411
>> 12 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 44325
>> 12 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 38054
>> 12 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 36137
>> ...
>> ... line 11,800
>> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2709
>> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2046
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2238
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1452
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1570
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1608
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1571
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1431
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1431
>> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2856
>> 16 27763 zfs_freebsd_getacl:return zfs_freebsd_getacl :: 1907
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 3537
>> 16 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 45135
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2744
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 3221
>> 16 27811 zfs_listextattr:return zfs_listextattr :: 3762
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2090
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2214
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 20112
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 14989
>> 16 27787 zfs_freebsd_readdir:return zfs_freebsd_readdir :: 35946
>> 16 27811 zfs_listextattr:return zfs_listextattr :: 46900
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2115
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1439
>> 16 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 22886
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1449
>> 16 27809 zfs_getextattr:return zfs_getextattr :: 4046
>> 16 27811 zfs_listextattr:return zfs_listextattr :: 2239
>> 16 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 15128
>> 16 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1640
>> ...
>> ... line 175,000
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85760734
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3617
>> 12 27809 zfs_getextattr:return zfs_getextattr :: 14064
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 4088
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85586541
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2983
>> 12 27809 zfs_getextattr:return zfs_getextattr :: 11416
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 3230
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85758027
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3124
>> ...
>> ... line 176,000
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1113397903
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 6423
>> 1 27811 zfs_listextattr:return zfs_listextattr :: 3090
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1108181740
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3267
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 5486
>> 1 27811 zfs_listextattr:return zfs_listextattr :: 3111
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1092061756
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3113
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 5691
>> 1 27811 zfs_listextattr:return zfs_listextattr :: 3073
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1102236755
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3435
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 5862
>> 1 27811 zfs_listextattr:return zfs_listextattr :: 3771
>> 1 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1101668231
>> 1 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189
>> 1 27809 zfs_getextattr:return zfs_getextattr :: 6671
>> 15 27811 zfs_listextattr:return zfs_listextattr :: 12951
>> 15 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1061648117
>> 15 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5365
>> 15 27809 zfs_getextattr:return zfs_getextattr :: 5731
>> 21 27811 zfs_listextattr:return zfs_listextattr :: 8178
>> 21 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64429430
>> 21 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2912
>> 21 27809 zfs_getextattr:return zfs_getextattr :: 5566
>> 21 27811 zfs_listextattr:return zfs_listextattr :: 2454
>> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1017176234
>> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2976
>> 19 27809 zfs_getextattr:return zfs_getextattr :: 6230
>> 19 27811 zfs_listextattr:return zfs_listextattr :: 2710
>> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64211015
>> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1876
>> 19 27809 zfs_getextattr:return zfs_getextattr :: 3690
>> 19 27811 zfs_listextattr:return zfs_listextattr :: 2292
>> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 17007
>> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1766
>> 19 27809 zfs_getextattr:return zfs_getextattr :: 3357
>> 19 27811 zfs_listextattr:return zfs_listextattr :: 2331
>> 19 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 63817436
>> 19 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1827
>> 19 27809 zfs_getextattr:return zfs_getextattr :: 12231
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 8658
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64859702
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3296
>> 12 27809 zfs_getextattr:return zfs_getextattr :: 6118
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 2454
>> 12 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 17442
>> 12 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1676
>> 12 27809 zfs_getextattr:return zfs_getextattr :: 3649
>> 12 27811 zfs_listextattr:return zfs_listextattr :: 2363
>> 0 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1013471141
>> 0 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5995
>> 0 27809 zfs_getextattr:return zfs_getextattr :: 9280
>> 0 27811 zfs_listextattr:return zfs_listextattr :: 3219
>> 0 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64286196
>> 0 27765 zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5618
>> 0 27809 zfs_getextattr:return zfs_getextattr :: 8919
>> 0 27811 zfs_listextattr:return zfs_listextattr :: 3117
>> 13 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 999431953
>> 13 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1062322808
>> 9 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1061885578
>> 9 27777 zfs_freebsd_lookup:return zfs_freebsd_lookup :: 11283
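The trace above can be summarized mechanically. This sketch assumes each output line has the shape `CPU ID FUNCTION:NAME func :: elapsed`, as in the excerpts, and that the last column is elapsed time in nanoseconds (the D script itself is not shown, so the unit is an assumption); `summarize` and the 1-second threshold are hypothetical. The `>>` quote markers must be stripped first if the lines are copied from this message.

```python
from collections import defaultdict


def summarize(lines, slow_ns=1_000_000_000):
    """Per-function call count, max latency, and count of calls at or above slow_ns."""
    stats = defaultdict(lambda: {"count": 0, "max_ns": 0, "slow": 0})
    for line in lines:
        fields = line.split()
        # Expected shape: CPU ID FUNCTION:NAME func :: elapsed
        if len(fields) != 6 or fields[4] != "::" or not fields[5].isdigit():
            continue  # skip "..." markers, blank lines, and anything else
        func, elapsed = fields[3], int(fields[5])
        entry = stats[func]
        entry["count"] += 1
        entry["max_ns"] = max(entry["max_ns"], elapsed)
        if elapsed >= slow_ns:
            entry["slow"] += 1
    return dict(stats)
```

Applied to the last excerpt, only `zfs_freebsd_lookup` calls would be flagged as slow, which is where the roughly 1-second latencies appear if the nanosecond assumption holds.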
>>
>> At this point the client closes the connection and the connected Samba server process exits.
>>
>> After increasing kern.maxvnodes to 2M, the wire transfer of the directory listing completes
>> in about 60 seconds with the final "no more files" response status observed,
>> and win-explorer cogitates on the data for about another 2 minutes
>> before control is returned to win-explorer.
>>
>> Thanks for any feedback.
>>
>> --
>> Dave Baukus
>> _______________________________________________
>> freebsd-fs at freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>