ZFS, Vnode cache, and poor directory listing performance via Samba

Conrad Meyer cem at freebsd.org
Thu Mar 29 03:25:25 UTC 2018


Hi Dave,

Full scans are the worst case for an LRU cache.  In particular, you
are full-scanning an *extremely* large directory, which evicts your
entire vnode cache.  Then you suffer the (presumably) entirely
serialized penalty of refetching every single inode from disk again
after the first scan.
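
To make that concrete, here is a toy simulation, not the kernel's actual
vnode recycling logic. It treats the cache as a strict LRU (a
simplification) and scales the numbers down by 1000: 1,000 files scanned
twice with room for 600 versus 2,000 entries. The names LRUCache and
scan_twice are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal strict-LRU cache that only counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
        else:
            # Miss: insert, then evict the least recently used entry
            # if the cache is over capacity.
            self.misses += 1
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)


def scan_twice(capacity, nfiles):
    """Scan nfiles entries in the same order twice; return (hits, misses)."""
    cache = LRUCache(capacity)
    for _ in range(2):
        for inode in range(nfiles):
            cache.access(inode)
    return cache.hits, cache.misses


if __name__ == "__main__":
    # Cache smaller than the directory: the second scan never hits.
    print(scan_twice(600, 1000))
    # Cache larger than the directory: the second scan always hits.
    print(scan_twice(2000, 1000))
```

With a cache smaller than the directory, each entry is evicted before the
scan comes back around to it, so the second pass misses on every file.
Once the cache is larger than the directory, the second pass hits on
every file.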

Here are some solutions in order of preference:
1. Organize your files better.  1 million in a single directory is
absurd.  Can Windows Explorer meaningfully navigate a 1-million-file
directory?  I doubt it.
2. Continue to bump kern.maxvnodes to compensate for poor file
organization + naive clients doing full scans.
3. Enhance Samba to signal something like DONTNEED to the OS on
"SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *" requests.
3.a. Enhance Samba to parallelize or otherwise asynchronously process
the above requests on huge directories (improve the uncached case).

I don't think this has much to do with ZFS, other than that ZFS
performance on your hardware appears to be quite bad without the VFS
cache sitting in front to absorb most of the requests.
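
If you want to see how many lookups fall out of the cache, a rough sketch
like the one below could bucket the lookup times from your dtrace output.
It assumes the line layout shown in the quoted trace and assumes the last
field is in nanoseconds (the trace does not state units, but that is
consistent with "over a second"). The function name bucket_latencies and
the bucket thresholds are arbitrary choices for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   12  27777   zfs_freebsd_lookup:return zfs_freebsd_lookup :: 19931
# An optional leading ">" (mail quoting) is tolerated.
LINE = re.compile(
    r"^\s*>?\s*(\d+)\s+(\d+)\s+(\w+):return\s+(\w+)\s+::\s+(\d+)\s*$"
)


def bucket_latencies(lines, func="zfs_freebsd_lookup"):
    """Count calls to func per latency bucket (time assumed to be in ns)."""
    buckets = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m or m.group(4) != func:
            continue  # skip other VNOPs and non-trace lines
        ns = int(m.group(5))
        if ns < 1_000_000:
            buckets["<1ms"] += 1
        elif ns < 1_000_000_000:
            buckets["1ms-1s"] += 1
        else:
            buckets[">=1s"] += 1
    return dict(buckets)


# Four lines copied from the quoted trace.
sample = [
    ">   12  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 19931",
    ">   12  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85760734",
    ">    1  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1113397903",
    ">    1  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189",
]

if __name__ == "__main__":
    print(bucket_latencies(sample))
```

Run over the whole trace, the fast bucket would roughly correspond to
lookups served from cache and the slow buckets to lookups that went to
disk, under the assumptions above.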

Best,
Conrad


On Wed, Mar 28, 2018 at 7:07 PM, Dave Baukus <daveb at spectralogic.com> wrote:
> Below is narrative angst and woe, for which I have the following observations/questions:
>
> - Increasing kern.maxvnodes from 600,000 to 2,000,000 apparently solves the "problem"
> - This decreases the number of lookups in the scenario below from 40719 (some of which take over a second) to 4
> - 2,000,000 may be extreme, but I was hoping for an authoritative comment on why/how this improves the scenario and
>    then perhaps I can come up with some reasonable tuning options.
> - Is this an artifact of the FreeBSD 11-ish refactoring of the ZFS/FreeBSD VNOP interface (?)
>
> -----------------------------------------------------
> I have the following scenario on FreeBSD Stable 11.0:
>
> A ZFS file system with a directory containing 1,000,000 files; the root of this file system is
> exported via Samba using the NFSv4 ACL plugin and DOS attributes with the (<get|set>extattr) implementation.
>
> A local full listing of this directory (ls -l > /dev/null) completes in about 40 seconds.
> A full listing from a Samba client (ls -l) completes in about 3 minutes.
>
> Using Windows Explorer from a Win2008 client is where the strangeness begins; it
> takes between 8 and 12 minutes before control is returned to win-explorer.
>
> Tracing this with wireshark I noticed that "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *"
> requests from the Win2008 client start off functioning well (client requests
> 64k of data and samba responds with 64k of directory data). After about 150 seconds of this
> interaction the client makes a "SMB2_FIND_ID_BOTH_DIRECTORY_INFO Pattern: *" request that is not
> responded to for over 60 seconds. The windows client closes the connection, starts a new
> connection, and begins directory listing from ground zero. This pattern continues for
> 6 to 10 minutes; I never see a final request/response where the server indicates that the
> listing is complete; I believe win-explorer just gives up.
>
> Meanwhile, back on FreeBSD/ZFS I'm running a dtrace script that times the following
> ZFS VNOPs for the connected Samba server instance:
>
> - fbt:zfs:zfs_*extattr:entry and return (get|set|delete|list)extattr
> - fbt:zfs:zfs_freebsd_lookup:entry and return
> - fbt:zfs:zfs_freebsd_readdir:entry and return
> - fbt:zfs:zfs_freebsd_getattr:entry and return
>
> This starts off looking like:
>   12  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 19931
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3975
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2662
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1711
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1768
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1411
>   12  27787       zfs_freebsd_readdir:return zfs_freebsd_readdir :: 44325
>   12  27787       zfs_freebsd_readdir:return zfs_freebsd_readdir :: 38054
>   12  27787       zfs_freebsd_readdir:return zfs_freebsd_readdir :: 36137
> ...
> ... line 11,800
>   16  27763        zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2709
>   16  27763        zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2046
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2238
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1452
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1570
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1608
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1571
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1431
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1431
>   16  27763        zfs_freebsd_getacl:return zfs_freebsd_getacl :: 2856
>   16  27763        zfs_freebsd_getacl:return zfs_freebsd_getacl :: 1907
>   16  27809            zfs_getextattr:return zfs_getextattr :: 3537
>   16  27787       zfs_freebsd_readdir:return zfs_freebsd_readdir :: 45135
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2744
>   16  27809            zfs_getextattr:return zfs_getextattr :: 3221
>   16  27811           zfs_listextattr:return zfs_listextattr :: 3762
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2090
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2214
>   16  27809            zfs_getextattr:return zfs_getextattr :: 20112
>   16  27809            zfs_getextattr:return zfs_getextattr :: 14989
>   16  27787       zfs_freebsd_readdir:return zfs_freebsd_readdir :: 35946
>   16  27811           zfs_listextattr:return zfs_listextattr :: 46900
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2115
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1439
>   16  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 22886
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1449
>   16  27809            zfs_getextattr:return zfs_getextattr :: 4046
>   16  27811           zfs_listextattr:return zfs_listextattr :: 2239
>   16  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 15128
>   16  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1640
> ...
> ... line 175,000
>   12  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85760734
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3617
>   12  27809            zfs_getextattr:return zfs_getextattr :: 14064
>   12  27811           zfs_listextattr:return zfs_listextattr :: 4088
>   12  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85586541
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2983
>   12  27809            zfs_getextattr:return zfs_getextattr :: 11416
>   12  27811           zfs_listextattr:return zfs_listextattr :: 3230
>   12  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 85758027
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3124
> ...
> ... line 176,000
>    1  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1113397903
>    1  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189
>    1  27809            zfs_getextattr:return zfs_getextattr :: 6423
>    1  27811           zfs_listextattr:return zfs_listextattr :: 3090
>    1  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1108181740
>    1  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3267
>    1  27809            zfs_getextattr:return zfs_getextattr :: 5486
>    1  27811           zfs_listextattr:return zfs_listextattr :: 3111
>    1  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1092061756
>    1  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3113
>    1  27809            zfs_getextattr:return zfs_getextattr :: 5691
>    1  27811           zfs_listextattr:return zfs_listextattr :: 3073
>    1  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1102236755
>    1  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3435
>    1  27809            zfs_getextattr:return zfs_getextattr :: 5862
>    1  27811           zfs_listextattr:return zfs_listextattr :: 3771
>    1  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1101668231
>    1  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3189
>    1  27809            zfs_getextattr:return zfs_getextattr :: 6671
>   15  27811           zfs_listextattr:return zfs_listextattr :: 12951
>   15  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1061648117
>   15  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5365
>   15  27809            zfs_getextattr:return zfs_getextattr :: 5731
>   21  27811           zfs_listextattr:return zfs_listextattr :: 8178
>   21  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64429430
>   21  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2912
>   21  27809            zfs_getextattr:return zfs_getextattr :: 5566
>   21  27811           zfs_listextattr:return zfs_listextattr :: 2454
>   19  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1017176234
>   19  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 2976
>   19  27809            zfs_getextattr:return zfs_getextattr :: 6230
>   19  27811           zfs_listextattr:return zfs_listextattr :: 2710
>   19  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64211015
>   19  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1876
>   19  27809            zfs_getextattr:return zfs_getextattr :: 3690
>   19  27811           zfs_listextattr:return zfs_listextattr :: 2292
>   19  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 17007
>   19  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1766
>   19  27809            zfs_getextattr:return zfs_getextattr :: 3357
>   19  27811           zfs_listextattr:return zfs_listextattr :: 2331
>   19  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 63817436
>   19  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1827
>   19  27809            zfs_getextattr:return zfs_getextattr :: 12231
>   12  27811           zfs_listextattr:return zfs_listextattr :: 8658
>   12  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64859702
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 3296
>   12  27809            zfs_getextattr:return zfs_getextattr :: 6118
>   12  27811           zfs_listextattr:return zfs_listextattr :: 2454
>   12  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 17442
>   12  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 1676
>   12  27809            zfs_getextattr:return zfs_getextattr :: 3649
>   12  27811           zfs_listextattr:return zfs_listextattr :: 2363
>    0  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1013471141
>    0  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5995
>    0  27809            zfs_getextattr:return zfs_getextattr :: 9280
>    0  27811           zfs_listextattr:return zfs_listextattr :: 3219
>    0  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 64286196
>    0  27765       zfs_freebsd_getattr:return zfs_freebsd_getattr :: 5618
>    0  27809            zfs_getextattr:return zfs_getextattr :: 8919
>    0  27811           zfs_listextattr:return zfs_listextattr :: 3117
>   13  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 999431953
>   13  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1062322808
>    9  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 1061885578
>    9  27777        zfs_freebsd_lookup:return zfs_freebsd_lookup :: 11283
>
> At this point the client closes the connection and the connected Samba server process exits.
>
> After increasing kern.maxvnodes to 2M, the wire transfer of the directory listing completes
> in about 60 seconds with the final "no more files" response status observed,
> and win-explorer cogitates on the data for about another 2 minutes
> before control is returned to win-explorer.
>
> Thanks for any feedback.
>
> --
> Dave Baukus
