Re: nullfs and ZFS issues
- Reply: Mateusz Guzik : "Re: nullfs and ZFS issues"
- In reply to: Mateusz Guzik : "Re: nullfs and ZFS issues"
Date: Tue, 19 Apr 2022 09:15:42 UTC
On 4/19/22, Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 4/19/22, Doug Ambrisko <ambrisko@ambrisko.com> wrote:
>> I've switched my laptop to use nullfs and ZFS. Previously, I used
>> localhost NFS mounts instead of nullfs when nullfs would complain
>> that it couldn't mount. Since that check has been removed, I've
>> switched to nullfs only. However, every so often my laptop would
>> get slow and the ARC evict and prune threads would consume two
>> cores at 100% until I rebooted. I had a 1G max ARC and have increased
>> it to 2G now. Looking into this has uncovered some issues:
>> - nullfs would prevent vnlru_free_vfsops from doing anything
>> when called from ZFS arc_prune_task
>> - nullfs would hang onto a bunch of vnodes unless mounted with
>> nocache
>> - nullfs and nocache would break untar. This has been fixed now.
>>
>> With nullfs, nocache, and setting max vnodes to a low number, I can
>> keep the ARC around the max without evict and prune consuming
>> 100% of 2 cores. This doesn't seem like the best solution, but it's
>> better than when the ARC starts spinning.
>>
>> Looking into this issue, I used bhyve and an md drive for testing: I create
>> a brand-new zpool mounted as /test and then nullfs mount /test to /mnt.
>> I loop through untarring the Linux kernel into the nullfs mount, rm -rf it,
>> and repeat. I set the ARC to the smallest value I can. Untarring the
>> Linux kernel was enough to get the ARC evict and prune to spin since
>> they couldn't evict/prune anything.
>>
>> Looking at vnlru_free_vfsops called from ZFS arc_prune_task, I see:
>> static int
>> vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
>> {
>>         ...
>>
>>         for (;;) {
>>                 ...
>>                 vp = TAILQ_NEXT(vp, v_vnodelist);
>>                 ...
>>
>>                 /*
>>                  * Don't recycle if our vnode is from different type
>>                  * of mount point. Note that mp is type-safe, the
>>                  * check does not reach unmapped address even if
>>                  * vnode is reclaimed.
>>                  */
>>                 if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
>>                     mp->mnt_op != mnt_op) {
>>                         continue;
>>                 }
>>                 ...
>>
>> The vp ends up belonging to the nullfs mount and then hits the continue
>> even though the passed-in mvp is on ZFS. If I do a hack to
>> comment out the continue, then I see the ARC, nullfs vnodes and
>> ZFS vnodes grow. When the ARC calls arc_prune_task, which calls
>> vnlru_free_vfsops, the vnode counts now go down for nullfs and ZFS.
>> The ARC cache usage also goes down. Then they increase again until
>> the ARC gets full, and then they go down again. So with this hack
>> I don't need nocache passed to nullfs and I don't need to limit
>> the max vnodes. Doing multiple untars in parallel over and over
>> doesn't seem to cause any issues for this test. I'm not saying
>> commenting out the continue is the fix; it's just a simple POC test.
>>
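For reference, the hack described above boils down to neutering the vfsops
filter in vnlru_free_impl(); this is only an illustration of the POC, not a
proposed change:

        if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
            mp->mnt_op != mnt_op) {
                /* continue; */    /* POC: never skip other mounts' vnodes */
        }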
>
> I don't see an easy way to say "this is a nullfs vnode holding onto a
> zfs vnode". Perhaps the routine can be extended to issue a nullfs
> callback, if the module is loaded.
>
> In the meantime I think a good enough(tm) fix would be to check that
> nothing was freed and fall back to good old regular cleanup without
> filtering by vfsops. This would be very similar to what you are doing
> with your hack.
>
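A rough sketch of that fallback, written against the vnlru_free_impl()
skeleton quoted above; the retry label, the "filtered" flag and their
placement are illustrative only, not an actual patch:

    static int
    vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
    {
            int ocount = count;
            bool filtered = (mnt_op != NULL);
            ...
    retry:
            for (;;) {
                    ...
                    /* Filter by filesystem type only while "filtered" is set. */
                    if (filtered && (mp = vp->v_mount) != NULL &&
                        mp->mnt_op != mnt_op) {
                            continue;
                    }
                    ...
            }
            if (filtered && count == ocount) {
                    /*
                     * The filtered pass freed nothing, e.g. because the
                     * matching vnodes are all pinned via nullfs; retry
                     * once without the vfsops filter.
                     */
                    filtered = false;
                    goto retry;
            }
            return (ocount - count);
    }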
Now that I wrote this, perhaps an acceptable hack would be to extend
struct mount with a pointer to the "lower layer" mount (if any) and patch
the vfsops check to also look there.
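Something along these lines, where mnt_lowermp is an invented field name;
nullfs already keeps the lower mount in its per-mount data (nullm_vfs), so
it could copy that into the new field at mount time:

    /* Sketch: a new, normally-NULL pointer in struct mount to the mount
     * a stacked filesystem such as nullfs sits on top of. */
    struct mount {
            ...
            struct mount    *mnt_lowermp;   /* lower layer mount, if stacked */
            ...
    };

    /* nullfs_mount() would set it, roughly: */
    mp->mnt_lowermp = MOUNTTONULLMOUNT(mp)->nullm_vfs;

    /* and the check in vnlru_free_impl() would also accept a match one
     * layer down: */
    if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
        mp->mnt_op != mnt_op &&
        (mp->mnt_lowermp == NULL ||
         mp->mnt_lowermp->mnt_op != mnt_op)) {
            continue;
    }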
>
>> It appears that when ZFS asks for cached vnodes to be
>> freed, nullfs also needs to free some of its own so that
>> they are freed at the VFS level. It seems that vnlru_free_impl
>> should allow the related nullfs vnodes to be freed so that
>> the ZFS ones can be freed, reducing the size of the ARC.
>>
>> BTW, I also hacked the kernel and mount(8) to show the vnodes used
>> per mount, i.e. mount -v:
>> test on /test (zfs, NFS exported, local, nfsv4acls, fsid 2b23b2a1de21ed66,
>> vnodes: count 13846 lazy 0)
>> /test on /mnt (nullfs, NFS exported, local, nfsv4acls, fsid 11ff002929000000,
>> vnodes: count 13846 lazy 0)
>>
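The counters themselves already live on struct mount (assuming the usual
mnt_nvnodelistsize / mnt_lazyvnodelistsize fields), so the kernel side of
such a hack is mostly a matter of printing them; a minimal sketch, leaving
out how the values actually reach mount(8):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mount.h>

    /* Report the per-mount vnode usage shown by the hacked mount -v above;
     * exporting the values to userland is not shown here. */
    static void
    mount_print_vnode_counts(struct mount *mp)
    {
            printf("vnodes: count %d lazy %d\n",
                mp->mnt_nvnodelistsize, mp->mnt_lazyvnodelistsize);
    }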
>> Now I can easily see how the vnodes are used without going into ddb.
>> On my laptop I have various vnet jails and nullfs mount my home directory
>> into them, so pretty much everything goes through nullfs to ZFS. I'm limping
>> along with nullfs nocache and a small number of max vnodes, but it would be
>> nice to not need that.
>>
>> Thanks,
>>
>> Doug A.
>>
>>
>
>
> --
> Mateusz Guzik <mjguzik gmail.com>
>
--
Mateusz Guzik <mjguzik gmail.com>