Re: nullfs and ZFS issues

From: Doug Ambrisko <ambrisko_at_ambrisko.com>
Date: Fri, 22 Apr 2022 14:36:22 UTC
On Fri, Apr 22, 2022 at 09:04:39AM +0200, Alexander Leidinger wrote:
| Quoting Doug Ambrisko <ambrisko@ambrisko.com> (from Thu, 21 Apr 2022  
| 09:38:35 -0700):
| 
| > On Thu, Apr 21, 2022 at 03:44:02PM +0200, Alexander Leidinger wrote:
| > | Quoting Mateusz Guzik <mjguzik@gmail.com> (from Thu, 21 Apr 2022
| > | 14:50:42 +0200):
| > |
| > | > On 4/21/22, Alexander Leidinger <Alexander@leidinger.net> wrote:
| > | >> I tried nocache on a system with a lot of jails which use nullfs,
| > | >> which showed very slow behavior in the daily periodic runs (12h runs
| > | >> in the night after boot, 24h or more in subsequent nights). Now the
| > | >> first nightly run after boot was finished after 4h.
| > | >>
| > | >> What is the benefit of not disabling the cache in nullfs? I would
| > | >> expect zfs (or ufs) to cache the (meta)data anyway.
| > | >>
| > | >
| > | > does the poor performance show up with
| > | > https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?
| > |
| > | I would like to have all the 22 jails run the periodic scripts a
| > | second night in a row before trying this.
| > |
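
As an aside, a single jail's run can also be timed directly to compare
nights; a minimal sketch, assuming a jail named j1:

  time jexec j1 periodic daily
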
| > | > if the long runs are still there, can you get some profiling from it?
| > | > sysctl -a before and after would be a start.
| > | >
| > | > My guess is that you are at the vnode limit and bumping into
| > | > the 1 second sleep.
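
For the before/after snapshots, something like the following captures
comparable state (file names are arbitrary):

  sysctl -a > /tmp/sysctl.before
  # ... run the slow periodic jobs ...
  sysctl -a > /tmp/sysctl.after
  diff -u /tmp/sysctl.before /tmp/sysctl.after
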
| > |
| > | That would explain the behavior I see since I added the last jail,
| > | which seems to have crossed a threshold that triggers the slow
| > | behavior.
| > |
| > | Current status (with the 112 nullfs mounts with nocache):
| > | kern.maxvnodes:               10485760
| > | kern.numvnodes:                3791064
| > | kern.freevnodes:               3613694
| > | kern.cache.stats.heldvnodes:    151707
| > | kern.vnodes_created:         260288639
| > |
| > | The maxvnodes value on this system is already 10 times the default.
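
The limit can be inspected and raised at runtime with sysctl; a minimal
sketch (the new value is arbitrary):

  sysctl kern.maxvnodes kern.numvnodes kern.freevnodes
  sysctl kern.maxvnodes=20971520

A persistent setting can go in /etc/sysctl.conf.
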
| >
| > I've attached mount.patch, which makes mount -v show the vnode
| > usage per filesystem.  Note that the problem I was running into
| > was that, after some operations, arc_prune and arc_evict would
| > consume 100% of two cores and make ZFS really slow.  If you are
| > not running into that issue then nocache etc. shouldn't be needed.
| 
| I don't run into this issue, but I see a huge performance difference
| when using nocache in the nightly periodic runs: 4h instead of 12-24h
| (22 jails on this system).

I wouldn't use nocache then!  It would be good to see what
Mateusz's patch does without nocache in your environment.
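
For reference, the option under discussion is nullfs's nocache mount
option; a minimal sketch of both variants, with hypothetical paths:

  mount -t nullfs -o nocache /usr/local /jails/j1/usr/local
  mount -t nullfs /usr/local /jails/j1/usr/local   # default, with caching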
 
| > On my laptop I set ARC to 1G since I don't use swap and in the past
| > ARC would consume too much memory and things would die.  When nullfs
| > holds a bunch of vnodes, ZFS can't release them.
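
That cap is a boot-time tunable; a minimal sketch, assuming it is set
in loader.conf:

  # /boot/loader.conf
  vfs.zfs.arc_max="1G"
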
| >
| > FYI, on my laptop with nocache and limited vnodes I haven't run
| > into this problem.  I haven't tried the patch to let ZFS free
| > its and nullfs vnodes on my laptop.  I have only tried it via
| 
| I have this patch and your mount patch installed now, without nocache
| and with reduced arc reclaim settings (100, 1). I will check the
| runtime for the next 2 days.
| 
| Your mount patch to show the per-mount vnode count looks useful, not
| only for this particular case. Do you intend to commit it?

I should, since it doesn't change the size of the structure etc.  I need
to put it up for review.

Thanks,

Doug A.