svn commit: r344188 - in head: lib/libc/sys sys/vm

Mon Feb 18 21:37:15 UTC 2019

  Bruce,

On Sun, Feb 17, 2019 at 02:58:43AM +1100, Bruce Evans wrote:
B> This is only a rollback for the vnode pager pbufs sub-pool.  Total
B> resource usage (preallocated kva and the maximum amount of RAM that can
B> be mapped into this kva) is still about 5/2 times higher than before in
B> my configuration.  It would be 7/2 times higher if I configured fuse and
B> smbfs.  r343030 changed the allocation methods in all subsystems except
B> out-of-tree modules, and broke at least the KBI for these modules (*),
B> so it is easy to find the full expansion except for these modules by
B> looking at the diffs (I found the use in fuse and smbfs by grepping
B> some subtrees).  Also, the user's and vfs_bio's resource limit is still
B> broken by expanding it by this factor of 5/2 or more.
B> 
B> In the old allocation method, there was a single pool of pbufs of size
B> nswbuf, which normally has its default limiting value of 256.  This
B> magic 256 is hard-coded in vfs_bio.c, but if the user somehow knows
B> about this and the tunable kern.nswbuf, then it can be overridden.
B> The limit of 256 was documented in pbuf(9), but the tunable was never
B> documented AFAIK.  The variable nswbuf for this was documented in
B> pbuf(9).  The 256 entries are shared between any number of subsystems.
B> Most subsystems limited themselves to nswbuf/2 entries, and the man
B> page recommended this.  This gave overcommit by a factor of about 5/2
B> in my configuration (there are 7 subsystems, but some of these have a
B> smaller limit).
B> 
B> Now each subsystem has a separate pool.  The size of the sub-pool
B> is still usually nswbuf / 2.  This gives overallocation by a factor of
B> about 5/2 in my configuration.  The overcommit only causes minor
B> performance problems.  2 subsystems might use all of the buffers, and
B> then all the other subsystems have to wait, but it is rare for even
B> 2 subsystems to be under load at the same time.  It is more of a problem
B> that the limit is too small for a single subsystem.  The overallocation
B> gives worse problems such as crashing at boot time or a little later when
B> the user or auto-tuning has maxed out nswbuf.

This is all true, and it is intentional. Before, any two subsystems could
screw up the system by allocating all bufs. Now they are better isolated
from each other, and yes, the price is extra memory.

B> Now the user has to know even more arcane details to limit the kva, and
B> it is impossible to recover the old behaviour.  To get the old limit,
B> kern.nswbuf must be set to (256 * 2 / 5) in my configuration, but
B> that significantly decreases the number of buffers for each subsystem.
B> Users might already have set kern.nswbuf to a large value.  Since most
B> subsystems used to use half of that many buffers, the wastage from setting
B> it large for the benefit of 1 subsystem was at most a factor of 2.  Now
B> the limit can't be increased as much without running out of kva, and the
B> safe increase is more arcane and machine-dependent (starting with the
B> undocumented default being 8 times higher for 64-bit systems, but only
B> for 1 of the subsystems).

Yes, this sucks. If you can propose an auto-tuning algorithm, I'm all for it.
For now I'm just shifting defaults from something that was reasonable in
2000 to something that is going to be reasonable in 2020.

-- 
Gleb Smirnoff