svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
markj at freebsd.org
Thu Feb 14 16:12:00 UTC 2019
On Thu, Feb 14, 2019 at 06:56:42PM +1100, Bruce Evans wrote:
> On Wed, 13 Feb 2019, Justin Hibbits wrote:
> > On Tue, 15 Jan 2019 01:02:17 +0000 (UTC)
> > Gleb Smirnoff <glebius at FreeBSD.org> wrote:
> >> Author: glebius
> >> Date: Tue Jan 15 01:02:16 2019
> >> New Revision: 343030
> >> URL: https://svnweb.freebsd.org/changeset/base/343030
> >> Log:
> >> Allocate pager bufs from UMA instead of 80-ish mutex protected
> >> linked list.
> > ...
> > This seems to break 32-bit platforms, or at least 32-bit book-e
> > powerpc, which has a limited KVA space (~500MB). I've seen it
> > preallocate over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> > leaving very little left for the rest of runtime.
> Hrmph. I complained about other things in this commit when it was
> committed, but not about this largest bug, since preallocation was broken
> then. I thought that it wasn't done, so that the problems would be smaller
> unless the excessive limits were actually reached.
> Now i386 does it:
> XX ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
> XX swrbuf: 336, 128, 0, 0, 0, 0, 0
> XX swwbuf: 336, 64, 0, 0, 0, 0, 0
> XX nfspbuf: 336, 128, 0, 0, 0, 0, 0
> XX mdpbuf: 336, 25, 0, 0, 0, 0, 0
> XX clpbuf: 336, 128, 0, 5, 4, 0, 0
> XX vnpbuf: 336, 2048, 0, 0, 0, 0, 0
> XX pbuf: 336, 16, 0, 2535, 0, 0, 0
> but i386 now has 4GB of KVA, with almost 3GB to waste, so the bug is not
> noticed there.
> The preallocation wasn't there in my last mail to the author about nearby
> bugs, on 24 Jan 2019:
> YY vnpbuf: 568, 2048, 0, 0, 0, 0, 0
> YY clpbuf: 568, 128, 0, 128, 8750, 0, 1
> YY pbuf: 568, 16, 0, 4, 0, 0, 0
> This output is on amd64 where the SIZE is larger and everything else was
> the same as on i386. Now amd64 shows the large preallocation too.
> There seems to be another bug: the especially small LIMIT of 16 turns
> into a preallocation of 2535 without causing an immediate reduction to
> the limit.
> I happen to have kernels from 24 and 25 Jan handy. The first one is
> amd64 r343346M built on Jan 23, and it doesn't do the large
> preallocation. The second one is i386 r343388:343418M built on Jan
> 25, and it does the large preallocation. Both call uma_prealloc() to
> ask for nswbuf_max = 0x9e9 buffers, but the old version only allocates
> 4 buffers while the later version allocates 0x9e9 buffers.
> The only relevant commit between the good and bad versions seems to be
> r343453. This fixes uma_prealloc() to actually work. But it is a feature
> for it to not work when its caller asks for too much.
I guess you meant r343353. In any case, the pbuf keg is _NOFREE, so
even without preallocation the large pbuf zone limits may become
problematic if there are bursts of allocation requests.
> 0x9e9 is the sum of the LIMITs of all pbuf pools. The main bug in
> r343030 is that it expands nswbuf, which is supposed to give the
> combined limit, from its normal value of 256 to 0x9e9. (r343030
> actually used nswbuf before it was properly initialized, so used its
> maximum value of 256 even on small systems with nswbuf = 16. Only
> this has been fixed.)
> On i386, nbuf is excessively limited so as to give a maxbufspace of
> about 100MB so as to fit in 1GB of kva even with infinite RAM and
> -current's actual 4GB of kva. nbuf is correctly limited to give a
> much smaller maxbufspace when RAM is small (kva scaling for this is
> not done so well). nswbuf is restricted if nbuf is restricted, but
> not enough (except in my version). It is normally 256, so the pbuf
> allocation used to be 32MB, and this is already a bit large compared
> with 100MB for maxbufspace. Expanding pbufs by a factor of 0x9e9/0x100
> gives the silly combination of 100MB for maxbufspace and 317MB for pbufs.
> If kva is only 512MB instead of 1GB, then maxbufspace should be only
> 50MB and nswbuf should be smaller too. Similarly for PAE on i386 back
> when it was configured with 1GB kva by default. Only about 512MB are
> left after allocating space for page table metadata. I have fixes
> that scale most of this better. Large subsystems starting with kmem
> get a hard-coded fraction of the usable kva. E.g., kmem gets about
> 60% of usable kva instead of about 40% of nominal kva. Most other
> large subsystems including the buffer cache get about 1/8 of the
> remaining 40% of usable kva. Scaling for other subsystems is mostly
> worse than for kmem. pbufs are part of the buffer cache allocation.
> The expansion factor of 0x9e9/0x100 breaks this.
> I don't understand how pbuf_preallocate() allocates for the other
> pbuf pools. When I debugged this for clpbufs, the preallocation was
> not used. pbuf types other than clpbufs seem to be unused in my
> configurations. I thought that pbufs were used during initialization,
> since they end up with a nonzero FREE count, but their only use seems
> to be to preallocate them.
All of the pbuf zones share a common slab allocator. The zones have
individual limits but can tap in to the shared preallocation.