svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm

Bruce Evans brde at optusnet.com.au
Thu Feb 14 07:56:49 UTC 2019


On Wed, 13 Feb 2019, Justin Hibbits wrote:

> On Tue, 15 Jan 2019 01:02:17 +0000 (UTC)
> Gleb Smirnoff <glebius at FreeBSD.org> wrote:
>
>> Author: glebius
>> Date: Tue Jan 15 01:02:16 2019
>> New Revision: 343030
>> URL: https://svnweb.freebsd.org/changeset/base/343030
>>
>> Log:
>>   Allocate pager bufs from UMA instead of 80-ish mutex protected
>> linked list.
> ...
>
> This seems to break 32-bit platforms, or at least 32-bit book-e
> powerpc, which has a limited KVA space (~500MB).  It preallocates I've
> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> leaving very little left for the rest of runtime.

Hrmph.  I complained about other things in this commit when it was
committed, but not about this largest bug, since preallocation was
broken then.  I thought that it wasn't done, so the problems would be
smaller unless the excessive limits were actually reached.

Now i386 does it:

XX ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
XX 
XX swrbuf:                 336,    128,       0,       0,       0,   0,   0
XX swwbuf:                 336,     64,       0,       0,       0,   0,   0
XX nfspbuf:                336,    128,       0,       0,       0,   0,   0
XX mdpbuf:                 336,     25,       0,       0,       0,   0,   0
XX clpbuf:                 336,    128,       0,       5,       4,   0,   0
XX vnpbuf:                 336,   2048,       0,       0,       0,   0,   0
XX pbuf:                   336,     16,       0,    2535,       0,   0,   0

but i386 now has 4GB of KVA, with almost 3GB to waste, so the bug is not
noticed there.

The preallocation wasn't there in my last mail to the author about nearby
bugs, on 24 Jan 2019:

YY vnpbuf:                 568,   2048,       0,       0,       0,   0,   0
YY clpbuf:                 568,    128,       0,     128,    8750,   0,   1
YY pbuf:                   568,     16,       0,       4,       0,   0,   0

This output is on amd64 where the SIZE is larger and everything else was
the same as on i386.  Now amd64 shows the large preallocation too.

There seems to be another bug: the especially small LIMIT of 16 turns
into a preallocation of 2535, and this does not cause an immediate
reduction to the limit.

I happen to have kernels from 24 and 25 Jan handy.  The first one is
amd64 r343346M built on Jan 23, and it doesn't do the large
preallocation.  The second one is i386 r343388:343418M built on Jan
25, and it does the large preallocation.  Both call uma_prealloc() to
ask for nswbuf_max = 0x9e9 buffers, but the old version only allocates
4 buffers while the later version allocates 0x9e9 buffers.

The only relevant commit between the good and bad versions seems to be
r343453.  This fixes uma_prealloc() to actually work.  But it is a feature
for it to not work when its caller asks for too much.

0x9e9 is the sum of the LIMITs of all pbuf pools.  The main bug in
r343030 is that it expands nswbuf, which is supposed to give the
combined limit, from its normal value of 256 to 0x9e9.  (r343030
actually used nswbuf before it was properly initialized, so used its
maximum value of 256 even on small systems with nswbuf = 16.  Only
this has been fixed.)
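
The sum is easy to check.  A minimal sketch in C, with the LIMIT values
copied from the i386 listing above; the struct and function names are
made up for illustration and are not kernel code:

```c
#include <stddef.h>

/* One row of the zone listing: zone name and its LIMIT column. */
struct pbuf_zone_limit {
	const char *name;
	int limit;
};

/* LIMIT values as shown in the i386 output quoted in this mail. */
static const struct pbuf_zone_limit pbuf_zone_limits[] = {
	{ "swrbuf",  128 },
	{ "swwbuf",   64 },
	{ "nfspbuf", 128 },
	{ "mdpbuf",   25 },
	{ "clpbuf",  128 },
	{ "vnpbuf", 2048 },
	{ "pbuf",     16 },
};

/* Add up the LIMIT column over all listed pbuf zones. */
int
pbuf_limit_sum(void)
{
	size_t i, n;
	int sum;

	n = sizeof(pbuf_zone_limits) / sizeof(pbuf_zone_limits[0]);
	sum = 0;
	for (i = 0; i < n; i++)
		sum += pbuf_zone_limits[i].limit;
	return (sum);
}
```

pbuf_limit_sum() gives 2537, which is 0x9e9.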

On i386, nbuf is excessively limited so as to give a maxbufspace of
about 100MB so as to fit in 1GB of kva even with infinite RAM and
-current's actual 4GB of kva.  nbuf is correctly limited to give a
much smaller maxbufspace when RAM is small (kva scaling for this is
not done so well).  nswbuf is restricted if nbuf is restricted, but
not enough (except in my version).  It is normally 256, so the pbuf
allocation used to be 32MB, and this is already a bit large compared
with 100MB for maxbufspace.  Expanding pbufs by a factor of 0x9e9/0x100
gives the silly combination of 100MB for maxbufspace and 317MB for
pbufs.
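
The arithmetic behind these figures, as a small sketch.  It assumes
128kB of kva per pbuf, which is the figure quoted at the top of this
thread; the macro and function names are hypothetical:

```c
#include <stdint.h>

/* Assumption: each pbuf reserves 128kB of kva (figure quoted above). */
#define	PBUF_KVA_BYTES	(128u * 1024u)

/* Total kva reserved by npbufs preallocated pbufs, in bytes. */
uint64_t
pbuf_kva_bytes(uint64_t npbufs)
{
	return (npbufs * PBUF_KVA_BYTES);
}

/* Convert bytes to whole MB (2^20 bytes), rounding down. */
uint64_t
bytes_to_mb(uint64_t bytes)
{
	return (bytes / (1024u * 1024u));
}
```

With these, bytes_to_mb(pbuf_kva_bytes(256)) is 32 and
bytes_to_mb(pbuf_kva_bytes(0x9e9)) is 317.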

If kva is only 512MB instead of 1GB, then maxbufspace should be only
50MB and nswbuf should be smaller too.  Similarly for PAE on i386 back
when it was configured with 1GB kva by default.  Only about 512MB are
left after allocating space for page table metadata.  I have fixes
that scale most of this better.  Large subsystems starting with kmem
get a hard-coded fraction of the usable kva.  E.g., kmem gets about
60% of usable kva instead of about 40% of nominal kva.  Most other
large subsystems including the buffer cache get about 1/8 of the
remaining 40% of usable kva.  Scaling for other subsystems is mostly
worse than for kmem.  pbufs are part of the buffer cache allocation.
The expansion factor of 0x9e9/0x100 breaks this.
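
A rough sketch of the kind of scaling described above.  This is not the
actual patch; it only turns the approximate fractions in the previous
paragraph (about 60% of usable kva for kmem, about 1/8 of the rest for
each other large subsystem) into code.  The function names and the
exact rounding are assumptions:

```c
#include <stdint.h>

/* Assumed share for kmem: about 60% of the usable kva. */
uint64_t
kmem_share(uint64_t usable_kva)
{
	return (usable_kva * 60 / 100);
}

/*
 * Assumed share for one other large subsystem (e.g., the buffer
 * cache, which includes pbufs): about 1/8 of what kmem leaves over.
 */
uint64_t
other_subsystem_share(uint64_t usable_kva)
{
	return ((usable_kva - kmem_share(usable_kva)) / 8);
}
```

The point of the sketch is that the pbuf budget is derived from usable
kva, so a fixed expansion factor applied afterwards escapes the scaling.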

I don't understand how pbuf_preallocate() allocates for the other
pbuf pools.  When I debugged this for clpbufs, the preallocation was
not used.  pbuf types other than clpbufs seem to be unused in my
configurations.  I thought that pbufs were used during initialization,
since they end up with a nonzero FREE count, but their only use seems
to be to preallocate them.

Bruce
