svn commit: r251282 - head/sys/kern
Alfred Perlstein
bright at mu.org
Fri Jun 7 20:02:01 UTC 2013
On 6/5/13 10:55 AM, Adrian Chadd wrote:
> ... can people please boot an i386 kernel w/ this stuff in it, with
> the RAM constrained to something small (say, 128mb), and verify that
> you can at least start a buildworld (minus clang, of course) without
> panicking?
>
> There's a recent PR
> (http://www.freebsd.org/cgi/query-pr.cgi?pr=179112) which shows
> regressions on i386 + small RAM platforms.
>
> I know it's a different area, but I'd really appreciate it if people
> better tested this stuff out before they make i386 (and by extension,
> any platform with limited KVA and small amounts of RAM) unusable
> through carelessness.
Adrian, I'm pretty sure that kib's patch actually limits memory.
The concern I have is that it's a hard limit that appears to be based not
on KVA, but on a fixed 32GB.
I could be wrong, but Bruce's feedback seems to back up my concern.
-Alfred
>
> Thanks,
>
>
>
> Adrian
>
> On 5 June 2013 05:11, Alfred Perlstein <bright at mu.org> wrote:
>> Konstantin, is there any way to take some of Bruce's feedback into account
>> to get rid of the hard coded max?
>>
>> -Alfred
>>
>>
>> On 6/4/13 1:14 AM, Bruce Evans wrote:
>>> On Tue, 4 Jun 2013, Konstantin Belousov wrote:
>>>
>>>> On Mon, Jun 03, 2013 at 02:24:26AM -0700, Alfred Perlstein wrote:
>>>>> On 6/3/13 12:55 AM, Konstantin Belousov wrote:
>>>>>> On Sun, Jun 02, 2013 at 09:27:53PM -0700, Alfred Perlstein wrote:
>>>>>>> Hey Konstantin, shouldn't this be scaled against the actual amount of
>>>>>>> KVA we have instead of an arbitrary limit?
>>>>>> The commit changes the buffer cache to scale according to the available
>>>>>> KVA, making the scaling less dumb.
>>>>>>
>>>>>> I do not understand what exactly you want to do; please describe the
>>>>>> algorithm you propose to implement instead of my change.
>>>>>
>>>>> Sure, how about deriving the hardcoded "32" from the maxkva a machine
>>>>> can have?
>>>>>
>>>>> Is that possible?
>>>> I do not see why this would be useful. Initially I thought about simply
>>>> capping nbuf at 100000 without referencing any "memory". Then I realized
>>>> that this would somewhat conflict with (unlikely) changes to the value
>>>> of BKVASIZE due to "factor".
>>>
>>> The presence of BKVASIZE in 'factor' is a bug. My version never had this
>>> bug (see below for a patch). The scaling should be to maximize nbuf,
>>> subject to non-arbitrary limits on physical memory and kva, and now an
arbitrary limit of about 100000 / (BKVASIZE / 16384) on nbuf. Your new
>>> limit is arbitrary so it shouldn't affect nbuf depending on BKVASIZE.
>>>
>>> Expanding BKVASIZE should expand kva use, but on i386 this will soon
hit a non-arbitrary kva limit so nbuf will not be as high as preferred.
>>> nbuf needs to be very large mainly to support file systems with small
>>> buffers. Even 100000 only gives 50MB of buffering if the fs block
>>> size is 512. This would shrink to only 12.5MB if BKVASIZE is expanded
>>> by a factor of 4 and the bug is not fixed. If 25000 buffers after
expanding BKVASIZE is enough, then that should be the arbitrary limit
>>> (independent of BKVASIZE) so as to save physical memory.
>>>
>>> On i386 systems with 1GB RAM, nbuf defaults to about 7000. With an
>>> fs block size of 512, that can buffer 3.5MB. Expanding BKVASIZE by a
>>> factor of 4 shrinks this to 0.875MB in -current. That is ridiculously
>>> small. VMIO limits the lossage from this.
>>>
>>> BKVASIZE was originally 8KB. I forget if nbuf was halved by not modifying
>>> the scale factor when it was expanded to 16KB. Probably not. I used to
>>> modify the scale factor to get twice as many as the default nbuf, but
>>> once the default nbuf expanded to a few thousand it became large enough
>>> for most purposes so I no longer do this.
>>>
>>> Except on arches with extremely limited kva like i386, BKVASIZE should be
>>> MAXBSIZE, and all of the complications for expanding a buffer's kva
>>> beyond BKVASIZE and handling the fragmentation problems caused by this
>>> shouldn't exist. Limits like vfs.maxbufspace should be scaled by
>>> NOMBSIZE = current BKVASIZE instead of BKVASIZE. Oops, my NOMBSIZE
>>> has similar logical problems to BKVASIZE. It needs to be precisely
>>> 16KB to preserve the defaults for nbuf and maxbufspace, but the nominal
>>> (ffs) buffer size is now 32KB. So the heuristic scale factors should
>>> use the historical magic number 16K. I changed sequential_heuristic()
>>> to use a hard-coded 16K instead of BKVASIZE. The correct number here
>>> depends on disk technology.
>>>
>>> The patch has many style fixes:
>>>
>>> @ Index: vfs_bio.c
>>> @ ===================================================================
>>> @ RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
>>> @ retrieving revision 1.436
>>> @ diff -u -2 -r1.436 vfs_bio.c
>>> @ --- vfs_bio.c 17 Jun 2004 17:16:49 -0000 1.436
>>> @ +++ vfs_bio.c 3 Jun 2013 16:04:34 -0000
>>> @ @@ -419,64 +493,54 @@
>>> @ @ /*
>>> @ - * Calculating buffer cache scaling values and reserve space for buffer
>>> @ + * Calculate buffer cache scaling values and reserve space for buffer
>>> @ * headers. This is called during low level kernel initialization and
>>> @ * may be called more then once. We CANNOT write to the memory area
>>> @ * being reserved at this time.
>>> @ */
>>> @ -caddr_t
>>> @ -kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est)
>>> @ +void *
>>> @ +vfs_bio_alloc(void *va)
>>>
>>> The API name was verbose and bogus. The prefix for this subsystem is
>>> vfs_bio_, not kern_vfs_bio_. This is a generic allocation routine.
>>> It always allocated swbufs and has expanded to do more allocations.
>>>
>>> @ {
>>> @ - /*
>>> @ - * physmem_est is in pages. Convert it to kilobytes (assumes
>>> @ - * PAGE_SIZE is >= 1K)
>>> @ - */
>>> @ - physmem_est = physmem_est * (PAGE_SIZE / 1024);
>>>
>>> I use the global physmem. This change may be too i386-specific. On
>>> machines with 8-16MB of RAM, a significant amount of RAM may be eaten by
>>> the kernel image or perhaps just unavailable to the kernel. Now memories are
>>> larger and the amount eaten is relatively insignificant (since we are
>>> only using a small fraction of physmem, only the relative error matters).
>>>
>>> @ + int factor, memsize;
>>> @ @ /*
>>> @ - * The nominal buffer size (and minimum KVA allocation) is BKVASIZE.
>>> @ - * For the first 64MB of ram nominally allocate sufficient buffers to
>>> @ - * cover 1/4 of our ram. Beyond the first 64MB allocate additional
>>> @ - * buffers to cover 1/20 of our ram over 64MB. When auto-sizing
>>> @ - * the buffer cache we limit the eventual kva reservation to
>>> @ + * The nominal buffer size is NOMBSIZE.
>>> @ + * For the first 4MB of RAM, allocate 50 buffers.
>>> @ + * For the next 60MB of RAM, nominally allocate sufficient buffers to
>>> @ + * cover 1/4 of the RAM. Beyond the first 64MB allocate additional
>>> @ + * buffers to cover 1/20 of the RAM. When auto-sizing
>>> @ + * the buffer cache, limit the eventual kva reservation to
>>> @ * maxbcache bytes.
>>>
>>> Fix old bitrot in this comment, and update for my changes.
>>> - the first 4MB wasn't mentioned
>>> - the minimum clamp wasn't mentioned
>>> - replace BKVASIZE by NOMBSIZE so that I can change BKVASIZE without
>>> changing the default nbuf (subject to maxbcache).
>>>
>>> The bitrot of the magic number 1/20 is not fixed. The code uses 1/10.
>>> This is fixed in -current. I should have noticed this before, since
>>> I used to have to change the code to get 1/10 instead of 1/20.
>>>
>>> I hope you updated the comment for recent changes, but I don't like
>>> having magic numbers in comments.
>>>
>>> @ *
>>> @ - * factor represents the 1/4 x ram conversion.
>>> @ + * `factor' represents the 1/4 of RAM conversion.
>>> @ */
>>> @ +#define NOMBSIZE imax(PAGE_SIZE, 16384) /* XXX */
>>> @ if (nbuf == 0) {
>>> @ - int factor = 4 * BKVASIZE / 1024;
>>> @ + /*
>>> @ + * physmem is in pages. Convert it to a size in kilobytes.
>>> @ + * XXX no ptob().
>>> @ + */
>>> @ + memsize = ctob(physmem) / 1024;
>>> @ @ + factor = 4 * NOMBSIZE / 1024;
>>> @ nbuf = 50;
>>> @ - if (physmem_est > 4096)
>>> @ - nbuf += min((physmem_est - 4096) / factor,
>>> @ - 65536 / factor);
>>> @ - if (physmem_est > 65536)
>>> @ - nbuf += (physmem_est - 65536) * 2 / (factor * 5);
>>> @ -
>>> @ - if (maxbcache && nbuf > maxbcache / BKVASIZE)
>>> @ - nbuf = maxbcache / BKVASIZE;
>>>
>>> The maxbcache limit must be applied to the non-default nbuf too.
>>>
>>> @ + if (memsize > 4096)
>>> @ + nbuf += imin((memsize - 4096) / factor,
>>> @ + (65536 - 4096) / factor);
>>> @ + if (memsize > 65536)
>>> @ + nbuf += (memsize - 65536) * 2 / (factor * 5);
>>> @ }
>>> @ -
>>> @ -#if 0
>>> @ - /*
>>> @ - * Do not allow the buffer_map to be more then 1/2 the size of the
>>> @ - * kernel_map.
>>> @ - */
>>> @ - if (nbuf > (kernel_map->max_offset - kernel_map->min_offset) /
>>> @ - (BKVASIZE * 2)) {
>>> @ - nbuf = (kernel_map->max_offset - kernel_map->min_offset) /
>>> @ - (BKVASIZE * 2);
>>> @ - printf("Warning: nbufs capped at %d\n", nbuf);
>>> @ - }
>>> @ -#endif
>>>
>>> Already removed in -current.
>>>
>>> @ + if (nbuf > maxbcache / BKVASIZE)
>>> @ + nbuf = maxbcache / BKVASIZE;
>>> @ @ /*
>>> @ * swbufs are used as temporary holders for I/O, such as paging
>>> I/O.
>>> @ - * We have no less then 16 and no more then 256.
>>> @ + * There must be between 16 and 256 of them.
>>> @ */
>>> @ - nswbuf = max(min(nbuf/4, 256), 16);
>>> @ + nswbuf = imax(imin(nbuf / 4, 256), 16);
>>> @ #ifdef NSWBUF_MIN
>>> @ if (nswbuf < NSWBUF_MIN)
>>> @ nswbuf = NSWBUF_MIN;
>>> @ #endif
>>> @ +
>>> @ #ifdef DIRECTIO
>>> @ ffs_rawread_setup();
>>> @ @@ -484,12 +548,12 @@
>>> @ @ /*
>>> @ - * Reserve space for the buffer cache buffers
>>> @ + * Reserve space for swbuf headers and buffer cache buffers.
>>> @ */
>>> @ - swbuf = (void *)v;
>>> @ - v = (caddr_t)(swbuf + nswbuf);
>>> @ - buf = (void *)v;
>>> @ - v = (caddr_t)(buf + nbuf);
>>> @ + swbuf = va;
>>> @ + va = swbuf + nswbuf;
>>> @ + buf = va;
>>> @ + va = buf + nbuf;
>>> @ @ - return(v);
>>> @ + return (va);
>>> @ }
>>> @
>>>
>>> Don't use the caddr_t mistake. The API was also changed to not use it.
>>>
>>> Bruce
>>>