maxfiles, file table, descriptors, etc...
Kevin A. Pieckiel
kpieckiel-freebsd-hackers at smartrafficenter.org
Tue Apr 22 09:57:09 PDT 2003
On Mon, Apr 21, 2003 at 11:04:07AM -0700, Terry Lambert wrote:
> Things which are allocated by the zone allocator at interrupt
> time have a fixed amount of KVA that is set at boot time, before
> the VM system is fully up. Even if it were not fully up, the
> way it works is by preallocating an address space range to be
> later filled in by physical pages (you cannot call malloc() at
> interrupt time, but you can take a fault and fill in a backing
> page). So the zone size for sockets (inpcb's, tcpcb's) is fixed
> at boot time, even though it is derived from the "maxfiles".
This--plus the references to zalloci(), zalloc(), and malloc() you
gave--is starting to give me an understanding of this.  At least,
I recognize the differences you're explaining, as well as the logic
behind them.  This is really starting to get fascinating.
> A problem with the 5.x approach is that this means it's possible
> to get NULL returns from allocation routines, when the system is
> under memory pressure (because a mapping cannot be established),
> when certain of those routines are expected to *never* fail to
> obtain KVA space.
This is a bit unnerving--or so it would seem--though I'm a bit lost
on a couple of points here.  First, you said:
> In 5.x, the zone limits are still fixed to a static boot-time
> settable only value -- the same value -- but the actual zone
> allocations take place later.
Okay, so basically the kernel is told that a certain amount of memory
is guaranteed to be available to it within a certain zone, when in
fact that memory is not (because it's allocated later, possibly after
it has already been claimed for another purpose).  I see how this
links to your parenthetical statement:
> This
> is a serious problem, and has yet to be correctly addressed in
> the new allocator code (the problem occurs because the failure to
> obtain a mapping occurs before the zone in question hits its
> administrative limit).
What I fail to see is why this scheme is decidedly "better" than
that of the old memory allocator.  I understand from the vm source
that uma wants to avoid allocating pools of unused memory for the
kernel--allocating memory on an as-needed basis is a logical thing
to do.  But losing the guarantee that the allocation routines will
not fail, without adjusting the functions that call those routines,
seems a bit dumb (since, as you state, the kernel panics).  I think
this might be a trouble spot for me because of another question....
What is the correct way to address this in the new allocator code?
I can come up with an option or two on my own... such as the one to
which I've already alluded: memory allocation routines that once
guaranteed success can no longer be used in such a manner, so the
calling functions must be altered to take this into account.  But
this is certainly not trivial!
And finally:
> Basically, everywhere that calls zalloci()
> is at risk of panic'ing under heavy load.
Am I missing something here?  I can't find any reference to
zalloci() in the kernel source for 5.x (as of a 07 Apr 2003 cvs
update on HEAD), and such circumstances don't apply to 4.x (which,
of course, is where I DID find them after you mentioned them).
> Correct. The file descriptors are dynamically allocated; or rather,
> they are allocated incrementally, as needed, and since this is not
> at interrupt time, the standard system malloc() can be used.
A quick tangent.... when file descriptors are assigned and given to
a running program, are they guaranteed to start from zero (or three,
if you don't close stdin, stdout, and stderr)?  Or is that just a
byproduct of implementation across the realm of Unixes?
> An interesting aside here is that the per process open file table,
> which holds references to file for the process, is actually
> allocated at power-of-2, meaning each time it needs to grow, the
> size is doubled, using realloc(), instead of malloc(), to keep the
> table allocation contiguous. This means if you use a lot of files,
> it takes exponentially increasing time to open new files, since
> realloc has to double the size, and then copy everything. For a
> few files, this is OK; for 100,000+ files (or network connections)
> in a single process, this starts to become a real source of overhead.
Now this _IS_ interesting.  I would think circumstances requiring
100,000+ files or net connections, though not uncommon, are certainly
NOT in the vast majority, but I would still have a bone to pick with
this implementation.  For example, a web server--from which most users
expect (demand?) fast response times--that pauses to expand its
file table during a connection or request would seem to have
unreasonable response times.  One would think there is a better way.
How much of an issue is this really?  (After all, I probably wouldn't
have inquired about file limits, etc., in the first place if I weren't
intending to implement something that will require a lot of
connections.)
Excellent info, Terry. Thanks for sharing it!
Kevin
pos += screamnext[pos] /* does this goof up anywhere? */
-- Larry Wall in util.c from the perl source code
---
This message was signed by GnuPG. E-Mail kpieckiel-pgp at smartrafficenter.org
to receive my public key. You may also get my key from pgpkeys.mit.edu;
my ID is 0xF1604E92 and will expire on 01 January 2004.