maxfiles, file table, descriptors, etc...

Terry Lambert tlambert2 at mindspring.com
Mon Apr 21 11:05:45 PDT 2003


"Kevin A. Pieckiel" wrote:
> > The "lsof" program will report open files.  The "maxfiles" variable
> > is a limit.  The limit is runtime for files, boot time for sockets.
> 
> Thank you, Terry, for your reply.
> 
> This is interesting.  This obviously shows my ignorance on how this is
> handled in the kernel, but how is it that one (files) is set at runtime
> while the other (sockets, pipes) is set at boot time only?

Things which are allocated by the zone allocator at interrupt
time have a fixed amount of KVA (kernel virtual address space)
that is set at boot time, before the VM system is fully up.  Even
if it were fully up, the way it works is by preallocating an
address space range to be filled in later by physical pages (you
cannot call malloc() at interrupt time, but you can take a fault
and fill in a backing page).  So the zone size for sockets
(inpcb's, tcpcb's) is fixed at boot time, even though it is
derived from "maxfiles".

The short answer is "because that's how memory allocation works".
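
As a rough illustration of the preallocation idea, here is a
minimal userland sketch, assuming a hypothetical "struct zone"
with zone_init()/zone_alloc()/zone_free() helpers.  These names
are made up for the example and are not the kernel's zalloci()
interface.  The general-purpose allocator is called only once, up
front; after that, allocation just pops a free list and returns
NULL when the fixed limit is reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userland model of a fixed-limit zone (not kernel code). */
struct zone {
    char  *base;       /* range reserved once, at "boot" */
    void  *free_head;  /* singly linked list of free items */
    size_t item_size;
    size_t limit;      /* fixed when the zone is created */
    size_t in_use;
};

/* The only place that calls the general-purpose allocator. */
static int zone_init(struct zone *z, size_t item_size, size_t limit)
{
    size_t align = sizeof(void *);

    assert(limit > 0);
    if (item_size < align)
        item_size = align;          /* a free item stores a next pointer */
    item_size = (item_size + align - 1) / align * align;

    z->base = calloc(limit, item_size);
    if (z->base == NULL)
        return -1;
    z->free_head = NULL;
    z->item_size = item_size;
    z->limit = limit;
    z->in_use = 0;
    /* Thread every item onto the free list. */
    for (size_t i = limit; i-- > 0; ) {
        void *item = z->base + i * item_size;
        *(void **)item = z->free_head;
        z->free_head = item;
    }
    return 0;
}

/* Never calls malloc(); returns NULL once the limit is reached. */
static void *zone_alloc(struct zone *z)
{
    void *item = z->free_head;

    if (item == NULL)
        return NULL;
    z->free_head = *(void **)item;
    z->in_use++;
    return item;
}

/* Return an item to the zone; the memory stays in the zone. */
static void zone_free(struct zone *z, void *item)
{
    *(void **)item = z->free_head;
    z->free_head = item;
    z->in_use--;
}

static void zone_destroy(struct zone *z)
{
    free(z->base);
    z->base = NULL;
    z->free_head = NULL;
}
```

The limit cannot be raised after zone_init() without reserving a
new range, which is the userland analogue of why the socket zone
size is a boot time setting.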

In 5.x, the zone limits are still fixed at a static value that
can only be set at boot time -- the same value -- but the actual
zone allocations take place later.  This is because the new
allocator is also capable of creating page mappings at interrupt
time.  The resulting memory is type-stable (meaning once it is
allocated, it never changes type), but this makes the competition
for memory a little more dynamic: OK for the purpose, as long as
you do not have initial load spikes unrelated to the machine's
primary role.

A problem with the 5.x approach is that this means it's possible
to get NULL returns from allocation routines, when the system is
under memory pressure (because a mapping cannot be established),
when certain of those routines are expected to *never* fail to
obtain KVA space.  Thus you see a lot of people posting about
"Trap 12" panics in FreeBSD 5.x which can't happen on 4.x.  This
is a serious problem, and has yet to be correctly addressed in
the new allocator code (the problem occurs because the failure to
obtain a mapping occurs before the zone in question hits its
administrative limit).  Basically, every caller of zalloci() is
at risk of panicking under heavy load.
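
To make the failure mode concrete, here is a small userland
sketch, assuming a hypothetical alloc_pcb() that can return NULL
under simulated memory pressure.  None of these names come from
the kernel.  A caller that assumes the allocation cannot fail
would dereference NULL, which is the userland analogue of the
trap; the defensive version returns an error instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical control block, for illustration only. */
struct pcb {
    int state;
};

/* Set to nonzero to simulate "no mapping could be established". */
static int simulate_pressure;

/* Hypothetical stand-in for an allocator that may fail early. */
static void *alloc_pcb(void)
{
    if (simulate_pressure)
        return NULL;
    return malloc(sizeof(struct pcb));
}

/*
 * Defensive caller: checks for NULL and reports ENOBUFS rather
 * than touching the pointer.
 */
static int pcb_create(struct pcb **out)
{
    struct pcb *p = alloc_pcb();

    if (p == NULL)
        return ENOBUFS;
    memset(p, 0, sizeof(*p));
    *out = p;
    return 0;
}
```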

> These apparently don't use the same mechanism in the kernel to
> keep up with file descriptors attached to a socket vs those
> attached to a file.

Correct.  The file descriptors are dynamically allocated; or rather,
they are allocated incrementally, as needed, and since this is not
at interrupt time, the standard system malloc() can be used.

An interesting aside here is that the per-process open file table,
which holds the process's references to files, is actually
allocated in power-of-2 sizes: each time it needs to grow, the
size is doubled, using realloc() instead of malloc(), to keep the
table allocation contiguous.  This means that if you use a lot of
files, each growth step costs twice as much as the one before it,
since realloc() has to double the size and then copy everything.
For a few files, this is OK; for 100,000+ files (or network
connections) in a single process, this starts to become a real
source of overhead.
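
The doubling behaviour can be sketched in userland as follows.
This assumes a hypothetical "struct fdtable" and an initial size
of 16 slots, both invented for the example; it is not the kernel's
file table code.  The counters record how many growth steps occur
and how many slots they would copy in the worst case (realloc() is
allowed to extend in place, in which case nothing is copied).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a table that grows by doubling. */
struct fdtable {
    void  **slots;
    size_t  used;
    size_t  cap;
    size_t  grows;   /* number of growth steps so far */
    size_t  copied;  /* worst-case slots copied by those steps */
};

/* Append one entry; returns its index, or -1 if realloc() fails. */
static int fdtable_add(struct fdtable *t, void *fp)
{
    if (t->used == t->cap) {
        /* 16 is an assumed starting size, then double each time. */
        size_t ncap = t->cap ? t->cap * 2 : 16;
        void **n = realloc(t->slots, ncap * sizeof(*n));

        if (n == NULL)
            return -1;
        t->copied += t->used;   /* every live slot may be moved */
        t->grows++;
        t->slots = n;
        t->cap = ncap;
    }
    t->slots[t->used] = fp;
    return (int)t->used++;
}
```

Starting from a zeroed struct fdtable and calling fdtable_add() in
a loop, the grows and copied counters show how the size of each
copy tracks the size of the table at the moment it grows.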

> How does one tweak these boot time values?

You add an entry to /boot/loader.conf.  Type "man loader.conf"
for more information.  The name is the same as the sysctl name.
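
For example, a fragment along these lines could go in
/boot/loader.conf.  The numbers are placeholders, not
recommendations, and kern.ipc.maxsockets is my assumption about
which socket-related name you would want; check "sysctl -a" on
your own system for the names that apply.

```
# /boot/loader.conf (illustrative values only)
kern.maxfiles="65536"
kern.ipc.maxsockets="65536"
```

The settings take effect at the next boot.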


> If this is documented somewhere (I wasn't able to find any docs on this),
> I don't mind doing some reading to learn this (save for becoming intimate
> with the kernel code that handles this stuff--that's a little much for
> the time being).

It's not well documented.  In fact, the "man tuning" manual
page fails to indicate a difference between boot time and
run time settings for the value.  Most people do not know how
zalloci() works, or what values affect the number of objects
that are allocated with one allocator or the other.

Because of this, you will often see bogus advice, like people
telling other people to use sysctl's, after the boot has been
completed, to modify values that by then have no effect on what
they are actually trying to do.

The best way to deal with this is to read and understand the
uses of zalloci() vs. zalloc() vs. malloc() in the kernel source
code.

-- Terry


More information about the freebsd-hackers mailing list