maxfiles, file table, descriptors, etc...
tlambert2 at mindspring.com
Tue Apr 22 00:04:12 PDT 2003
Bill Wells wrote:
> > An interesting aside here is that the per process open file table,
> > which holds references to file for the process, is actually
> > allocated at power-of-2, meaning each time it needs to grow, the
> > size is doubled, using realloc(), instead of malloc(), to keep the
> > table allocation contiguous. This means if you use a lot of files,
> > it takes exponentially increasing time to open new files, since
> > realloc has to double the size, and then copy everything. For a
> > few files, this is OK; for 100,000+ files (or network connections)
> > in a single process, this starts to become a real source of overhead.
> Let's say you just allocated the 9th element. You started out with
> a table of one element, doubled it (copying 1), doubled it to 4
> elements (copying 2 + 1 or 3), thence to 8 (copying 4 + 3 or 7)
> and finally to 16 (copying 8 + 7 or 15).
> To allocate then 2**n + 1'th entry, you will have to do 2**(n+1)
> copies. Thus you will do (2**(n+1) - 1) / (2**n + 1) copies per
> allocated entry. This is the worst case, since additional
> allocations up to 2**(n+1) don't incur any additional copies.
> The upper bound is two entries copied per entry allocated. This
> makes exponential allocation a *linear time* algorithm. This just
> isn't a major time cost unless the chunkiness of the time spent is
> an issue.
The copy time goes up as 2^N; the rate of copies required goes
up as log2(N).
Overall, it's true that the time, scaled linearly against N,
and averaged, is a linear increase (e.g. multiply the two,
and then treat as a continuous function, even though it's not).
But I was more interested in instaneous response.
Consider that the latency involved in the copy increases the
delay for all other pending operations. For example, on an
HTTP server, copying the file results in bursty response.
Also realize that if you are under a "flash crowd" situation,
this is *exactly* the point in time you *don't* want to increase
your response latency, as requests will continue to pile up (e.g.
requests as a result of being "slashdotted").
The obvious workaround is to dup2() something like /dev/null to
fd 200,000, and grow the table once, up front. Still, it's a
pain in the butt that the default algorithm takes more time at
exactly the point where you don't have time available. And the
"grow it once" approach has the drawback that that memory is
now committed to a particular task, when it may turn out to be
better utilized elsewhere, until load merits it.
It's tempting to come up with a way to register fd's into a
kqueue for incoming events, without adding a real per-process
open file table entry for them (especiall for sockets), and
avoid the overhead altogether.
Of course, for something like that, struct fileops must die.
More information about the freebsd-hackers