[PATCH]/[RFC] Increase scalability of per-process file descriptor
data structures
Tim Prouty
tim.prouty at isilon.com
Tue May 11 17:36:36 UTC 2010
Hi,
This is my first time sending a patch to the list, so let me know if
there are any conventions I missed.
Attached is a patch that attempts to remove the data structure
limitations on the number of open file descriptors in the system. The
patch is against our modified version of FreeBSD 7, so it probably
won't apply cleanly against upstream, but I wanted to get this out
there for discussion soon so if there is feedback, we can address it
and then worry about porting a specific patch for upstream.
We (Isilon) have been running this internally for a few months without
any issues, although there is at least one known issue that I need to
resolve, which is mentioned below.
Motivation:
With the increasing amount of memory and processing power in modern
machines, there are certain userspace processes that are able to
handle much higher concurrent load than previously possible. A
specific example is a single-process/multi-threaded SMB stack which
can handle thousands of connected clients, each with hundreds of
files open. Once kernel sysctl limits are increased for max files,
the next limitation is in the actual file descriptor data
structures.
Problem - Data Structure Limits:
The existing per-process data structures for file descriptors are
flat tables, which are reallocated each time they need to grow.
This is inefficient because the amount of data to allocate and copy
increases each time, but the bigger issue is the potentially limited
amount of contiguous KVA memory as the table grows very large. Over
time, as the KVA memory becomes fragmented, malloc may be unable to
provide large enough blocks of contiguous memory.
In the current code the struct proc contains both an array of struct
file pointers and a bit field indicating which file descriptors are
in use. The primary issue is how to handle these structures growing
beyond the kernel page size of 4K.
The array of file pointers will grow much faster than the bit field,
especially on a 64 bit kernel. The 4K block size will be hit at 512
files (64 bit kernel) for the file pointer array and 32,768 files
for the bit field.
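As a quick check of the numbers above, here is a small sketch of the
arithmetic. The names (FD_PAGE_SIZE, ptrs_per_page, bits_per_page) are
made up for illustration and are not taken from the patch; the 4K page
size is the one assumed throughout this mail.

```c
#include <assert.h>
#include <stddef.h>

#define FD_PAGE_SIZE 4096u	/* assumed 4K kernel page */

/* Number of file pointers that fit in one page for a given pointer size. */
static size_t
ptrs_per_page(size_t ptr_size)
{
	assert(ptr_size > 0);
	return (FD_PAGE_SIZE / ptr_size);
}

/* Number of descriptors a one-page bitmap can track (one bit per fd). */
static size_t
bits_per_page(void)
{
	return ((size_t)FD_PAGE_SIZE * 8);
}
```

With 8-byte pointers ptrs_per_page() gives 512, and bits_per_page()
gives 32,768, which matches the two limits quoted above.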
Solution:
File Pointer Array:
Focusing first on the file pointer array limitation, an indirect
block approach is used. An indirect block size of 4K (equal to page
size) is used, allowing for 512 files per block. To optimize for
the common case of low/normal fd usage, a flat array is initialized
to 20 entries and then grows at 2x each time until the block reaches
its maximum size. Once more than 512 files are opened, the array
will transition to a single level indirect block table.
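To make the growth policy concrete, here is a minimal sketch of how
the next flat-array capacity could be chosen. The function name and
constants are hypothetical, and clamping the last doubling (320 to 640)
down to 512 is my assumption about how "until the block reaches its
maximum size" would be handled; the patch may do this differently.

```c
#include <assert.h>
#include <stddef.h>

#define FD_FLAT_INITIAL		20u	/* initial flat array size */
#define FD_PTRS_PER_BLOCK	512u	/* 4K block / 8-byte pointers */

/*
 * Return the next flat-array capacity, or 0 when the flat array is
 * already one full block and the table must switch to indirect blocks.
 */
static size_t
fd_next_flat_capacity(size_t cur)
{
	size_t next;

	assert(cur <= FD_PTRS_PER_BLOCK);
	if (cur == 0)
		return (FD_FLAT_INITIAL);
	if (cur == FD_PTRS_PER_BLOCK)
		return (0);
	next = cur * 2;
	/* Assumption: clamp the final doubling to exactly one block. */
	return (next > FD_PTRS_PER_BLOCK ? FD_PTRS_PER_BLOCK : next);
}
```

Under these assumptions the capacities go 20, 40, 80, 160, 320, 512,
and the next request for growth triggers the transition to the
indirect table.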
Fd Bitfield:
The fd bit field as it stands can represent 32K files when it grows
to the page size limit. Using the same indirect system as the file
pointer array, it is able to grow beyond its existing limits.
Close Exec Field:
One complication of the old file pointer table is that each file
pointer had a 1-byte flags field. The memory was laid out such that
the file pointers were all in one contiguous array, followed by a
second array of chars where each char entry was a flags field
corresponding to the file pointer at the same index. Interestingly,
there is actually only one flag that is used: UF_EXCLOSE, so it's
fairly wasteful to have an array of chars. What Linux does, and
what I have done, is to just use a bitfield for all fds that should
be closed on exec. This could be further optimized by doing some
pointer trickery to store the close exec bit in the struct file
pointer rather than keep a separate bitfield.
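For illustration, here is a rough userspace sketch of both ideas: a
close-on-exec bitfield with one bit per fd, and the "pointer trickery"
variant that keeps the bit in the low bit of the file pointer. All
names here are hypothetical and not from the patch. The tagged-pointer
variant assumes the pointed-to object is at least 2-byte aligned, so
the low bit of the pointer is always zero.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* Bitfield variant: one bit per descriptor. */
static void
cloexec_set(unsigned long *map, size_t fd)
{
	map[fd / BITS_PER_WORD] |= 1UL << (fd % BITS_PER_WORD);
}

static void
cloexec_clear(unsigned long *map, size_t fd)
{
	map[fd / BITS_PER_WORD] &= ~(1UL << (fd % BITS_PER_WORD));
}

static int
cloexec_isset(const unsigned long *map, size_t fd)
{
	return ((map[fd / BITS_PER_WORD] >> (fd % BITS_PER_WORD)) & 1UL);
}

/* Tagged-pointer variant: the flag lives in the pointer's low bit. */
#define FP_CLOEXEC_BIT	((uintptr_t)1)

static uintptr_t
fp_pack(void *fp, int cloexec)
{
	/* Assumption: fp is aligned, so its low bit is free. */
	assert(((uintptr_t)fp & FP_CLOEXEC_BIT) == 0);
	return ((uintptr_t)fp | (cloexec ? FP_CLOEXEC_BIT : 0));
}

static void *
fp_pointer(uintptr_t v)
{
	return ((void *)(v & ~FP_CLOEXEC_BIT));
}

static int
fp_cloexec(uintptr_t v)
{
	return ((v & FP_CLOEXEC_BIT) != 0);
}
```

The bitfield costs one bit per fd instead of one byte. The tagged
pointer removes the separate structure entirely, at the price of
masking the pointer on every access.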
Indirect Block Table:
Since there are three consumers of the indirect block table, I
generalized it so all of the consumers rely on the same code. This
could eventually be refactored into a kernel library since it could
be generally useful in other areas. The table uses a single level
of indirection, so the base table can still grow beyond 4K. As
a process uses more fds, the need to continue growing the base table
should be fairly limited, and a single realloc will significantly
increase the number of fds the process can allocate.
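A simplified userspace sketch of such a generic single-level indirect
block table might look like the following. This is not the code from
the patch: the idb_* names, the doubling policy for the base table,
and the use of libc calloc/realloc in place of the kernel allocator
are all assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

#define IDB_BLOCK_SIZE	4096u	/* one indirect block = one 4K page */

struct idb_table {
	void	**blocks;	/* base table: one pointer per block */
	size_t	  nblocks;	/* number of slots in the base table */
	size_t	  elem_size;	/* size of one element in bytes */
};

static void
idb_init(struct idb_table *t, size_t elem_size)
{
	assert(elem_size > 0 && elem_size <= IDB_BLOCK_SIZE);
	t->blocks = NULL;
	t->nblocks = 0;
	t->elem_size = elem_size;
}

/*
 * Return a pointer to element idx. If create is nonzero, grow the base
 * table and allocate the block as needed. Returns NULL if the element
 * does not exist (create == 0) or on allocation failure.
 */
static void *
idb_get(struct idb_table *t, size_t idx, int create)
{
	size_t per = IDB_BLOCK_SIZE / t->elem_size;
	size_t b = idx / per;

	if (b >= t->nblocks) {
		size_t i, n = t->nblocks ? t->nblocks : 1;
		void **nb;

		if (!create)
			return (NULL);
		while (n <= b)
			n *= 2;
		/* Only the small base table is reallocated and copied. */
		nb = realloc(t->blocks, n * sizeof(void *));
		if (nb == NULL)
			return (NULL);
		for (i = t->nblocks; i < n; i++)
			nb[i] = NULL;
		t->blocks = nb;
		t->nblocks = n;
	}
	if (t->blocks[b] == NULL) {
		if (!create)
			return (NULL);
		t->blocks[b] = calloc(1, IDB_BLOCK_SIZE);
		if (t->blocks[b] == NULL)
			return (NULL);
	}
	return ((char *)t->blocks[b] + (idx % per) * t->elem_size);
}

static void
idb_free(struct idb_table *t)
{
	size_t i;

	for (i = 0; i < t->nblocks; i++)
		free(t->blocks[i]);
	free(t->blocks);
	t->blocks = NULL;
	t->nblocks = 0;
}
```

The point of the layout is that existing blocks never move when the
table grows, so no allocation is ever larger than one block plus the
base table, and each doubling of the base table doubles the number of
addressable elements.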
Accessing the new data structures:
All consumers of the file pointer array and bitfield will now have
to use accessors rather than using direct access.
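To show why direct indexing no longer works, here is a hedged sketch
of what such accessors could look like. The struct layout and the
names fdtable_getfile/fdtable_setfile are invented for this example
and do not reflect the identifiers used in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define FD_PTRS_PER_BLOCK	512	/* 4K block / 8-byte pointers */

struct file;			/* opaque for this sketch */

struct fdtable {
	int	indirect;	/* 0: u.flat is valid, 1: u.blocks is valid */
	size_t	nfiles;		/* capacity in descriptors */
	union {
		struct file	 **flat;	/* small tables */
		struct file	***blocks;	/* indirect blocks */
	} u;
};

/* Look up the file pointer for fd, hiding the table representation. */
static struct file *
fdtable_getfile(const struct fdtable *t, size_t fd)
{
	if (fd >= t->nfiles)
		return (NULL);
	if (!t->indirect)
		return (t->u.flat[fd]);
	return (t->u.blocks[fd / FD_PTRS_PER_BLOCK][fd % FD_PTRS_PER_BLOCK]);
}

/* Store fp in the slot for fd; the slot must already exist. */
static void
fdtable_setfile(struct fdtable *t, size_t fd, struct file *fp)
{
	assert(fd < t->nfiles);
	if (!t->indirect)
		t->u.flat[fd] = fp;
	else
		t->u.blocks[fd / FD_PTRS_PER_BLOCK][fd % FD_PTRS_PER_BLOCK] = fp;
}
```

A caller that previously indexed the array directly would call
fdtable_getfile() instead, and would not need to know whether the
process is still on the flat array or has moved to indirect blocks.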
Known Issues:
The new fdp locking in fdcopy needs to be reworked.
Thank you for reviewing!
-Tim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Increase-scalabilty-of-per-process-file-descriptor-d.patch
Type: application/octet-stream
Size: 51739 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20100511/4b9930cf/0001-Increase-scalabilty-of-per-process-file-descriptor-d.obj