[PATCH]/[RFC] Increase scalability of per-process file descriptor data structures

Tim Prouty tim.prouty at isilon.com
Tue May 11 17:36:36 UTC 2010


Hi,

This is my first time sending a patch to the list, so let me know if
there are any conventions I missed.

Attached is a patch that attempts to remove the data structure
limitations on the number of open file descriptors in the system.  The
patch is against our modified version of FreeBSD 7, so it probably
won't apply cleanly against upstream, but I wanted to get this out
there for discussion soon so if there is feedback, we can address it
and then worry about porting a specific patch for upstream.

We (Isilon) have been running this internally for a few months without
any issues, although there is at least one known issue that I need to
resolve, which is mentioned below.

Motivation:

   With the increasing amount of memory and processing power in modern
   machines, there are certain userspace processes that are able to
   handle much higher concurrent load than previously possible.  A
   specific example is a single-process/multi-threaded SMB stack which
   can handle thousands of connected clients, each with hundreds of
   files open.  Once kernel sysctl limits are increased for max files,
   the next limitation is in the actual file descriptor data
   structures.

Problem - Data Structure Limits:

   The existing per-process data structures for the file descriptor are
   flat tables, which are reallocated each time they need to grow.
   This is inefficient as the amount of data to allocate and copy each
   time increases, but the bigger issue is the potentially limited
   amount of contiguous KVA memory as the table grows very large.  Over
   time as the KVA memory becomes fragmented, malloc may be unable to
   provide large enough blocks of contiguous memory.

   In the current code the struct proc contains both an array of struct
   file pointers and a bit field indicating which file descriptors are
   in use.  The primary issue is how to handle these structures growing
   beyond the kernel page size of 4K.

   The array of file pointers will grow much faster than the bit field,
   especially on a 64 bit kernel. The 4K block size will be hit at 512
   files (64 bit kernel) for the file pointer array and 32,768 files
   for the bit field.
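
   The two thresholds above follow directly from the block size.  A
   minimal sketch of the arithmetic, assuming a 4K block and 8-byte
   pointers (the macro and function names are made up for illustration
   and are not taken from the patch):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed block size: one 4K page, as described in the text. */
#define FD_BLOCK_BYTES	4096u

/* Entries of the given size that fit in one block (hypothetical helper). */
static size_t
fd_entries_per_block(size_t entry_size)
{
	assert(entry_size > 0 && entry_size <= FD_BLOCK_BYTES);
	return (FD_BLOCK_BYTES / entry_size);
}

/* Descriptors one block of bitmap can track, at one bit per descriptor. */
static size_t
fd_bits_per_block(void)
{
	return ((size_t)FD_BLOCK_BYTES * CHAR_BIT);
}
```

   With 8-byte pointers the first helper gives 512 entries, and the
   second gives 32,768 bits, matching the numbers quoted above.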

Solution:

File Pointer Array:

   Focusing first on the file pointer array limitation, an indirect
   block approach is used.  An indirect block size of 4K (equal to page
   size) is used, allowing for 512 files per block.  To optimize for
   the common case of low/normal fd usage, a flat array is initialized
   to 20 entries and then grows at 2x each time until the block reaches
   its maximum size. Once more than 512 files are opened, the array
   will transition to a single level indirect block table.
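
   The growth policy described above can be sketched roughly as
   follows.  This is only an illustration: the helper names are
   hypothetical, and capping the last doubling step (320 -> 512) at
   the block size is an assumption about how the boundary is handled,
   not a statement about the patch.

```c
#include <assert.h>
#include <stddef.h>

#define FD_INITIAL_ENTRIES	20	/* starting size given in the text */
#define FD_BLOCK_ENTRIES	512	/* 4K block / 8-byte pointers */

/*
 * Hypothetical helper: capacity of the flat array after one growth step.
 * Doubles the current size, capped at one full block (assumption).
 */
static size_t
fd_flat_next_capacity(size_t cur)
{
	size_t next;

	assert(cur <= FD_BLOCK_ENTRIES);
	next = (cur == 0) ? FD_INITIAL_ENTRIES : cur * 2;
	return (next > FD_BLOCK_ENTRIES ? FD_BLOCK_ENTRIES : next);
}

/* Nonzero when descriptor fd no longer fits in a single flat block. */
static int
fd_needs_indirect(size_t fd)
{
	return (fd >= FD_BLOCK_ENTRIES);
}
```

   Under these assumptions the flat array steps through 20, 40, 80,
   160, 320 and 512 entries before the indirect table takes over.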

Fd Bitfield:

   The fd bit field as it stands can represent 32K files when it grows
   to the page size limit.  Using the same indirect system as the file
   pointer array, it is able to grow beyond its existing limits.

Close Exec Field:

   One complication of the old file pointer table is that each file
   pointer had a 1-byte flags field.  The memory was laid out such that
   the file pointers were all in one contiguous array, followed by a
   second array of chars where each char entry was the flags field
   corresponding to the file pointer at the same index.  Interestingly,
   only one flag is actually used: UF_EXCLOSE, so it's fairly wasteful
   to have an array of chars.  What Linux does, and what I have done,
   is to just use a bitfield for all fds that should be closed on exec.
   This could be further optimized by doing some pointer trickery to
   store the close exec bit in the struct file pointer rather than keep
   a separate bitfield.
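
   To illustrate the one-bit-per-fd idea, here is a minimal sketch of
   a close-on-exec bitmap.  It uses a fixed-size flat array for
   brevity, whereas the patch stores the bits in the indirect block
   table; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define CE_MAX_FDS	1024	/* illustrative fixed capacity */
#define CE_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* One bit per descriptor instead of one char per descriptor. */
struct cloexec_map {
	unsigned long	words[CE_MAX_FDS / CE_WORD_BITS];
};

/* Set (on != 0) or clear (on == 0) the close-on-exec bit for fd. */
static void
cloexec_set(struct cloexec_map *m, size_t fd, int on)
{
	unsigned long mask;

	assert(fd < CE_MAX_FDS);
	mask = 1UL << (fd % CE_WORD_BITS);
	if (on)
		m->words[fd / CE_WORD_BITS] |= mask;
	else
		m->words[fd / CE_WORD_BITS] &= ~mask;
}

/* Return nonzero if fd is marked close-on-exec. */
static int
cloexec_isset(const struct cloexec_map *m, size_t fd)
{
	assert(fd < CE_MAX_FDS);
	return (((m->words[fd / CE_WORD_BITS] >> (fd % CE_WORD_BITS)) & 1UL) != 0);
}
```

   Compared with a char per descriptor, this needs one eighth of the
   memory for the same number of fds.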

Indirect Block Table:

   Since there are three consumers of the indirect block table, I
   generalized it so all of the consumers rely on the same code.  This
   could eventually be refactored into a kernel library since it could
   be generally useful in other areas.  The table uses a single level
   of indirection, so the base table itself can still grow beyond 4K.  As
   a process uses more fds, the need to continue growing the base table
   should be fairly limited, and a single realloc will significantly
   increase the number of fds the process can allocate.
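
   As a rough userspace illustration of a generic single-level
   indirect block table (not the code in the patch), the sketch below
   keeps a growable base table of pointers to fixed 4K blocks.  All
   names are hypothetical, and doubling the base table on growth is an
   assumption; libc realloc/calloc stand in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define IB_BLOCK_BYTES	4096u	/* fixed block size, one page */

struct ib_table {
	void	**base;		/* base table: one pointer per block */
	size_t	  nblocks;	/* number of slots in base */
	size_t	  elem_size;	/* bytes per element */
};

static void
ib_init(struct ib_table *t, size_t elem_size)
{
	assert(elem_size > 0 && elem_size <= IB_BLOCK_BYTES);
	t->base = NULL;
	t->nblocks = 0;
	t->elem_size = elem_size;
}

/*
 * Return a pointer to element idx.  If alloc is nonzero, grow the base
 * table and allocate the (zeroed) block on demand.  Returns NULL if the
 * element's block does not exist or memory cannot be allocated.
 */
static void *
ib_get(struct ib_table *t, size_t idx, int alloc)
{
	size_t per_block = IB_BLOCK_BYTES / t->elem_size;
	size_t blk = idx / per_block;
	size_t off = idx % per_block;
	size_t i, n;
	void **nb;

	if (blk >= t->nblocks) {
		if (!alloc)
			return (NULL);
		n = t->nblocks ? t->nblocks : 1;
		while (n <= blk)
			n *= 2;
		/* Only the small base table is reallocated and copied. */
		nb = realloc(t->base, n * sizeof(*nb));
		if (nb == NULL)
			return (NULL);
		for (i = t->nblocks; i < n; i++)
			nb[i] = NULL;
		t->base = nb;
		t->nblocks = n;
	}
	if (t->base[blk] == NULL) {
		if (!alloc)
			return (NULL);
		t->base[blk] = calloc(1, IB_BLOCK_BYTES);
		if (t->base[blk] == NULL)
			return (NULL);
	}
	return ((char *)t->base[blk] + off * t->elem_size);
}

/* Release every block and the base table. */
static void
ib_free(struct ib_table *t)
{
	size_t i;

	for (i = 0; i < t->nblocks; i++)
		free(t->base[i]);
	free(t->base);
	t->base = NULL;
	t->nblocks = 0;
}
```

   Existing blocks never move when the table grows, so only the base
   table needs a contiguous allocation larger than one page, and each
   doubling of it doubles the number of elements that can be addressed.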

Accessing the new data structures:

   All consumers of the file pointer array and bitfield will now have
   to use accessors rather than using direct access.
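
   A minimal sketch of what such accessors might look like, assuming a
   table that is either flat or single-level indirect.  The struct
   layout and the names fdtab_getfile/fdtab_setfile are hypothetical
   and are not the interface in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define FDT_BLOCK_ENTRIES	512	/* file pointers per 4K block */

struct file;			/* opaque for this sketch */

struct fdtab {
	int	  indirect;	/* 0: flat array, 1: indirect blocks */
	size_t	  nfiles;	/* number of addressable descriptors */
	union {
		struct file  **flat;	/* flat[fd] */
		struct file ***blocks;	/* blocks[fd / 512][fd % 512] */
	} u;
};

/* Look up the file pointer for fd; NULL if out of range or unused. */
static struct file *
fdtab_getfile(const struct fdtab *t, size_t fd)
{
	if (fd >= t->nfiles)
		return (NULL);
	if (!t->indirect)
		return (t->u.flat[fd]);
	return (t->u.blocks[fd / FDT_BLOCK_ENTRIES][fd % FDT_BLOCK_ENTRIES]);
}

/* Store fp in the slot for fd; the slot must already exist. */
static void
fdtab_setfile(struct fdtab *t, size_t fd, struct file *fp)
{
	assert(fd < t->nfiles);
	if (!t->indirect)
		t->u.flat[fd] = fp;
	else
		t->u.blocks[fd / FDT_BLOCK_ENTRIES][fd % FDT_BLOCK_ENTRIES] = fp;
}
```

   Callers that previously indexed the array directly would go through
   functions like these, so the storage layout can change without
   touching every consumer.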

Known Issues:

   The new fdp locking in fdcopy needs to be reworked.


Thank you for reviewing!

-Tim

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Increase-scalabilty-of-per-process-file-descriptor-d.patch
Type: application/octet-stream
Size: 51739 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20100511/4b9930cf/0001-Increase-scalabilty-of-per-process-file-descriptor-d.obj
-------------- next part --------------


