[PATCH]/[RFC] Increase scalability of per-process file descriptor data structures

Tim Prouty tim.prouty at isilon.com
Tue May 11 23:13:47 UTC 2010


The patch was slightly truncated, I'm guessing because it was > 50K.
Attached is a slightly trimmed down patch.

-Tim

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fd_scalability.patch
Type: application/octet-stream
Size: 44180 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20100511/34841690/fd_scalability-0001.obj
-------------- next part --------------



On May 11, 2010, at 10:24 AM, Tim Prouty wrote:

> Hi,
>
> This is my first time sending a patch to the list, so let me know if
> there are any conventions I missed.
>
> Attached is a patch that attempts to remove the data structure
> limitations on the number of open file descriptors in the system.  The
> patch is against our modified version of FreeBSD 7, so it probably
> won't apply cleanly against upstream, but I wanted to get this out
> there for discussion soon so if there is feedback, we can address it
> and then worry about porting a specific patch for upstream.
>
> We (Isilon) have been running this internally for a few months without
> any issues, although there is at least one known issue that I need to
> resolve, which is mentioned below.
>
> Motivation:
>
>  With the increasing amount of memory and processing power in modern
>  machines, there are certain userspace processes that are able to
>  handle much higher concurrent load than previously possible.  A
>  specific example is a single-process/multi-threaded SMB stack which
>  can handle thousands of connected clients, each with hundreds of
>  files open.  Once kernel sysctl limits are increased for max files,
>  the next limitation is in the actual file descriptor data
>  structures.
>
> Problem - Data Structure Limits:
>
>  The existing per-process data structures for the file descriptor are
>  flat tables, which are reallocated each time they need to grow.
>  This is inefficient as the amount of data to allocate and copy each
>  time increases, but the bigger issue is the potentially limited
>  amount of contiguous KVA memory as the table grows very large.  Over
>  time as the KVA memory becomes fragmented, malloc may be unable to
>  provide large enough blocks of contiguous memory.
>
>  In the current code, struct filedesc (referenced from struct proc)
>  contains both an array of struct file pointers and a bit field
>  indicating which file descriptors are
>  in use.  The primary issue is how to handle these structures growing
>  beyond the kernel page size of 4K.
>
>  The array of file pointers will grow much faster than the bit field,
>  especially on a 64-bit kernel.  With 8-byte pointers, the 4K block
>  size will be hit at 512 files for the file pointer array, and at
>  32,768 files (one bit each) for the bit field.
>
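The two limits quoted above follow directly from the block size. A minimal sketch of the arithmetic, assuming a 4K block and one bit per descriptor in the bit field; the function and macro names are hypothetical and do not come from the patch:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BLOCK_SIZE 4096u	/* assumed 4K page size, per the text */

/* Number of file pointers that fit in one block for a given pointer size. */
static size_t
ptrs_per_block(size_t ptr_size)
{
	assert(ptr_size > 0);
	return (BLOCK_SIZE / ptr_size);
}

/* Number of descriptors one block of bit field can track (one bit per fd). */
static size_t
bits_per_block(void)
{
	return ((size_t)BLOCK_SIZE * CHAR_BIT);
}
```

With 8-byte pointers, ptrs_per_block(8) gives 512, and bits_per_block() gives 32,768 on platforms where a byte is 8 bits, matching the figures in the message.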
> Solution:
>
> File Pointer Array
>
>  Focusing first on the file pointer array limitation, an indirect
>  block approach is used.  An indirect block size of 4K (equal to page
>  size) is used, allowing for 512 files per block.  To optimize for
>  the common case of low/normal fd usage, a flat array is initialized
>  to 20 entries and then grows at 2x each time until the block reaches
>  its maximum size. Once more than 512 files are opened, the array
>  will transition to a single level indirect block table.
>
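The growth policy for the flat phase can be sketched as a small helper. This is an illustration only, not code from the patch: the names are hypothetical, and clamping the last doubling to 512 is an assumption about what "until the block reaches its maximum size" means.

```c
#include <assert.h>
#include <stddef.h>

#define FLAT_INITIAL	20	/* initial flat array size, from the text */
#define FLAT_MAX	512	/* 4K block / 8-byte pointers, from the text */

/*
 * Return the next flat-array capacity, or 0 when the array is already
 * at FLAT_MAX and the caller would have to switch to the indirect table.
 */
static size_t
flat_next_capacity(size_t cur)
{
	assert(cur <= FLAT_MAX);
	if (cur == 0)
		return (FLAT_INITIAL);
	if (cur == FLAT_MAX)
		return (0);
	/* Double, but never exceed one full block (assumed clamp). */
	return (cur * 2 > FLAT_MAX ? FLAT_MAX : cur * 2);
}
```

Under these assumptions the flat array steps through 20, 40, 80, 160, 320 and 512 entries before the transition to the indirect table.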
> Fd Bitfield:
>
>  The fd bit field as it stands can represent 32K files when it grows
>  to the page size limit.  Using the same indirect system as the file
>  pointer array, it is able to grow beyond its existing limits.
>
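A rough userspace sketch of a bit field spread over 4K indirect blocks is shown below. It assumes one bit per descriptor and on-demand block allocation; the struct and function names are hypothetical, and it uses realloc/calloc from libc rather than the kernel allocator, so it illustrates the indexing only and not the patch's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BLK_BYTES	4096u				/* one 4K block */
#define BLK_BITS	(BLK_BYTES * CHAR_BIT)		/* fds per block */
#define WORD_BITS	(sizeof(uint64_t) * CHAR_BIT)

struct fdbitmap {		/* hypothetical name */
	uint64_t **blocks;	/* base table of pointers to bitmap blocks */
	size_t	   nblocks;	/* entries in the base table */
};

/* Mark fd as in use; returns 0 on success, -1 if allocation fails. */
static int
fdbitmap_set(struct fdbitmap *bm, size_t fd)
{
	size_t blk = fd / BLK_BITS, bit = fd % BLK_BITS;

	assert(bm != NULL);
	if (blk >= bm->nblocks) {
		/* Grow the base table; new slots start out unallocated. */
		uint64_t **nb = realloc(bm->blocks, (blk + 1) * sizeof(*nb));
		if (nb == NULL)
			return (-1);
		for (size_t i = bm->nblocks; i <= blk; i++)
			nb[i] = NULL;
		bm->blocks = nb;
		bm->nblocks = blk + 1;
	}
	if (bm->blocks[blk] == NULL) {
		bm->blocks[blk] = calloc(1, BLK_BYTES);
		if (bm->blocks[blk] == NULL)
			return (-1);
	}
	bm->blocks[blk][bit / WORD_BITS] |= (uint64_t)1 << (bit % WORD_BITS);
	return (0);
}

/* Return 1 if fd is marked in use, 0 otherwise. */
static int
fdbitmap_test(const struct fdbitmap *bm, size_t fd)
{
	size_t blk = fd / BLK_BITS, bit = fd % BLK_BITS;

	assert(bm != NULL);
	if (blk >= bm->nblocks || bm->blocks[blk] == NULL)
		return (0);
	return ((int)((bm->blocks[blk][bit / WORD_BITS] >>
	    (bit % WORD_BITS)) & 1));
}
```

No single allocation for bit storage exceeds 4K here; only the base table of block pointers grows with the descriptor count.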
> Close Exec Field:
>
>  One complication of the old file pointer table is that each file
>  pointer has a 1-byte flags field.  The memory is laid out such that
>  the file pointers are all in one contiguous array, followed by a
>  second array of chars where each char entry is a flags field that
>  corresponds to the file pointer at the same index.  Interestingly,
>  only one flag is actually used: UF_EXCLOSE, so it's fairly wasteful
>  to have an array of chars.  What Linux does, and what I have done,
>  is to use a bitfield for all fds that should be closed on exec.
>  This could be further optimized by doing some pointer trickery to
>  store the close-on-exec bit in the struct file pointer rather than
>  keep a separate bitfield.
>
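The "pointer trickery" mentioned as a possible further optimization could look roughly like the following. This is a sketch under the assumption that struct file pointers are at least 2-byte aligned, so the low bit is free to carry the flag; the names are hypothetical, the struct file here is a stand-in, and none of this is in the patch.

```c
#include <assert.h>
#include <stdint.h>

struct file { int f_placeholder; };	/* stand-in for the kernel's struct file */

#define FP_CLOEXEC	((uintptr_t)1)	/* hypothetical tag bit */

/* Pack a file pointer and its close-on-exec flag into one word. */
static uintptr_t
fp_pack(struct file *fp, int cloexec)
{
	uintptr_t v = (uintptr_t)fp;

	assert((v & FP_CLOEXEC) == 0);	/* relies on pointer alignment */
	return (cloexec ? (v | FP_CLOEXEC) : v);
}

/* Recover the file pointer by masking off the tag bit. */
static struct file *
fp_pointer(uintptr_t v)
{
	return ((struct file *)(v & ~FP_CLOEXEC));
}

/* Return 1 if the packed word carries the close-on-exec flag. */
static int
fp_cloexec(uintptr_t v)
{
	return ((v & FP_CLOEXEC) != 0);
}
```

The trade-off is that every reader of the table must mask the pointer before dereferencing it, which is one more reason to route all access through accessors.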
> Indirect Block Table:
>
>  Since there are three consumers of the indirect block table, I
>  generalized it so all of the consumers rely on the same code.  This
>  could eventually be refactored into a kernel library since it could
>  be generally useful in other areas.  The table uses a single level
>  of indirection, so the base table can still grow beyond 4K.  As
>  a process uses more fds, the need to continue growing the base table
>  should be fairly limited, and a single realloc will significantly
>  increase the number of fds the process can allocate.
>
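To make the capacity claim concrete, here is a small sketch of how an element index splits into a base-table slot and an offset within a 4K block, and how many elements a base table of a given size can reach. The names are hypothetical, and the helper assumes elements of at least one byte (the bit fields would index by bit instead).

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_BYTES	4096u	/* indirect block size, from the text */

struct itable_pos {		/* hypothetical: where element idx lives */
	size_t block;		/* index into the base table */
	size_t offset;		/* element index within that block */
};

/* Split an element index for elements of elem_size bytes. */
static struct itable_pos
itable_locate(size_t idx, size_t elem_size)
{
	struct itable_pos pos;
	size_t per_block;

	assert(elem_size > 0 && elem_size <= BLOCK_BYTES);
	per_block = BLOCK_BYTES / elem_size;
	pos.block = idx / per_block;
	pos.offset = idx % per_block;
	return (pos);
}

/* Elements reachable through a base table with base_entries slots. */
static size_t
itable_capacity(size_t base_entries, size_t elem_size)
{
	assert(elem_size > 0 && elem_size <= BLOCK_BYTES);
	return (base_entries * (BLOCK_BYTES / elem_size));
}
```

With 8-byte pointers, a 4K base table (512 slots) reaches 262,144 descriptors, and growing it once to 8K (1,024 slots) reaches 524,288, which is the sense in which a single realloc of the base table buys a large number of additional fds.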
> Accessing the new data structures:
>
>  All consumers of the file pointer array and bitfield now have to go
>  through accessors rather than accessing the arrays directly.
>
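One way accessors might hide the flat versus indirect layout from callers is sketched below. The struct layout, field names and function names are all hypothetical, chosen for illustration rather than taken from the patch, and locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

#define PTRS_PER_BLOCK	512	/* 4K block / 8-byte pointers, from the text */

struct file { int f_placeholder; };	/* stand-in for the kernel's struct file */

struct fdtable {			/* hypothetical layout */
	int	indirect;		/* 0: flat array, 1: indirect blocks */
	size_t	nfiles;			/* number of valid slots */
	union {
		struct file  **flat;	/* flat[fd] */
		struct file ***blocks;	/* blocks[fd / 512][fd % 512] */
	} u;
};

/* Return the address of the slot for fd, or NULL if fd is out of range. */
static struct file **
fdtable_slot(struct fdtable *t, size_t fd)
{
	assert(t != NULL);
	if (fd >= t->nfiles)
		return (NULL);
	if (!t->indirect)
		return (&t->u.flat[fd]);
	return (&t->u.blocks[fd / PTRS_PER_BLOCK][fd % PTRS_PER_BLOCK]);
}

/* Read accessor: replaces a direct table[fd] load. */
static struct file *
fdtable_get(struct fdtable *t, size_t fd)
{
	struct file **slot = fdtable_slot(t, fd);

	return (slot != NULL ? *slot : NULL);
}

/* Write accessor: returns 0 on success, -1 if fd is out of range. */
static int
fdtable_set(struct fdtable *t, size_t fd, struct file *fp)
{
	struct file **slot = fdtable_slot(t, fd);

	if (slot == NULL)
		return (-1);
	*slot = fp;
	return (0);
}
```

Callers written against fdtable_get and fdtable_set do not change when a table crosses from the flat layout to the indirect one.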
> Known Issues:
>
>  The new fdp locking in fdcopy needs to be reworked.
>
>
> Thank you for reviewing!
>
> -Tim
>
> <0001-Increase-scalabilty-of-per-process-file-descriptor-d.patch>
> _______________________________________________
> freebsd-arch at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"


