[PATCH]/[RFC] Increase scalability of per-process file
descriptor data structures
Tim Prouty
tim.prouty at isilon.com
Tue May 11 23:13:47 UTC 2010
The patch was slightly truncated; I'm guessing because it was larger
than 50K. Attached is a slightly trimmed-down patch.
-Tim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fd_scalability.patch
Type: application/octet-stream
Size: 44180 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20100511/34841690/fd_scalability-0001.obj
-------------- next part --------------
On May 11, 2010, at 10:24 AM, Tim Prouty wrote:
> Hi,
>
> This is my first time sending a patch to the list, so let me know if
> there are any conventions I missed.
>
> Attached is a patch that attempts to remove the data structure
> limitations on the number of open file descriptors in the system. The
> patch is against our modified version of FreeBSD 7, so it probably
> won't apply cleanly against upstream, but I wanted to get this out
> there for discussion soon so if there is feedback, we can address it
> and then worry about porting a specific patch for upstream.
>
> We (Isilon) have been running this internally for a few months without
> any issues, although there is at least one known issue that I need to
> resolve, which is mentioned below.
>
> Motivation:
>
> With the increasing amount of memory and processing power in modern
> machines, there are certain userspace processes that are able to
> handle much higher concurrent load than previously possible. A
> specific example is a single-process/multi-threaded SMB stack which
> can handle thousands of connected clients, each with hundreds of
> files open. Once kernel sysctl limits are increased for max files,
> the next limitation is in the actual file descriptor data
> structures.
>
> Problem - Data Structure Limits:
>
> The existing per-process data structures for the file descriptor are
> flat tables, which are reallocated each time they need to grow.
> This is inefficient as the amount of data to allocate and copy each
> time increases, but the bigger issue is the potentially limited
> amount of contiguous KVA memory as the table grows very large. Over
> time as the KVA memory becomes fragmented, malloc may be unable to
> provide large enough blocks of contiguous memory.
>
> In the current code the struct proc contains both an array of struct
> file pointers and a bit field indicating which file descriptors are
> in use. The primary issue is how to handle these structures growing
> beyond the kernel page size of 4K.
>
> The array of file pointers will grow much faster than the bit field,
> especially on a 64-bit kernel, where each pointer takes 8 bytes. The
> 4K block size will be hit at 512 files (64-bit kernel) for the file
> pointer array and at 32,768 files for the bit field.
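
The two limits quoted above follow directly from the 4K block size. A
minimal sketch of the arithmetic, assuming a 4096-byte block and one bit
per descriptor; the names FD_BLOCK_BYTES, fd_ptrs_per_block and
fd_bits_per_block are hypothetical and do not come from the patch:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed block size: one 4K page, as described in the message. */
#define FD_BLOCK_BYTES 4096u

/* Number of file pointers that fit in one block for a given pointer size. */
static size_t
fd_ptrs_per_block(size_t ptr_size)
{
	assert(ptr_size > 0);
	return (FD_BLOCK_BYTES / ptr_size);
}

/* Number of descriptors one block of in-use bits can track (1 bit per fd). */
static size_t
fd_bits_per_block(void)
{
	return ((size_t)FD_BLOCK_BYTES * CHAR_BIT);
}
```

With 8-byte pointers the first helper gives 512 entries per block; a
32-bit kernel with 4-byte pointers would get 1024.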
>
> Solution:
>
> File Pointer Array:
>
> Focusing first on the file pointer array limitation, an indirect
> block approach is used. An indirect block size of 4K (equal to page
> size) is used, allowing for 512 files per block. To optimize for
> the common case of low/normal fd usage, a flat array is initialized
> to 20 entries and then grows at 2x each time until the block reaches
> its maximum size. Once more than 512 files are opened, the array
> will transition to a single-level indirect block table.
>
> Fd Bitfield:
>
> The fd bit field as it stands can represent 32K files when it grows
> to the page size limit. Using the same indirect system as the file
> pointer array, it is able to grow beyond its existing limits.
>
> Close Exec Field:
>
> One complication of the old file pointer table is that each file
> pointer had a 1-byte flags field. The memory was laid out such that
> the file pointers are all in one contiguous array, followed by a
> second array of chars where each char entry is a flags field that
> corresponds to the file pointer at the same index. Interestingly,
> there is actually only one flag that is used: UF_EXCLOSE, so it's
> fairly wasteful to have an array of chars. What Linux does, and
> what I have done, is to just use a bitfield for all fds that should
> be closed on exec. This could be further optimized by doing some
> pointer trickery to store the close exec bit in the struct file
> pointer rather than keep a separate bitfield.
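
Replacing the per-fd flags byte with one bit per descriptor could look
roughly like this. It is a userspace sketch over a plain flat array of
words; the cloexec_* names are hypothetical, and the patch stores its
bitfield in the indirect block table rather than in a flat array:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Mark fd as close-on-exec. The caller must size map to cover fd. */
static void
cloexec_set(unsigned long *map, size_t fd)
{
	assert(map != NULL);
	map[fd / BITS_PER_WORD] |= 1UL << (fd % BITS_PER_WORD);
}

/* Clear the close-on-exec bit for fd. */
static void
cloexec_clear(unsigned long *map, size_t fd)
{
	assert(map != NULL);
	map[fd / BITS_PER_WORD] &= ~(1UL << (fd % BITS_PER_WORD));
}

/* Return nonzero if fd is marked close-on-exec. */
static int
cloexec_isset(const unsigned long *map, size_t fd)
{
	assert(map != NULL);
	return ((map[fd / BITS_PER_WORD] >> (fd % BITS_PER_WORD)) & 1UL) != 0;
}
```

Compared with one char per descriptor, this uses one eighth of the
memory for the same information.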
>
> Indirect Block Table:
>
> Since there are three consumers of the indirect block table, I
> generalized it so all of the consumers rely on the same code. This
> could eventually be refactored into a kernel library since it could
> be generally useful in other areas. The table uses a single level
> of indirection, so the base table can still grow beyond 4K. As
> a process uses more fds, the need to continue growing the base table
> should be fairly limited, and a single realloc will significantly
> increase the number of fds the process can allocate.
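
A single-level indirect table of the kind described above might be
sketched like this in userspace C. All names (struct ibt, ibt_get,
ibt_set, ibt_free) are hypothetical, malloc/realloc stand in for the
kernel allocator, and locking is omitted entirely; this is not the code
from the patch:

```c
#include <assert.h>
#include <stdlib.h>

#define IBT_BLOCK_ENTRIES 512u	/* entries per 4K block of 8-byte pointers */

struct ibt {
	void ***blocks;		/* base table: one slot per block */
	size_t nblocks;		/* number of slots in the base table */
};

/* Look up idx; returns NULL if the slot was never set. */
static void *
ibt_get(const struct ibt *t, size_t idx)
{
	size_t b = idx / IBT_BLOCK_ENTRIES;

	assert(t != NULL);
	if (b >= t->nblocks || t->blocks[b] == NULL)
		return (NULL);
	return (t->blocks[b][idx % IBT_BLOCK_ENTRIES]);
}

/* Store val at idx, growing the base table and allocating blocks lazily. */
static int
ibt_set(struct ibt *t, size_t idx, void *val)
{
	size_t b = idx / IBT_BLOCK_ENTRIES, i, n;
	void ***nb;

	assert(t != NULL);
	if (b >= t->nblocks) {
		/* Only the small base table is reallocated, never the blocks. */
		n = t->nblocks ? t->nblocks : 1;
		while (n <= b)
			n *= 2;
		nb = realloc(t->blocks, n * sizeof(*nb));
		if (nb == NULL)
			return (-1);
		for (i = t->nblocks; i < n; i++)
			nb[i] = NULL;
		t->blocks = nb;
		t->nblocks = n;
	}
	if (t->blocks[b] == NULL) {
		t->blocks[b] = malloc(IBT_BLOCK_ENTRIES * sizeof(void *));
		if (t->blocks[b] == NULL)
			return (-1);
		for (i = 0; i < IBT_BLOCK_ENTRIES; i++)
			t->blocks[b][i] = NULL;
	}
	t->blocks[b][idx % IBT_BLOCK_ENTRIES] = val;
	return (0);
}

/* Release every block and the base table. */
static void
ibt_free(struct ibt *t)
{
	size_t i;

	for (i = 0; i < t->nblocks; i++)
		free(t->blocks[i]);
	free(t->blocks);
	t->blocks = NULL;
	t->nblocks = 0;
}
```

In this sketch each doubling of the base table adds room for 512 more
entries per new slot, which illustrates why a single realloc of the
base table greatly increases the number of fds a process can hold.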
>
> Accessing the new data structures:
>
> All consumers of the file pointer array and bitfield will now have
> to use accessors rather than using direct access.
>
> Known Issues:
>
> The new fdp locking in fdcopy needs to be reworked.
>
>
> Thank you for reviewing!
>
> -Tim
>
> <0001-Increase-scalabilty-of-per-process-file-descriptor-d.patch>