HEADS UP: zerocopy bpf commits impending
Robert Watson
rwatson at FreeBSD.org
Mon Mar 17 11:45:53 PDT 2008
On Mon, 17 Mar 2008, Julian Elischer wrote:
>> Per previous posts, interested parties can find the slides on the design
>> from the BSDCan 2007 developer summit here:
>>
>>
>> http://www.watson.org/~robert/freebsd/2007bsdcan/20070517-devsummit-zerocopybpf.pdf
>
> with the video of the talk at:
>
> http://www.freebsd.org/~julian/BSDCan-2007/rwatson_bpf.mov
The primary design change since that time is that we've eliminated the
ioctl-driven monitoring and ACKing of shared memory buffers from userspace.
All shared memory consumers must use the shared memory ACK model, and our
libpcap changes do that. This removes redundancy (and complexity) from the
set of ioctls we've added. I've attached the (new) text from bpf.4 below,
which I think captures the changes best.
Robert N M Watson
Computer Laboratory
University of Cambridge
BUFFER MODES
bpf devices deliver packet data to the application via memory buffers
provided by the application. The buffer mode is set using the
BIOCSETBUFMODE ioctl, and read using the BIOCGETBUFMODE ioctl.
Buffered read mode
By default, bpf devices operate in the BPF_BUFMODE_BUFFER mode, in which
packet data is copied explicitly from the kernel to user memory using the
read(2) system call. The user process will declare a fixed buffer size
that will be used both for sizing internal buffers and for all read(2)
operations on the file. This size is queried using the BIOCGBLEN ioctl,
and is set using the BIOCSBLEN ioctl. Note that an individual packet
larger than the buffer size is necessarily truncated.
Zero-copy buffer mode
bpf devices may also operate in the BPF_BUFMODE_ZEROCOPY mode, in which
packet data is written directly into user memory buffers by the kernel,
avoiding both system call and copying overhead. Buffers are of fixed
(and equal) size, page-aligned, and an even multiple of the page size.
The maximum zero-copy buffer size is returned by the BIOCGETZMAX ioctl.
Note that an individual packet larger than the buffer size is necessarily
truncated.
The user process registers two memory buffers using the BIOCSETZBUF
ioctl, which accepts a struct bpf_zbuf pointer as an argument:
struct bpf_zbuf {
        void *bz_bufa;
        void *bz_bufb;
        size_t bz_buflen;
};
bz_bufa is a pointer to the userspace address of the first buffer that
will be filled, and bz_bufb is a pointer to the second buffer. bpf will
then cycle between the two buffers starting with bz_bufa.
Each buffer begins with a fixed-length header to hold synchronization
and data length information for the buffer:
struct bpf_zbuf_header {
        volatile u_int bzh_kernel_gen;  /* Kernel generation number. */
        volatile u_int bzh_kernel_len;  /* Length of data in the buffer. */
        volatile u_int bzh_user_gen;    /* User generation number. */
        /* ...padding for future use... */
};
The header structure of each buffer, including all padding, should be
zeroed before it is passed to the ioctl. Remaining space in the buffer
will be used by the kernel to store packet data, laid out in the same
format as with buffered read mode.
The kernel and the user process follow a simple acknowledgement protocol
via the buffer header to synchronize access to the buffer: when the
header generation numbers, bzh_kernel_gen and bzh_user_gen, hold the same
value, the kernel owns the buffer, and when they differ, userspace owns
the buffer.
While the kernel owns the buffer, the contents are unstable and may
change asynchronously; while the user process owns the buffer, its
contents are stable and will not be changed until the buffer has been
acknowledged.
Initializing the buffer headers to all 0's before registering the buffer
has the effect of assigning initial ownership of both buffers to the
kernel. The kernel signals that a buffer has been assigned to userspace by
modifying bzh_kernel_gen, and userspace acknowledges the buffer and
returns it to the kernel by setting the value of bzh_user_gen to the
value of bzh_kernel_gen.
In order to avoid caching and memory re-ordering effects, the user
process must use atomic operations and memory barriers when checking for
and acknowledging buffers:
#include <sys/types.h>
#include <machine/atomic.h>

/*
 * Return ownership of a buffer to the kernel for reuse.
 */
static void
buffer_acknowledge(struct bpf_zbuf_header *bzh)
{
        atomic_store_rel_int(&bzh->bzh_user_gen, bzh->bzh_kernel_gen);
}

/*
 * Check whether a buffer has been assigned to userspace by the kernel.
 * Return true if userspace owns the buffer, and false otherwise.
 */
static int
buffer_check(struct bpf_zbuf_header *bzh)
{
        return (bzh->bzh_user_gen !=
            atomic_load_acq_int(&bzh->bzh_kernel_gen));
}
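
The handshake can also be exercised without a bpf device. The following is only a model, not the kernel's implementation: it assumes C11 <stdatomic.h> in place of <machine/atomic.h>, uses a hypothetical struct model_zbuf_header as a stand-in for struct bpf_zbuf_header, and assumes the kernel side advances its generation number by one, which the man page text does not specify.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical stand-in for struct bpf_zbuf_header, using C11 atomics
 * so that the model builds without FreeBSD's <machine/atomic.h>.
 */
struct model_zbuf_header {
	atomic_uint kernel_gen;	/* Models bzh_kernel_gen. */
	atomic_uint kernel_len;	/* Models bzh_kernel_len. */
	atomic_uint user_gen;	/* Models bzh_user_gen. */
};

/*
 * Stand-in for the kernel: record len bytes of data, then hand the
 * buffer to userspace by advancing the kernel generation number
 * (incrementing by one is an assumption of this model).
 */
static void
model_kernel_assign(struct model_zbuf_header *h, unsigned int len)
{
	atomic_store_explicit(&h->kernel_len, len, memory_order_relaxed);
	atomic_fetch_add_explicit(&h->kernel_gen, 1, memory_order_release);
}

/* Userspace owns the buffer when the generation numbers differ. */
static int
model_user_owns(struct model_zbuf_header *h)
{
	return (atomic_load_explicit(&h->user_gen, memory_order_relaxed) !=
	    atomic_load_explicit(&h->kernel_gen, memory_order_acquire));
}

/* Acknowledge: copy the kernel generation into the user generation. */
static void
model_user_ack(struct model_zbuf_header *h)
{
	/* Acknowledging a buffer userspace does not own is a caller bug. */
	assert(model_user_owns(h));
	atomic_store_explicit(&h->user_gen,
	    atomic_load_explicit(&h->kernel_gen, memory_order_relaxed),
	    memory_order_release);
}
```

Starting from an all-zero header, model_user_owns returns false; after model_kernel_assign it returns true, and after model_user_ack it returns false again, matching the ownership rules described above.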
The user process may force the assignment of the next buffer, if any data
is pending, to userspace using the BIOCROTZBUF ioctl. This allows the
user process to retrieve data in a partially filled buffer before the
buffer is full, such as following a timeout; the process must check for
buffer ownership using the header generation numbers, as the buffer will
not be assigned if no data was present.
As in the buffered read mode, kqueue(2), poll(2), and select(2) may be
used to sleep awaiting the availability of a completed buffer. They will
return a readable file descriptor when ownership of the next buffer is
assigned to userspace.
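
One possible shape for the wait, sketched with poll(2) only: wait_readable is a hypothetical helper name, and nothing in it is specific to bpf. On a zero-copy bpf descriptor, a return of 1 would be the cue to check the buffer headers, and a return of 0 (timeout) is where a caller could issue BIOCROTZBUF before checking ownership again.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: sleep until fd is readable or timeout_ms
 * elapses (-1 waits indefinitely).  Returns 1 if readable, 0 on
 * timeout, and -1 on error.  An interrupted poll(2) is retried with
 * the full timeout, so the total wait may exceed timeout_ms.
 */
static int
wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int n;

	assert(fd >= 0);
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n == -1 && errno == EINTR);
	if (n <= 0)
		return (n);
	/* Treat POLLERR/POLLHUP/POLLNVAL without POLLIN as an error. */
	return ((pfd.revents & POLLIN) != 0 ? 1 : -1);
}
```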
In the current implementation, the kernel will assign ownership of at
most one buffer at a time to the user process. The user process must
acknowledge the current buffer in order to be notified that the next
buffer is ready for processing. Programs should not rely on this as an
invariant, as it may change in future versions.