Support for zero copy sockets
Navdeep Parhar
np at FreeBSD.org
Tue Aug 12 01:52:40 UTC 2014
On 08/11/14 17:42, Adrian Chadd wrote:
> On 11 August 2014 12:11, Navdeep Parhar <np at freebsd.org> wrote:
>> There is zero copy receive (aka Direct Data Placement -- DDP) in the TOE
>> driver that accompanies cxgbe(4). I have a tx zero copy implementation
>> for it as well (this is not in -current right now). But all this code
>> is chip specific and applies only to TCP connections that are handled
>> by the TOE driver. It doesn't rely on COW or page flipping.
>>
>> The reason I'm mentioning all of this here is that if anyone is thinking
>> of working on proper zero copy awareness (and APIs) at the socket layer
>> then count me in as an interested party.
>
> I'm not going to get into it just for now, as I have enough on my
> FreeBSD plate to do already.
I'm in the same situation.
>
> However, the thing that always irked me about the hardware-based
> solutions is that they're great for a subset of problems - typically
> small sets of sockets. The really interesting problem for me is how to
> make it work for, say, 500,000 or more concurrent TCP sessions.
The hardware-based solutions that I'm familiar with can handle tens of
thousands of TCP sockets concurrently. The protocol processing is
entirely on the chip, and when DDP is active the chip can DMA the payload
straight to its final destination -- typically a userspace buffer. The
only VM operation involved is wiring and then unwiring the pages behind
the uio.
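As a rough userspace analogy for that wire/unwire step (this is not the
t4_tom code, which does the equivalent inside the kernel), the sketch
below pins a receive buffer with mlock(2) for the duration of one read
and releases it afterwards. The helper name recv_pinned is made up for
illustration, and falling back to an unpinned read when mlock fails is
an assumption on my part.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, userspace analogy only: keep the pages behind buf
 * resident while one read is in flight.  With DDP the kernel wires the
 * pages behind the uio itself; an application does not have to do this.
 */
static ssize_t
recv_pinned(int fd, void *buf, size_t len, int *pinned)
{
	ssize_t n;

	/* "Wire": can fail (RLIMIT_MEMLOCK, alignment), so it is optional. */
	*pinned = (mlock(buf, len) == 0);

	n = read(fd, buf, len);

	/* "Unwire": only undo what was actually done. */
	if (*pinned)
		(void)munlock(buf, len);
	return (n);
}
```

The point of the sketch is only the bracketing: pin, transfer, unpin,
once per I/O operation.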
The complication is that the driver (cxgbe's t4_tom in this case) has
absolutely no idea what an application does (blocking read vs.
poll/select+read vs. aio_read vs. ...) so it makes some safe but
suboptimal choices. It would be nice if there were an API (very vaguely
along the lines of madvise but for sockets, or maybe a sockopt knob)
that an application could use to provide hints about its behavior. We
could also do with separate zero-copy flavors of the sosend/soreceive
usrreqs. And more hints (per read/write operation) that might let us
avoid even the wire/unwire operation.
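To make the "sockopt knob" idea a little more concrete, here is a
minimal sketch of what an application-side hint could look like. To be
clear, none of this exists: SO_RX_HINT_HYPOTHETICAL, its value, and the
rx_hint values are all invented for illustration, and a current kernel
is expected to reject the option. The sketch treats that rejection as
harmless, since a hint is only advisory.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* HYPOTHETICAL option number: no kernel defines this today. */
#define	SO_RX_HINT_HYPOTHETICAL	0x7f01

/* HYPOTHETICAL hint values describing how the app will read. */
enum rx_hint {
	RX_HINT_BLOCKING_READ = 1,	/* plain blocking read(2)/recv(2) */
	RX_HINT_POLL_THEN_READ = 2,	/* poll/select followed by read */
	RX_HINT_AIO_READ = 3		/* aio_read(2) */
};

/*
 * Returns 0 if the kernel accepted the hint, 1 if it does not know the
 * option (the expected result today), -1 on any other error.
 */
static int
set_rx_hint(int s, enum rx_hint hint)
{
	int v = (int)hint;

	if (setsockopt(s, SOL_SOCKET, SO_RX_HINT_HYPOTHETICAL, &v,
	    sizeof(v)) == 0)
		return (0);
	if (errno == ENOPROTOOPT)
		return (1);	/* advisory hint, safe to ignore */
	return (-1);
}
```

An application would call set_rx_hint(s, RX_HINT_AIO_READ) once after
creating the socket and carry on regardless of a 0 or 1 result.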
Anyway, let's save this discussion for later, when either of us has the
time to come up with a specific set of proposals for -net and -arch.
Regards,
Navdeep
>
> I can see a method of doing zero-copy writes to the network stack -
> look at what the AIO code does in the physical IO path for doing
> writes. It wires down the memory and stuffs it into the buffer.
>
> The thing I haven't yet sorted out is what to do about mappings in
> case kernel code wants to peek at the socket data payload for whatever
> reason.
>
> (And yes, reads are still a problem.)
>
>
>
> -a
>