netmap wishlist

Luigi Rizzo rizzo at iet.unipi.it
Fri Sep 12 07:31:22 UTC 2014


On Fri, Sep 12, 2014 at 7:59 AM, Eggert, Lars <lars at netapp.com> wrote:

> Hi Luigi,
>
> I've started to play with netmap, like it a lot, and would like it to grow
> support for some additional features that I'd need. I wonder if you could
> comment on how likely support for any of the following is in netmap in the
> foreseeable future?
>
> * IP/TCP/UDP checksum offload
> * TCP/UDP segmentation offload
> * TCP/UDP large receive offload
> * jumbograms (I saw the email earlier today, so maybe that's addressed)
>

Hi Lars:

there is something already available/in progress for some of the above,
but here are my thoughts on the various subjects:

- netmap is designed to work with large frames, by setting the buffer
  size to something suitable (using a sysctl).
  There might be some lurking bugs (e.g. some NICs need to be told
  about the maximum frame size or they will refuse to send/receive them
  even though the slot in the NIC ring specifies a large buffer),
  but this is trivial to fix on a case by case basis.
  The downside is some waste on buffers (they are fixed size, so having
  to allocate, say, 16K for a 64-byte frame is a bit annoying).

- checksum offloading can be added trivially in the *_txsync(),
  once again on a per-NIC basis.
  The problem is, if we start adding per-packet features (say, checksums,
  scatter-gather I/O, segmentation) in the inner loop of *_txsync()
  we are going to lose some performance for high-rate applications.
  Right now we are running at about 20ns/pkt (because we assume a flat
  data format); having a few extra conditionals in the inner loop
  could easily eat another 5..20ns/pkt, and this makes me a bit
  uncomfortable, especially because the situations where these offloads
  matter are typically with large packets, where we are not CPU bound.

- the VALE switch has support for segmentation and checksum avoidance.
  Clients can register as virtio-net capable: in this case the port will
  accept/deliver large segments across that port, and do segmentation and
  checksum as required for ports that are not virtio-net enabled
  (e.g. physical NICs attached to the same VALE switch).
  This was developed earlier this year by Vincenzo Maffione.

  At the moment this only works on top of VALE ports, not NICs,
  and the reason is that there is a big win if the VM can deliver
  a large segment in one shot to another local VM. It is much less useful
  if you are talking across a physical device, in which case the OS
  should be able to do a reasonable job in segmenting packets
  (see also next item).

  We could probably leverage this code to work also on top of NICs
  connected through netmap, e.g. programming the NIC to use its own
  native offloading, but i am skeptical about the usefulness and
  concerned about the potential performance loss in *_txsync().

- Stefano Garzarella has some code to do software GSO (this is for FreeBSD,
  Linux already has something similar), which will be presented at
  EuroBSDCon later this month in Sofia. This should address the
  segmentation issue on the host stack.

- on the receive side, both FreeBSD and Linux have an efficient
  software LRO fallback in case the NIC does not support it
  natively, so i think we do not need this at the NIC/switch level.
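
To put rough numbers on the *_txsync() cost argument above, here is a
back-of-the-envelope sketch in C. It only does arithmetic and is not netmap
code. The 20ns/pkt base cost and the 5..20ns/pkt extra come from the text
above; the 10 Gbit/s link speed, the 20 bytes of per-frame wire overhead
(preamble, start-of-frame delimiter, inter-frame gap) and all function names
are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed Ethernet overhead per frame on the wire:
 * 7B preamble + 1B start-of-frame delimiter + 12B inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20

/* Time (ns) one frame occupies the link; frame_bytes includes the CRC.
 * bits / (Gbit/s) yields nanoseconds directly. */
double wire_time_ns(unsigned frame_bytes, double link_gbps)
{
    assert(link_gbps > 0.0);
    return (frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0 / link_gbps;
}

/* Packet rate (millions of packets per second) one core can sustain
 * at a given per-packet CPU cost. */
double cpu_mpps(double ns_per_pkt)
{
    assert(ns_per_pkt > 0.0);
    return 1000.0 / ns_per_pkt;
}

/* Fraction of one core needed to keep the link full. */
double cpu_share(double ns_per_pkt, unsigned frame_bytes, double link_gbps)
{
    return ns_per_pkt / wire_time_ns(frame_bytes, link_gbps);
}

/* Print the CPU share before and after adding extra_ns per packet. */
void print_budget(unsigned frame_bytes, double link_gbps,
                  double base_ns, double extra_ns)
{
    printf("%4u-byte frames: %.1f ns on the wire, CPU share %.1f%% -> %.1f%%\n",
           frame_bytes, wire_time_ns(frame_bytes, link_gbps),
           100.0 * cpu_share(base_ns, frame_bytes, link_gbps),
           100.0 * cpu_share(base_ns + extra_ns, frame_bytes, link_gbps));
}
```

Calling print_budget(64, 10.0, 20.0, 20.0) and print_budget(1518, 10.0, 20.0, 20.0)
from a main() shows the asymmetry: under these assumptions a minimum-size
frame lasts about 67 ns on the wire, so going from 20 to 40 ns/pkt takes the
CPU share from roughly 30% to 60% of a core, while a 1518-byte frame lasts
about 1230 ns and the same extra cost stays at a few percent.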

cheers
luigi

