Changes in the network interface queueing handoff model

Max Laier max at love2party.net
Sun Jul 30 14:59:19 UTC 2006


On Sunday 30 July 2006 16:04, Robert Watson wrote:
> One of the ideas that I, Scott Long, and a few others have been bouncing
> around for some time is a restructuring of the network interface packet
> transmission API to reduce the number of locking operations and allow
> network device drivers increased control of the queueing behavior.  Right
> now, it works something like the following (see the sketch after the list):
>
> - When a network protocol wants to transmit, it calls the ifnet's link
> layer output routine via ifp->if_output() with the ifnet pointer, packet,
> destination address information, and route information.
>
> - The link layer (e.g., ether_output() + ether_output_frame()) encapsulates
>    the packet as necessary, performs a link layer address translation (such
> as ARP), and hands off to the ifnet driver via a call to IFQ_HANDOFF(),
> which accepts the ifnet pointer and packet.
>
> - The ifnet layer enqueues the packet in the ifnet send queue
> (ifp->if_snd), and then looks at the driver's IFF_DRV_OACTIVE flag to
> determine if it needs to "start" output by the driver.  If the driver is
> already active, it doesn't, and otherwise, it does.
>
> - The driver dequeues the packet from ifp->if_snd, performs any driver
>    encapsulation and wrapping, and notifies the hardware.  In modern
> hardware, this consists of hooking the data of the packet up to the
> descriptor ring and notifying the hardware to pick it up via DMA.  In older
> hardware, the driver would perform a series of I/O operations to send the
> entire packet directly to the card via a system bus.
>
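For readers following along, the current path boils down to something like
this (a simplified sketch of the IFQ_HANDOFF()/if_start() pattern, not the
literal macros from if_var.h; error handling and ALTQ details omitted):

    /*
     * Sketch of steps 3 and 4 above: enqueue under the ifq lock, then
     * kick the driver unless it is already marked active.
     */
    static int
    handoff_sketch(struct ifnet *ifp, struct mbuf *m)
    {
            int error = 0;

            IFQ_ENQUEUE(&ifp->if_snd, m, error);    /* locks/unlocks the ifq */
            if (error != 0)
                    return (error);
            if ((ifp->if_drv_flags & IFF_DRV_OACTIVE) == 0)
                    (*ifp->if_start)(ifp);  /* driver dequeues from if_snd */
            return (0);
    }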
> Why change this?  A few reasons:
>
> - The ifnet layer send queue is becoming less and less useful over time.
> Most modern hardware has a significant number of slots in its transmit
> descriptor ring, tuned for the performance of the hardware, etc, which is
> the effective transmit queue in practice.  The additional queue depth
> doesn't increase throughput substantially (if at all) but does consume
> memory.
>
> - On extremely fast hardware (with respect to CPU speed), the queue remains
>    essentially empty, so we pay the cost of enqueueing and dequeuing a
> packet from an empty queue.
>
> - The ifnet send queue is a separately locked object from the device
> driver, meaning that for a single enqueue/dequeue pair, we pay an extra
> four lock operations (two for the insert, two for the remove) per packet,
> as sketched after this list.
>
> - For synthetic link layer drivers, such as if_vlan, which have no need for
>    queueing at all, the cost of queueing can be eliminated entirely.
>
> - IFF_DRV_OACTIVE is no longer inspected by the link layer, only by the
>    driver, which helps eliminate a latent race condition involving use of
> the flag.
>
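To make the lock count in the third bullet concrete, the four operations per
packet fall out of the separately locked queue, roughly like this (using the
unlocked _IF_* macros plus the ifq mutex; a sketch, not the literal code):

    IF_LOCK(&ifp->if_snd);          /* 1: lock for the insert */
    _IF_ENQUEUE(&ifp->if_snd, m);
    IF_UNLOCK(&ifp->if_snd);        /* 2: unlock after the insert */

    /* ... later, in the driver's if_start routine ... */
    IF_LOCK(&ifp->if_snd);          /* 3: lock for the remove */
    _IF_DEQUEUE(&ifp->if_snd, m);
    IF_UNLOCK(&ifp->if_snd);        /* 4: unlock after the remove */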
> The proposed change is simple: right now, one or more enqueue operations
> occur, followed by a call to ifp->if_start() to notify the driver that it
> may need to do something (if the OACTIVE flag isn't set).  In the new world
> order, the driver is directly passed the mbuf, and may then choose to queue
> it or otherwise handle it as it sees fit.  The immediate practical benefit
> is clear: if the queueing at the ifnet layer is unnecessary, it is entirely
> avoided, skipping enqueue, dequeue, and four mutex operations.  This
> applies immediately for VLAN processing, but also means that for modern
> gigabit cards, the hardware queue (which will be used anyway) is the only
> queue necessary.
>
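In code, my reading of the proposal is roughly this (assuming the obvious
(ifp, mbuf) signature for the new method):

    /*
     * Sketch of the proposed model: the link layer hands the mbuf
     * straight to the driver; no ifq enqueue/dequeue, no ifq mutex.
     * The driver queues internally only if it has to.
     */
    static int
    handoff_mbuf_sketch(struct ifnet *ifp, struct mbuf *m)
    {
            return ((*ifp->if_startmbuf)(ifp, m));
    }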
> There are a few downsides, of course:
>
> - For older hardware without its own queueing, the queue is still required
> -- not only that, but we've now introduced an unconditional function
> pointer invocation, which on older hardware has a more significant
> relative cost than it has on more recent CPUs.
>
> - If drivers still require or use a queue, they must now synchronize access
> to the queue.  The obvious choices are to use the ifq lock (and restore the
> above four lock operations), or to use the driver mutex (and risk higher
> contention).  Right now, if the driver is busy (driver mutex held) then an
> enqueue is still possible, but with this change and a single mutex
> protecting the send queue and driver, that is no longer possible.
>
> Attached is a patch that maintains the current if_start, but adds
> if_startmbuf.  If a device driver implements if_startmbuf and the global
> sysctl net.startmbuf_enabled is set to 1, then the if_startmbuf path in the
> driver will be used.  Otherwise, if_start is used.  I have modified the
> if_em driver to also implement if_startmbuf.  If there is no packet backlog
> in the if_snd queue, it places the packet directly in the transmit
> descriptor ring.  If there is a backlog, it uses the if_snd queue protected
> by the driver mutex, rather than a separate ifq mutex.
>
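From that description, the if_em logic is presumably along these lines (a
sketch with hypothetical helper names - em_ring_space() and em_xmit() stand
in for the real encapsulation code, and ring-full handling is omitted):

    static int
    em_startmbuf_sketch(struct ifnet *ifp, struct mbuf *m)
    {
            struct adapter *sc = ifp->if_softc;

            EM_LOCK(sc);                    /* driver mutex, no ifq mutex */
            if (_IF_QLEN(&ifp->if_snd) == 0 && em_ring_space(sc) > 0) {
                    /* No backlog: straight into the descriptor ring. */
                    em_xmit(sc, m);
            } else {
                    /*
                     * Backlog: fall back to if_snd.  The unlocked macro
                     * is safe here because the driver mutex is held.
                     */
                    _IF_ENQUEUE(&ifp->if_snd, m);
            }
            EM_UNLOCK(sc);
            return (0);
    }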
> In some basic local micro-benchmarks, I saw a 5% improvement in UDP 0-byte
> payload PPS on UP, and a 10% improvement on SMP.  I saw a 1.7% performance
> improvement in the bulk serving of 1k files over HTTP.  These are only
> micro-benchmarks, and reflect a configuration in which the CPU is unable to
> keep up with the output rate of the 1gbps ethernet card in the device, so
> reductions in host CPU usage are immediately visible in increased output as
> the CPU is able to better keep up with the network hardware.  Other
> configurations are also of interest, especially ones in
> which the network device is unable to keep up with the CPU, resulting in
> more queueing.
>
> Conceptual review, as well as benchmarking, etc., would be most welcome.

This raises the question: what about ALTQ?

If we maintain the fallback mechanism in _handoff, we can just add 
ALTQ_IS_ENABLED() to the test.  Otherwise every driver's startmbuf function 
would have to take care of ALTQ itself, which is undesirable.
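Concretely, the test in the handoff path could look like this (a sketch;
legacy_handoff() is a placeholder for the existing enqueue-plus-if_start
path, and startmbuf_enabled mirrors the net.startmbuf_enabled sysctl):

    /*
     * Keep the legacy path whenever ALTQ is active on the interface,
     * or the driver does not implement if_startmbuf.
     */
    if (ALTQ_IS_ENABLED(&ifp->if_snd) || !startmbuf_enabled ||
        ifp->if_startmbuf == NULL)
            return (legacy_handoff(ifp, m));
    return ((*ifp->if_startmbuf)(ifp, m));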

I strongly agree with your comment about how messed up ifq_*/if_* in if_var.h 
are - and I'm afraid that's partly my fault for bringing in ALTQ.

-- 
/"\  Best regards,                      | mlaier at freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier at EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

