Network TX/RX fairness is not honored by USB stack

Mon Oct 4 19:30:57 UTC 2010

On Sun, Oct 03, 2010 at 08:33:39PM -0700, Julian Elischer wrote:
>  On 10/3/10 4:54 PM, Pyun YongHyeon wrote:
> >On Sat, Oct 02, 2010 at 08:41:57AM +0200, Hans Petter Selasky wrote:
> >>On Saturday 02 October 2010 02:11:00 Pyun YongHyeon wrote:
> >>>Hi,
> >>>
> >>>I don't know how long it had been there but it seems current USB
> >>>stack does not honor fairness of TX/RX on USB ethernet controller.
> >>>Unidirectional performance test(UDP) or most-unidirectional
> >>>performance(TCP) test works well without problems. However if heavy
> >>>TX/RX traffic hits controller at the same time either TX or RX is
> >>>not served at all. I'm under the impression that whenever TX work
> >>>is done it seems USB reschedules next pending TX again instead of
> >>>processing RX such that RX is starved to death. This can be easily
> >>>reproduced on two hosts with the netperf performance test.
> >>>Whenever both hosts send tiny UDP datagrams to the other host
> >>>either TX or RX packet counters are not increasing until the end
> >>>of the UDP torture test. The number of EHCI interrupts is about 8K/s
> >>>while the test is in progress, so I think it reached its maximum
> >>>processing limit. After netperf testing, it can still process TX/RX
> >>>packets even though it dropped too many RX packets. But these
> >>>dropped packets are not counted so netstat(1) shows 0 dropped
> >>>frames even though it lost millions of packets.
> >>>
> >>>Hans, do you have any idea what's going on here?
> >>>You can use the following netperf command on both hosts after
> >>>running netserver.
> >>>%netperf -c -H ip_addr_of_other_host -tUDP_STREAM -l 300 -- -m 1
> >>>
> >>>Another odd thing I noticed is number of interrupts does not go
> >>>down to 0 after the testing. It constantly generates 1k/s
> >>>interrupts after that.
> >>Maybe we are triggering a bug. Can you enable USB debugging to figure
> >>out what data lengths are transmitted or received?
> >>
> >In the middle of testing? If yes, that would be meaningless as it
> >would generate a bunch of messages. The test case generates UDP
> >datagrams with a 1-byte payload at full rate, so enabling debug
> >messages will change the timing. Note, I'm exercising the number of
> >packets per second, not the number of bytes per second.
> >
> >>USB EHCI uses round robin, so this is either a USB device problem or a
> >>test-program software failure.
> >>
> >I'm pretty sure the benchmark program is not broken, so either
> >axe(4) or USB stack could be wrong here. I see three issues from
> >the UDP torture test.
> >  - Either TX or RX could be starved to death. If you start TX test
> >    first, RX would be stuck. If you start RX test first, TX would
> >    be stuck.
> >  - The number of packets sent or received are much lower than
> >    expected.
> >    For TX case, the number of packets sent per second is exactly 8k
> >    which is much less than that of non-USB controllers. For gigabit
> 
> that is a big clue.
> the USB hardware uses an 8 thousand times per second clock for its
> internal polling.
> 
> >    controllers number of TX packets could be several hundred
> >    thousands per second. For RX packets it shows 14K/s packets with
> >    8K/s interrupts. I thought USB ethernet controllers can send
> >    more than 8k packets per second. Because the number of
> >    interrupts per second and the number of TX packets per second are
> >    both 8k, this also makes me wonder whether the two are related.
> >  - Number of interrupts does not go back to 0 after the testing.
> >
> >I'll let you know if I find some clue but it may take long time as
> >I'm not familiar with USB stack. :-(
> >

The attached patch addresses two of the issues for me. Because axe(4)
combines multiple TX frames into a single USB transfer, relying on the
TX completion callback to update the TX packet counter was wrong. I
moved the counting into the TX loop, which fixed the issue. Now the TX
counter shows 221Kpps, which looks good to me. I don't think this is
the correct way to count TX packets, as the counter is updated before
the actual transmission, but there seems to be no way to know in the
write callback how many packets were sent.
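
To illustrate the counting change described above, here is a minimal
standalone sketch. It is not the attached patch and not axe(4) code:
the names (tx_fill_transfer, struct tx_stats, XFER_BUF_SIZE,
FRAME_HDR_LEN) and the 4-byte per-frame header layout are assumptions
made up for the example. It only shows why the per-packet counter has
to be bumped inside the loop that packs frames, since one completed
transfer may carry many frames.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from axe(4). */
#define XFER_BUF_SIZE 2048	/* assumed size of one USB bulk transfer */
#define FRAME_HDR_LEN 4		/* assumed per-frame header in the transfer */

struct frame {
	const unsigned char *data;
	size_t len;
};

struct tx_stats {
	unsigned long opackets;		/* frames handed to the hardware */
	unsigned long transfers;	/* USB transfers submitted */
};

/*
 * Pack as many queued frames as fit into one transfer buffer.
 * The packet counter is incremented once per frame here, in the TX
 * loop, because a completion callback would only learn that a single
 * transfer finished, not how many frames it carried.
 * Returns the number of frames consumed from the queue.
 */
static size_t
tx_fill_transfer(const struct frame *queue, size_t nframes,
    unsigned char *buf, size_t *used, struct tx_stats *st)
{
	size_t pos = 0, n = 0;

	assert(queue != NULL && buf != NULL && used != NULL && st != NULL);
	while (n < nframes) {
		size_t need = FRAME_HDR_LEN + queue[n].len;

		if (need > XFER_BUF_SIZE - pos)
			break;	/* next frame goes into a later transfer */
		/* Assumed header: 16-bit little-endian length, 2 pad bytes. */
		buf[pos + 0] = (unsigned char)(queue[n].len & 0xff);
		buf[pos + 1] = (unsigned char)((queue[n].len >> 8) & 0xff);
		buf[pos + 2] = 0;
		buf[pos + 3] = 0;
		memcpy(buf + pos + FRAME_HDR_LEN, queue[n].data, queue[n].len);
		pos += need;
		st->opackets++;	/* counted before the transfer is sent */
		n++;
	}
	*used = pos;
	if (n > 0)
		st->transfers++;
	return (n);
}
```

With ten queued 1-byte datagrams this packs all ten frames into one
transfer, so a counter bumped per completed transfer would report 1
where the per-frame counter reports 10. The drawback mentioned above
remains: the frames are counted before the transfer has actually gone
out.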

The 3rd issue seems to come from setting up the interrupt endpoint.
The controller appears to generate interrupts constantly even when
there is no traffic. I was able to reduce the number of interrupts by
setting the interval to 1 second, which effectively limits them to
one per second. However, I don't see why axe(4) needs to set up an
interrupt endpoint at all, given that axe(4) does nothing in the
interrupt callback. The callback does not even try to read the
reported data. The interrupt endpoint carries 8 bytes of data,
containing link UP/DOWN information as well as PHY related data.
Because axe(4) relies on mii(4) polling to detect link state changes,
I see no point in using the interrupt endpoint here. The old axe(4)
didn't use the interrupt endpoint at all, so what is the reason for
setting it up now?
Personally I'd like to nuke the use of the interrupt endpoint here,
but I want to hear Hans's opinion before doing that.
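
For reference, the polling rate of an interrupt endpoint follows from
its bInterval value. The sketch below assumes a high-speed (USB 2.0)
device, where the polling period is 2^(bInterval-1) microframes of
125 us each. The helper name hs_intr_polls_per_sec is made up for the
example, and it does not show how axe(4) or the USB stack configures
the endpoint. It is only the arithmetic.

```c
#define MICROFRAMES_PER_SEC 8000u	/* one microframe every 125 us */

/*
 * Polls per second for a high-speed interrupt endpoint.
 * Returns 0 for a bInterval outside 1..16, and also (by integer
 * division) for periods longer than one second.
 */
static unsigned
hs_intr_polls_per_sec(unsigned bInterval)
{
	unsigned period;	/* polling period in microframes */

	if (bInterval < 1 || bInterval > 16)
		return (0);
	period = 1u << (bInterval - 1);
	return (MICROFRAMES_PER_SEC / period);
}
```

Under that assumption, bInterval 1 gives 8000 polls per second and
bInterval 4 gives 1000. These happen to match the 8K/s and 1k/s
figures reported earlier in the thread, but that alone does not prove
the interrupt endpoint is the source of either number.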