re(4) driver dropping packets when reading NFS files

Pyun YongHyeon pyunyh at gmail.com
Mon Nov 1 23:40:11 UTC 2010


On Mon, Nov 01, 2010 at 06:18:13PM -0400, Rick Macklem wrote:
> > On Sun, Oct 31, 2010 at 05:46:57PM -0400, Rick Macklem wrote:
> > > I recently purchased a laptop that has a re(4) Realtek
> > > 8101E/8102E/8103E net chip in it and I find that it is dropping
> > > packets like crazy when reading files over an NFS mount. (It seems
> > > that bursts of receive traffic cause it, since when I look at it in
> > > wireshark, typically the 2nd packet in a read reply is not
> > > received, although it was sent at the other end.)
> > >
> > 
> > Are you using NFS over UDP?
> 
> The test I referred to was over TCP, which works fine until reading a
> file and then about the second TCP data segment that is sent by the
> server isn't received by the client with the re(4). (I had tcpdump
> capturing on both machines and then compared them using wireshark.)


Hmm, this is not what I initially thought. re(4) gives much more
preference to handling RX traffic than TX, so it should have
received that segment even if it couldn't send response packets.
I'm not sure my patch can mitigate the issue.

> The read does progress slowly, after TCP retransmits the segment that
> was dropped. The result is a rate of about 10 reads/sec.
> 
> A test over UDP gets nowhere. You just get lots of "IP fragments
> timed out" when you run "netstat -s", so it seems to consistently
> drop a fragment in the UDP read reply.
> > 
> > > Adding "options DEVICE_POLLING" helps a lot (i.e. an order of
> > > magnitude faster reading). Does this hint that interrupts are
> > > being lost or delayed too much?
> > >
> > 
> > Actually I'm not a fan of polling(4), but re(4) controllers might
> > be an exception due to controller limitations. Still, an order of
> > magnitude faster indicates something is wrong in the driver.
> > 
> 
> Yep, I'd agree. I can print out the exact chip device info, but if you
> don't have data sheets, it may not help. It seems to be a low end chip,
> since it doesn't support 1Gbps --> closer to an 8139. It might be
> called an 8036, since that # shows up in the device resources under
> windoze.
> 
> > 
> > AFAIK re(4) controllers lack interrupt moderation, so re(4) used
> > to rely on a taskqueue to reduce the number of interrupts. That
> > code was written a long time ago by Bill and I'm not sure whether
> > it's still valid for recent PCIe RealTek controllers. One problem
> > is getting stand-alone PCIe controllers on the market; I was not
> > able to buy recent controllers. This is one of the reasons why
> > re(4) still lacks TSO, jumbo frame and 64bit DMA support for newer
> > controllers. Another problem is that RealTek no longer releases
> > data sheets, so it's hard to write support for new features that
> > may be present on recent controllers.
> > 
> > Recent re(4) controllers started to support a small set of
> > hardware MAC statistics counters, and that may help to understand
> > how many frames were lost under heavy load. I'll let you know when
> > I have a patch for that. Flow control may also improve performance
> > a little, but like most other consumer grade ethernet drivers,
> > re(4) does not implement it yet. This may change in the near
> > future: marius@ is actively working on this, so we'll get a
> > generic flow-control framework in the tree.
> 
> It drops a frame as soon as the read starts and there is a burst
> of more than one. (I can email you the tcpdump captures if you're
> interested and you won't have to look far into it to see it happen.)
> 

I'm more interested in the number of dropped frames. See below for
how to extract that information.

> It seems to do it consistently and then recovers when the TCP
> segment is resent, but repeats the fun on the next one.
> (I'm wondering if it can't support a 64 entry receive ring. I'll
>  try making it smaller and see what happens? Probably won't help,
>  but can't hurt to try:-)
> 
> > 
> > I'll see what can be done in the interrupt handler and I'll let
> > you know when a patch is ready.
> > 
> > > Thanks, rick
> > > ps: This laptop is running a low end AMD cpu and I did install
> > >     amd64 on it, instead of i386, in case that might be relevant?
> > 
> > I don't think so.
> > 
> Ok. I didn't think so, but someone recently mentioned that some drivers
> for wifi chips don't work on amd64.
> 

All drivers touched by me should work on any architecture. The
code is the same, so there is no difference.

> It actually works fairly well (and quite well with DEVICE_POLLING), except
> for this issue where it drops received packets when it gets bursts of them.

Actually, this is one of the advantages of interrupts over polling:
interrupts tend to give a faster response. To achieve similar
behavior with polling you would have to use a high hz. Your test
indicates quite the opposite result, though.

> (It almost looks like it only handles the first received packet, although
>  it appears to be using a receive ring of 64 buffers.)
> 

No, re(4) uses 256 TX/RX buffers for RTL810xE controllers.

> Anyhow, I'll keep poking at it and will appreciate any patches/suggestions
> that you might have.
> 

Ok, here is the patch.
http://people.freebsd.org/~yongari/re/re.intr.patch

The patch has the following changes.
 o 64bit DMA support for PCIe controllers.
 o Hardware MAC statistics counter support. You can extract these
   counters with "sysctl dev.re.0.stats=1" and check the output on
   the console or in dmesg. Extracting these counters seems to take
   a lot of time, so I didn't try to accumulate them. The output
   shows how many frames were dropped. I saw a lot of FAE (frame
   alignment errors) under high RX load and I can't explain how this
   can happen. This may indicate the PHY hardware is poor, or it may
   need DSP fixups. Realtek seems to maintain a large set of DSP
   fixups for each PHY revision, and re(4) does not have that magic
   code at this moment.
 o Overhaul the MSI interrupt handler so that it gives TX fair
   service as well as serving RX. Because re(4) controllers do not
   have an interrupt moderation mechanism, a naive interrupt handler
   can generate more than 125k intrs/sec under high load.
   Fortunately, Bill implemented TX interrupt moderation with a timer
   register and it seems to work well on the TX path. One drawback
   of this approach is that it requires extra timer register accesses
   in the fast path. There is no second timer register to use for
   the RX path, so the driver does no RX interrupt moderation and can
   generate about 25k intrs/sec under high RX load. However, I think
   most systems can handle that interrupt load. Note, this feature is
   activated only when MSI is in use and DEVICE_POLLING is not
   defined.

From my limited testing, it seems to work as expected. Would you
give it a try and let me know how well it behaves with NFS?

> Thanks, rick


More information about the freebsd-current mailing list