em(4) hang [Was: Re: igb(4) won't start with "igb0: Could not setup receive structures"]

Jack Vogel jfvogel at gmail.com
Thu Mar 31 21:57:57 UTC 2011


So, what is the evidence that the driver is stuck here?

I see that next_to_check != next_to_refresh, which is why the
local timer won't schedule anything. Oh, and I also realized there
is a problem with local_timer anyway: it will run rxeof, but that won't help
if you can't enter the loop, so I need to add some code at the top to
call em_refresh_mbufs() when in this state.
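
A minimal sketch of the timer test discussed above, assuming (as this message
implies) that the timer only schedules the RX task when the two indices are
equal. The struct and function names are hypothetical stand-ins, not the
driver's actual code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's per-queue RX state. */
struct rx_ring_model {
    unsigned next_to_check;   /* index the RX cleanup examines next */
    unsigned next_to_refresh; /* index the mbuf refresh resumes from */
};

/* Timer test as described in this thread: schedule only on equal indices. */
static bool timer_would_schedule(const struct rx_ring_model *r)
{
    return r->next_to_check == r->next_to_refresh;
}

/* Walk all RX queues, as the timer is said to do; count those that qualify. */
static size_t count_schedulable(const struct rx_ring_model *rings, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (timer_would_schedule(&rings[i]))
            hits++;
    return hits;
}
```

With the values from the debug output quoted below (next_to_check = 884,
next_to_refresh = 885), timer_would_schedule() returns false, so under this
model nothing gets scheduled for that queue.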

On this interrupt cause that you are focused upon: although it's there in the
design, I talked with some of our most seasoned developers on both
the Windows and Linux sides of the house, and NO one has ever used this
'feature', because (and I'm quoting here) "there's no good use case for it".
Meaning, there's always some simpler way of handling the issue.

When you use MSIX you can't read causes, btw. If you configured it, you'd
just get into the regular RX handler, same as always, so why bother
specially with this cause?

On non-MSIX hardware there is no particular reason to worry about the
cause either; we can just handle the RX situation in the interrupt handler.

Jack


On Thu, Mar 31, 2011 at 2:09 PM, Arnaud Lacombe <lacombar at gmail.com> wrote:

> Hi Jack,
>
> On Thu, Mar 31, 2011 at 9:51 AM, Arnaud Lacombe <lacombar at gmail.com> wrote:
> > [...]
> > I'll remove part of the changes I made to keep only `rx_forced_refill'
> > and the associated sysctl, re-run the tests and come back with correct
> > value, hopefully in a few hours.
> >
> Here it is:
>
> # sysctl dev.em.0.%desc
> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.2
>
> # sysctl dev.em.0.mac_stats.missed_packets
> dev.em.0.mac_stats.missed_packets: 917428
>
> # sysctl dev.em.0.debug=1
> dev.em.0.debug: I-1nterface is RUNNING and INACTIVE
> em0: hw tdh = 975, hw tdt = 975
> em0: hw rdh = 884, hw rdt = 885
> em0: Tx Queue Status = 0
> em0: TX descriptors avail = 1024
> em0: Tx Descriptors avail failure = 0
> em0: RX discarded packets = 0
> em0: RX Next to Check = 884
> em0: RX Next to Refresh = 885
>  -> -1
>
> So the taskqueue cannot be scheduled to run and the driver is stuck.
>
> > On Wed, Mar 30, 2011 at 2:22 PM, Jack Vogel <jfvogel at gmail.com> wrote:
> >> Read the code in HEAD, em_local_timer() has a test of ALL the rx queues and
> >> will schedule a task that refreshes mbufs if they are empty. This has
> >> exactly the same effect as checking for some interrupt cause, a cause that
> >> is not available when using MSIX on 82574, but this approach works for
> >> everything.
> >>
> Can you please point me to a reference datasheet (or errata), provided
> by Intel, about the RX Overrun interrupt not being available with
> MSI-X on the 82574 ?
>
> Currently, I only have access to [0], which specifies the following:
>
> 7.4 Interrupts
> 7.4.2 MSI-X Mode
> [...]
> The following configuration and parameters are involved:
> • The IVAR.INT_Alloc[4:0] entries map two Tx queues, two Rx queues and
>   other events to 5 interrupt vectors
> • The ICR[24:20] bits reflect specific interrupt causes
> • Five MSI-X interrupt vectors are provided (calculated based on four
>   vectors for queues and one vector for other causes). The requested number
>   of vectors is loaded from the MSI_X_N fields in the EEPROM into the PCIe
>   MSI-X capability structure of the function.
>
> 10.2.4.1 Interrupt Cause Read Register - ICR (0x000C0; RC/WC)
> [...]
>
> about bit 24:
>
> Other Interrupt. Indicates one of the following interrupts was set:
> • Link Status Change.
> • Receiver Overrun.
> • MDIO Access Complete.
> • Small Receive Packet Detected.
> • Receive ACK Frame Detected.
> • Manageability Event Detected.
>
> Thanks in advance,
>  - Arnaud
>
> [0]: ftp://download.intel.com/design/network/datashts/82574.pdf
>
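
For reference, the ICR layout quoted above (bits 24:20, with bit 24 as the
"Other Interrupt" summary) could be decoded roughly as follows. This is only a
sketch: the macro and function names are made up for illustration, and only the
bit positions come from the quoted datasheet text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; only the bit positions come from the quoted text. */
#define ICR_MSIX_CAUSE_SHIFT 20u
#define ICR_MSIX_CAUSE_MASK  (0x1Fu << ICR_MSIX_CAUSE_SHIFT) /* ICR[24:20] */
#define ICR_OTHER            (1u << 24)                      /* ICR bit 24 */

/* Extract ICR[24:20] as a 5-bit value; bit 4 of the result is "Other". */
static uint32_t icr_msix_causes(uint32_t icr)
{
    return (icr & ICR_MSIX_CAUSE_MASK) >> ICR_MSIX_CAUSE_SHIFT;
}

/* True when the "Other Interrupt" summary bit is set. */
static bool icr_other_pending(uint32_t icr)
{
    return (icr & ICR_OTHER) != 0;
}
```

Note that bit 24 on its own only says that one of the six listed events
occurred; it does not say which one, so Receiver Overrun cannot be told apart
from, say, Link Status Change by this bit alone.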

