em(4) hang [Was: Re: igb(4) won't start with "igb0: Could not setup receive structures"]

Thu Mar 31 23:38:09 UTC 2011

My validation group has some kind of hang too; it happens when they use a
certain number of clients, each running a stress test against the SUT. It's
like this one, and we have no real handle on what's wrong. If I knew what was
wrong, I'd be halfway or more to fixing it :)

The evidence shows you hit the max clusters at one point but have since freed
most of them back up again, so there is no shortage right at this point. Your
previous data showed a normal idle head/tail relationship...

Just as a data point, will you please disable MSI-X, recompile, and run in MSI
mode? I just want to see if that makes a difference. Search the driver for
em_enable_msix and set it to FALSE.

Jack

On Thu, Mar 31, 2011 at 4:06 PM, Arnaud Lacombe <lacombar at gmail.com> wrote:

> Hi,
>
> On Thu, Mar 31, 2011 at 6:28 PM, Jack Vogel <jfvogel at gmail.com> wrote:
> > OK, but those are not present in this data; that is what I was
> > asking about.
> >
> > So, you have a hang for which we do not have a certain cause.  What does
> > netstat -m show?
> >
> # netstat -m
> 3073/74927/78000 mbufs in use (current/cache/total)
> 3070/29698/32768/32768 mbuf clusters in use (current/cache/total/max)
> 0/383 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/12800/12800/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 6908K/129327K/136236K bytes allocated to network (current/cache/total)
> 0/1080/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/7/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> Note that the mbuf allocation denials did not happen all at once. The
> count has been increasing progressively, in blocks of ~200, over the
> machine's 5h of uptime, until the current condition occurred.
>
> I previously tried to simulate the depletion and the hang, but the
> driver recovered. I assume the condition to refresh the ring is met in
> em_local_timer(); I still need to check that.
>
>  - Arnaud
>
> > Jack
> >
> >
> > On Thu, Mar 31, 2011 at 3:15 PM, Arnaud Lacombe <lacombar at gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Thu, Mar 31, 2011 at 5:57 PM, Jack Vogel <jfvogel at gmail.com> wrote:
> >> > So, what is the evidence that the driver is stuck here?
> >> >
> >> About 800 pps (mostly SYN) present on the wire but never ever seen on
> >> em0, plus a couple of ARP replies, which also never reach em0, plus
> >> the `missed_packets' count increasing by the same 800 pps over the
> >> last hour. Is that enough?
> >>
> >>  - Arnaud
> >>
> >> ps: I forgot to add that the MAC addresses on the wire are fine.
> >>
> >> > I see that next_to_check != next_to_refresh, which is why the
> >> > local timer won't schedule anything. OH, and I also realized there
> >> > is a problem with local_timer anyway: it will run rxeof, but that
> >> > won't help if you can't enter the loop, so I need to add some code
> >> > at the top to call em_refresh_mbufs() when in this state.
> >> >
> >> > On this interrupt cause that you are focused upon: although it's
> >> > there in the design, I've talked with some of our most seasoned
> >> > developers on both the Windows and Linux side of the house, and NO
> >> > one has ever used this 'feature', because (and I'm quoting here)
> >> > "there's no good use case for it". Meaning, there's always some
> >> > simpler way of handling the issue.
> >> >
> >> > When you use MSI-X you can't read causes, btw; if you configured
> >> > it, it would just mean getting into the regular RX handler, same as
> >> > always, so why bother specially with this cause?
> >> >
> >> > On non-MSI-X hardware there is just no particular reason to worry
> >> > about the cause either; we can just handle the RX situation in the
> >> > interrupt handler.
> >> >
> >> > Jack
> >> >
> >> >
> >> > On Thu, Mar 31, 2011 at 2:09 PM, Arnaud Lacombe <lacombar at gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi Jack,
> >> >>
> >> >> On Thu, Mar 31, 2011 at 9:51 AM, Arnaud Lacombe <lacombar at gmail.com>
> >> >> wrote:
> >> >> > [...]
> >> >> > I'll remove part of the changes I made, keeping only
> >> >> > `rx_forced_refill' and the associated sysctl, re-run the tests and
> >> >> > come back with correct values, hopefully in a few hours.
> >> >> >
> >> >> Here it is:
> >> >>
> >> >> # sysctl dev.em.0.%desc
> >> >> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.2
> >> >>
> >> >> # sysctl dev.em.0.mac_stats.missed_packets
> >> >> dev.em.0.mac_stats.missed_packets: 917428
> >> >>
> >> >> # sysctl dev.em.0.debug=1
> >> >> dev.em.0.debug: -1
> >> >> Interface is RUNNING and INACTIVE
> >> >> em0: hw tdh = 975, hw tdt = 975
> >> >> em0: hw rdh = 884, hw rdt = 885
> >> >> em0: Tx Queue Status = 0
> >> >> em0: TX descriptors avail = 1024
> >> >> em0: Tx Descriptors avail failure = 0
> >> >> em0: RX discarded packets = 0
> >> >> em0: RX Next to Check = 884
> >> >> em0: RX Next to Refresh = 885
> >> >>  -> -1
> >> >>
> >> >> So the taskqueue cannot be scheduled to run and the driver is stuck.
> >> >>
> >> >> > On Wed, Mar 30, 2011 at 2:22 PM, Jack Vogel <jfvogel at gmail.com>
> >> >> > wrote:
> >> >> >> Read the code in HEAD: em_local_timer() has a test of ALL the rx
> >> >> >> queues and will schedule a task that refreshes mbufs if they are
> >> >> >> empty. This has exactly the same effect as checking for some
> >> >> >> interrupt cause, a cause that is not available when using MSI-X
> >> >> >> on 82574, but this approach works for everything.
> >> >> >>
> >> >> Can you please point me to a reference datasheet (or errata),
> >> >> provided by Intel, about the RX Overrun interrupt not being available
> >> >> with MSI-X on the 82574?
> >> >>
> >> >> Currently, I only have access to [0], which specifies the following:
> >> >>
> >> >> 7.4 Interrupts
> >> >> 7.4.2 MSI-X Mode
> >> >> [...]
> >> >> The following configuration and parameters are involved:
> >> >> • The IVAR.INT_Alloc[4:0] entries map two Tx queues, two Rx queues
> >> >> and other events to 5 interrupt vectors
> >> >> • The ICR[24:20] bits reflect specific interrupt causes
> >> >> • Five MSI-X interrupt vectors are provided (calculated based on four
> >> >> vectors for queues and one vector for other causes). The requested
> >> >> number of vectors is loaded from the MSI_X_N fields in the EEPROM
> >> >> into the PCIe MSI-X capability structure of the function.
> >> >>
> >> >> 10.2.4.1 Interrupt Cause Read Register - ICR (0x000C0; RC/WC)
> >> >> [...]
> >> >>
> >> >> about bit 24:
> >> >>
> >> >> Other Interrupt. Indicates one of the following interrupts was set:
> >> >> • Link Status Change.
> >> >> • Receiver Overrun.
> >> >> • MDIO Access Complete.
> >> >> • Small Receive Packet Detected.
> >> >> • Receive ACK Frame Detected.
> >> >> • Manageability Event Detected.
> >> >>
> >> >> Thanks in advance,
> >> >>  - Arnaud
> >> >>
> >> >> [0]: ftp://download.intel.com/design/network/datashts/82574.pdf