9.2 ixgbe tx queue hang

Christopher Forgeron csforgeron at gmail.com
Sat Mar 22 02:56:07 UTC 2014


No errors for 1h 46m - That's a record. This is using the 9.2-STABLE ixgbe
in a 10.0-RELEASE system, with Rick's suggested code below.

I decided this must be it, so I aborted the run and modified the ixgbe driver
from 10.0-STABLE with Rick's suggestion. Installed and rebooted. Here are the
extra values I print out:

    if ((adapter->num_segs * MCLBYTES - ETHER_HDR_LEN) < IP_MAXPACKET) {
        /* Log the value before Rick's clamp is applied. */
        printf("CF - Ricks Test! ifp->if_hw_tsomax = %d\n", ifp->if_hw_tsomax);
        /* Clamp TSO max so the segment plus ethernet header fits in num_segs clusters. */
        ifp->if_hw_tsomax = adapter->num_segs * MCLBYTES - ETHER_HDR_LEN;
        printf("CF - After Init, ifp->if_hw_tsomax = %d\n", ifp->if_hw_tsomax);
        printf("CF - adapter->num_segs=%d, ETHER_HDR_LEN=%d, IP_MAXPACKET=%d\n",
            adapter->num_segs, ETHER_HDR_LEN, IP_MAXPACKET);
    }

Which shows me:

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - stable-2.5.15> port 0xfcc0-0xfcdf mem 0xd9000000-0xd93fffff,0xd9bf8000-0xd9bfbfff irq 45 at device 0.0 on pci5
Mar 21 23:00:08 SAN0 kernel: ix0: Using MSIX interrupts with 9 vectors
Mar 21 23:00:08 SAN0 kernel: CF - Ricks Test! ifp->if_hw_tsomax = 0
Mar 21 23:00:08 SAN0 kernel: CF - After Init, ifp->if_hw_tsomax = 65522
Mar 21 23:00:08 SAN0 kernel: CF - adapter->num_segs=32, ETHER_HDR_LEN=14, IP_MAXPACKET=65535
ix0: Ethernet address: 00:1b:21:d6:4c:4c
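
That 65522 checks out: 32 segments * 2048 bytes (MCLBYTES) - 14 (ETHER_HDR_LEN)
= 65522, i.e. 13 bytes under IP_MAXPACKET (65535).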


I don't see the TSO max being set anywhere else. I see
IXGBE_TSO_SIZE = 262140 in ixgbe.h, and I suppose something similar is
happening in ixgbe_tso_setup, setting it to that 262140 default.


However: this 10.0-STABLE ixgbe still has the error. I'm hitting it at 25 min
of runtime. I don't have the full printfs in this one yet, so I can't tell
you more about it.


I'm going back to the 9.2-STABLE ixgbe with the above tso modification for
a bit longer to confirm that I can run overnight without the error.
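
For anyone following along, here is a trivial standalone sketch of the
arithmetic in play - constants hard-coded from my boot output above, so this
is illustration, not driver code:

    #include <stdio.h>

    int
    main(void)
    {
        /* Values taken from the boot log above. */
        const int num_segs = 32;        /* adapter->num_segs */
        const int mclbytes = 2048;      /* MCLBYTES, standard mbuf cluster */
        const int ether_hdr_len = 14;   /* ETHER_HDR_LEN */
        const int ip_maxpacket = 65535; /* IP_MAXPACKET */

        /* Largest TSO frame (header included) that fits in num_segs clusters. */
        printf("fits in %d clusters: %d bytes\n",
            num_segs, num_segs * mclbytes - ether_hdr_len);        /* 65522 */

        /* Default if_hw_tsomax plus the prepended ethernet header. */
        printf("default tsomax + header: %d bytes (cluster space: %d)\n",
            ip_maxpacket + ether_hdr_len, num_segs * mclbytes);    /* 65549 > 65536 */

        return (0);
    }

65549 bytes will not map into 65536 bytes of clusters, which is exactly the
overflow Rick describes below.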



On Fri, Mar 21, 2014 at 10:25 PM, Christopher Forgeron <csforgeron at gmail.com> wrote:

> It may be a little early, but I think that's it!
>
> It's been running without error for nearly an hour - it's very rare for it
> to go this long under this much load.
>
> I'm going to let it run longer, then abort and install the kernel with the
> extra printfs so I can see what value ifp->if_hw_tsomax is before you set
> it.
>
> It still had netstat -m denied entries on boot, but they are not climbing
> like they did before:
>
>
> $ uptime
>  9:32PM  up 25 mins, 4 users, load averages: 2.43, 6.15, 4.65
> $ netstat -m
> 21556/7034/28590 mbufs in use (current/cache/total)
> 4080/3076/7156/6127254 mbuf clusters in use (current/cache/total/max)
> 4080/2281 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/53/53/3063627 4k (page size) jumbo clusters in use
> (current/cache/total/max)
> 16444/118/16562/907741 9k jumbo clusters in use (current/cache/total/max)
>
> 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
> 161545K/9184K/170729K bytes allocated to network (current/cache/total)
> 17972/2230/4111 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 35/8909/0 requests for jumbo clusters denied (4k/9k/16k)
>
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
> - It started off badly with the 9k denials, but the count is not going up!
>
> uptime
> 10:20PM  up  1:13, 6 users, load averages: 2.10, 3.15, 3.67
> root at SAN0:/usr/home/aatech # netstat -m
> 21569/7141/28710 mbufs in use (current/cache/total)
> 4080/3308/7388/6127254 mbuf clusters in use (current/cache/total/max)
> 4080/2281 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/53/53/3063627 4k (page size) jumbo clusters in use
> (current/cache/total/max)
> 16447/121/16568/907741 9k jumbo clusters in use (current/cache/total/max)
>
> 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
> 161575K/9702K/171277K bytes allocated to network (current/cache/total)
> 17972/2261/4111 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 35/8913/0 requests for jumbo clusters denied (4k/9k/16k)
>
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
> This is the 9.2 ixgbe that I'm patching into 10.0, I'll move into the base
> 10.0 code tomorrow.
>
>
> On Fri, Mar 21, 2014 at 8:44 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>
>>  Christopher Forgeron wrote:
>> >
>> > Hello all,
>> >
>> > I ran Jack's ixgbe MJUM9BYTES removal patch, and let iometer hammer
>> > away at the NFS store overnight - But the problem is still there.
>> >
>> >
>> > From what I read, I think the MJUM9BYTES removal is probably good
>> > cleanup (as long as it doesn't trade performance on a lightly memory
>> > loaded system for performance on a heavily memory loaded system). If
>> > I can stabilize my system, I may attempt those benchmarks.
>> >
>> >
>> > I think the fix will be obvious at boot for me - My 9.2 has a 'clean'
>> > netstat
>> > - Until I can boot and see a 'netstat -m' that looks similar to that,
>> > I'm going to have this problem.
>> >
>> >
>> > Markus: Do your systems show denied mbufs at boot like mine does?
>> >
>> >
>> > Turning off TSO works for me, but at a performance hit.
>> >
>> > I'll compile Rick's patch (and extra debugging) this morning and let
>> > you know soon.
>> >
>> > On Thu, Mar 20, 2014 at 11:47 PM, Christopher Forgeron <csforgeron at gmail.com> wrote:
>> >
>> > BTW - I think this will end up being a TSO issue, not the patch that
>> > Jack applied.
>> >
>> > When I boot Jack's patch (MJUM9BYTES removal) this is what netstat -m
>> > shows:
>> >
>> > 21489/2886/24375 mbufs in use (current/cache/total)
>> > 4080/626/4706/6127254 mbuf clusters in use (current/cache/total/max)
>> > 4080/587 mbuf+clusters out of packet secondary zone in use
>> > (current/cache)
>> > 16384/50/16434/3063627 4k (page size) jumbo clusters in use
>> > (current/cache/total/max)
>> > 0/0/0/907741 9k jumbo clusters in use (current/cache/total/max)
>> >
>> > 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
>> > 79068K/2173K/81241K bytes allocated to network (current/cache/total)
>> > 18831/545/4542 requests for mbufs denied
>> > (mbufs/clusters/mbuf+clusters)
>> >
>> > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> > 15626/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> >
>> > 0 requests for sfbufs denied
>> > 0 requests for sfbufs delayed
>> > 0 requests for I/O initiated by sendfile
>> >
>> > Here is an un-patched boot:
>> >
>> > 21550/7400/28950 mbufs in use (current/cache/total)
>> > 4080/3760/7840/6127254 mbuf clusters in use (current/cache/total/max)
>> > 4080/2769 mbuf+clusters out of packet secondary zone in use
>> > (current/cache)
>> > 0/42/42/3063627 4k (page size) jumbo clusters in use
>> > (current/cache/total/max)
>> > 16439/129/16568/907741 9k jumbo clusters in use
>> > (current/cache/total/max)
>> >
>> > 0/0/0/510604 16k jumbo clusters in use (current/cache/total/max)
>> > 161498K/10699K/172197K bytes allocated to network
>> > (current/cache/total)
>> > 18345/155/4099 requests for mbufs denied
>> > (mbufs/clusters/mbuf+clusters)
>> >
>> > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> > 3/3723/0 requests for jumbo clusters denied (4k/9k/16k)
>> >
>> > 0 requests for sfbufs denied
>> > 0 requests for sfbufs delayed
>> > 0 requests for I/O initiated by sendfile
>> >
>> >
>> >
>> > See how removing the MJUM9BYTES is just pushing the problem from the
>> > 9k jumbo cluster into the 4k jumbo cluster?
>> >
>> > Compare this to my FreeBSD 9.2-STABLE machine from ~ Dec 2013: exact
>> > same hardware, revisions, zpool size, etc. - it's just running an
>> > older FreeBSD.
>> >
>> > # uname -a
>> > FreeBSD SAN1.XXXXX 9.2-STABLE FreeBSD 9.2-STABLE #0: Wed Dec 25
>> > 15:12:14 AST 2013 aatech at FreeBSD-Update
>> > Server:/usr/obj/usr/src/sys/GENERIC amd64
>> >
>> > root at SAN1:/san1 # uptime
>> > 7:44AM up 58 days, 38 mins, 4 users, load averages: 0.42, 0.80, 0.91
>> >
>> > root at SAN1:/san1 # netstat -m
>> > 37930/15755/53685 mbufs in use (current/cache/total)
>> > 4080/10996/15076/524288 mbuf clusters in use
>> > (current/cache/total/max)
>> > 4080/5775 mbuf+clusters out of packet secondary zone in use
>> > (current/cache)
>> > 0/692/692/262144 4k (page size) jumbo clusters in use
>> > (current/cache/total/max)
>> > 32773/4257/37030/96000 9k jumbo clusters in use
>> > (current/cache/total/max)
>> >
>> > 0/0/0/508538 16k jumbo clusters in use (current/cache/total/max)
>> > 312599K/67011K/379611K bytes allocated to network
>> > (current/cache/total)
>> >
>> > 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> > 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> > 0/0/0 sfbufs in use (current/peak/max)
>> > 0 requests for sfbufs denied
>> > 0 requests for sfbufs delayed
>> > 0 requests for I/O initiated by sendfile
>> > 0 calls to protocol drain routines
>> >
>> > Lastly, please note this link:
>> >
>> > http://lists.freebsd.org/pipermail/freebsd-net/2012-October/033660.html
>> >
>> Hmm, this mentioned the ethernet header being in the TSO segment. I think
>> I already mentioned my TCP/IP is rusty and I know diddly about TSO.
>> However, at a glance it does appear the driver uses ether_output() for
>> TSO segments and, as such, I think an ethernet header is prepended to the
>> TSO segment. (This makes sense, since how else would the hardware know
>> what ethernet header to use for the TCP segments generated?)
>>
>> I think prepending the ethernet header could push the total length
>> over 64K, given a default if_hw_tsomax == IP_MAXPACKET. And over 64K
>> isn't going to fit in 32 * 2K (mclbytes) clusters, etc and so forth.
>>
>> Anyhow, I think the attached patch will reduce if_hw_tsomax, so that
>> the result should fit in 32 clusters and avoid EFBIG for this case,
>> so it might be worth a try?
>> (I still can't think of why the CSUM_TSO bit isn't set for the printf()
>>  case, but it seems TSO segments could generate EFBIG errors.)
>>
>> Maybe worth a try, rick
>>
>> > It's so old that I assume the TSO leak that he speaks of has been
>> > patched, but perhaps not. More things to look into tomorrow.
>> >
>>
>> _______________________________________________
>> freebsd-net at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>>
>
>

