unexplained latency, interrupt spikes and loss of throughput on FreeBSD router/firewall system

Thu Jan 16 01:43:03 UTC 2020

On Wed, Jan 15, 2020 at 5:24 PM Navdeep Parhar <nparhar at gmail.com> wrote:

> On 1/15/20 6:55 AM, John Jasen wrote:
> > Executive summary:
> >
> > Periodically, load will spike on network interrupts on one of our
> > firewalls. Latency will quickly climb to the point that things are
> > unresponsive, sessions will timeout, and bandwidth will plummet.
>
> Is this with 9000 MTU?  Can you please post "netstat -m" from this
> system?

25683/15822/41505 mbufs in use (current/cache/total)
8190/8340/16530/2038296 mbuf clusters in use (current/cache/total/max)
8190/8255 mbuf+clusters out of packet secondary zone in use (current/cache)
2576/293/2869/1019147 4k (page size) jumbo clusters in use (current/cache/total/max)
540546/1917/542463/10000000 9k jumbo clusters in use (current/cache/total/max)
0/0/0/169857 16k jumbo clusters in use (current/cache/total/max)
4898018K/39060K/4937079K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/53561/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
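As a cross-check on these figures, the "current" column can be converted to bytes per zone. Below is a minimal Python sketch. It assumes an amd64 system with FreeBSD's default sizes: 256-byte mbufs, 2048-byte clusters, 4096-byte page-size clusters, 9216-byte 9k clusters and 16384-byte 16k clusters. The helper name is hypothetical.

```python
import re

# Bytes per item, assuming FreeBSD defaults on amd64 (MSIZE, MCLBYTES,
# MJUMPAGESIZE with 4 KiB pages, MJUM9BYTES, MJUM16BYTES).
ZONE_BYTES = {
    "mbufs": 256,
    "mbuf clusters": 2048,
    "4k (page size) jumbo clusters": 4096,
    "9k jumbo clusters": 9216,
    "16k jumbo clusters": 16384,
}

# The per-zone lines from the netstat -m output above, wrapped lines rejoined.
NETSTAT_M = """\
25683/15822/41505 mbufs in use (current/cache/total)
8190/8340/16530/2038296 mbuf clusters in use (current/cache/total/max)
2576/293/2869/1019147 4k (page size) jumbo clusters in use (current/cache/total/max)
540546/1917/542463/10000000 9k jumbo clusters in use (current/cache/total/max)
0/0/0/169857 16k jumbo clusters in use (current/cache/total/max)
"""

# First field is "current"; the 3- or 4-field counter group is followed by the zone name.
LINE_RE = re.compile(r"^(\d+)/\d+/\d+(?:/\d+)? (.+?) in use")


def current_bytes_by_zone(text):
    """Return {zone: bytes currently in use} for the zones in ZONE_BYTES."""
    usage = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(2) in ZONE_BYTES:
            usage[m.group(2)] = int(m.group(1)) * ZONE_BYTES[m.group(2)]
    return usage


usage = current_bytes_by_zone(NETSTAT_M)
total = sum(usage.values())
for zone, nbytes in usage.items():
    print(f"{zone:32s} {nbytes // 1024:>9d}K {100 * nbytes / total:5.1f}%")
print(f"{'total':32s} {total // 1024:>9d}K")
```

On the numbers above, the computed total comes to 4898018K, matching the "current" figure netstat -m reports. The 9k jumbo cluster zone accounts for about 99% of it. The only nonzero failure counter in the output is the 53561 denied 9k jumbo cluster requests.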

> Assuming this is 9000 MTU, try setting this in
> /boot/loader.conf and reboot:
>
> hw.cxgbe.largest_rx_cluster=4096
>

We're already there: hw.cxgbe.largest_rx_cluster=4096 is already set.

>
> > We do not see increases in ethernet pause frames, drops, errors, or
> > anything else like that from the system.
>
> This part is strange.  The incoming frames are either being dropped
> (errors or overflows) or getting throttled via pause frames.  I'd have
> expected "netstat -dI <ifnet>" to show errors or drops, or
> "sysctl dev.cc dev.cxl | grep pause" to show some activity.  Can you
> please double-check?
>

After a prior incident on a firewall cluster, I started pushing the pause
frame counters and the netstat drop/error counters into Elasticsearch. They
stayed flat throughout this incident, and when I checked again they were
still a flat line.
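A flat line in a dashboard is only as good as the underlying check, which boils down to "did any counter move between two samples?" Below is a minimal Python sketch of that check. It assumes the input is the `name: value` text that sysctl prints, for example from `sysctl dev.cc dev.cxl | grep pause`. The helper names, the `dev.cxl.0.stats.*` counter names and the sample values are hypothetical. They are not taken from this system or from the Elasticsearch pipeline.

```python
# Hypothetical helpers: names and sample values below are illustrative only.


def parse_sysctl(text):
    """Parse `name: value` lines (sysctl's default output format) into ints.

    Lines whose value is not a plain non-negative integer (e.g. %desc
    strings) are skipped.
    """
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def changed_counters(before, after):
    """Return {name: delta} for counters present in both snapshots that moved."""
    return {
        name: after[name] - before[name]
        for name in before.keys() & after.keys()
        if after[name] != before[name]
    }


# Made-up snapshots; the dev.cxl.0.stats.* names are assumed, not verified.
BEFORE = """\
dev.cxl.0.stats.rx_pause: 0
dev.cxl.0.stats.tx_pause: 0
"""
AFTER = """\
dev.cxl.0.stats.rx_pause: 0
dev.cxl.0.stats.tx_pause: 12
"""

print(changed_counters(parse_sysctl(BEFORE), parse_sysctl(AFTER)))
# prints {'dev.cxl.0.stats.tx_pause': 12}
```

An empty result across samples taken before and during an event is what "flat line" means here. Comparing raw snapshots this way avoids relying on dashboard aggregation, which can smooth over short bursts.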

>
> Regards,
> Navdeep
>

Thanks!
