Re: Kernel panic due to netback.c

From: Janis Abens <abj.dev_at_gmx.com>
Date: Mon, 20 Mar 2023 20:25:17 UTC
I'm sorry, the previous message was unreadable. I sent it from the new webmail, which defaults to HTML. I am correcting my mistake and resending the previous message as plain text.
 
Hello,

From time to time a kernel panic occurs on my system (Xen-kernel-4.15, dom0, FreeBSD 13.0-RELEASE):

"Fatal trap 12: page fault while in kernel mode"

I cannot reproduce it reliably, but it eventually happens. I have captured a stack trace (it is the same on every crash); the relevant part is:
..
#9  xnb_txpkt2gnttab (pkt=<optimized out>, pkt@entry=0xfffffe00c49fdac8, mbufc=<optimized out>, mbufc@entry=0xfffff8002f958500, gnttab=gnttab@entry=0xfffffe019ae94a70,
    txb=txb@entry=0xfffffe019ae95480, otherend_id=6) at /usr/src/sys/dev/xen/netback/netback.c:1715
#10 0xffffffff80a8d72a in xnb_recv (txb=0xfffffe019ae95480, otherend=6, mbufc=<optimized out>, ifnet=0xfffff80170f81000, gnttab=0xfffffe019ae94a70)
    at /usr/src/sys/dev/xen/netback/netback.c:1851
#11 xnb_intr (arg=0xfffffe019ae94000) at /usr/src/sys/dev/xen/netback/netback.c:1446
..

It seems netback.c has not changed in a long time; the same lines are present in 13.2 RC3 as well.

Relevant code around /usr/src/sys/dev/xen/netback/netback.c:1715:
..
xnb_txpkt2gnttab(const struct xnb_pkt *pkt, struct mbuf *mbufc,
..
  while (size_remaining > 0) {
    const netif_tx_request_t *txq = RING_GET_REQUEST(txb, r_idx);
    const size_t mbuf_space = M_TRAILINGSPACE(mbuf) - m_ofs; /* PANIC happens here! */
    
..

By analyzing the trace I have come to the conclusion that mbuf is NULL, so the macro
#define M_TRAILINGSPACE(m) ((m)->m_maxlen - (m)->m_len)
dereferences a NULL pointer and causes the panic.

The only place mbuf can become NULL is within this same loop, at line 1751: mbuf = mbuf->m_next;
It cannot be NULL on entry to the function, because xnb_recv ensures that it is not NULL before the call.

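The problem is definitely that the while condition tests size_remaining, but the contents are accessed through mbuf->m_next, so the loop can walk past the end of the mbuf chain while size_remaining is still greater than zero.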
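
To illustrate what I mean, here is a minimal userland sketch of the pattern, not the actual netback code. The names fake_mbuf, FAKE_TRAILINGSPACE and fake_walk_chain are hypothetical, the struct contains only the fields used by the macro quoted above, and the loop body is heavily simplified (no ring requests, no m_ofs). It only shows that a loop driven by size_remaining needs its own check for the end of the chain:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf: only the fields the macro touches. */
struct fake_mbuf {
	struct fake_mbuf *m_next;
	size_t m_len;
	size_t m_maxlen;
};

/* Same shape as the macro quoted above, renamed to avoid confusion. */
#define FAKE_TRAILINGSPACE(m) ((m)->m_maxlen - (m)->m_len)

/*
 * Simplified model of the loop: it is driven by size_remaining, but
 * advances through the chain via m_next.  Returns 0 if the chain has
 * room for size_remaining bytes, or EMSGSIZE if the chain ends first
 * (the case that currently ends in a NULL dereference).
 */
static int
fake_walk_chain(const struct fake_mbuf *mbuf, size_t size_remaining)
{
	/* The caller guarantees a non-NULL head, as xnb_recv does. */
	assert(mbuf != NULL);

	while (size_remaining > 0) {
		size_t space, take;

		/* The missing guard: stop instead of dereferencing NULL. */
		if (mbuf == NULL)
			return (EMSGSIZE);

		space = FAKE_TRAILINGSPACE(mbuf);
		take = space < size_remaining ? space : size_remaining;
		size_remaining -= take;
		mbuf = mbuf->m_next;
	}
	return (0);
}
```

With a chain of two empty 100-byte buffers, fake_walk_chain returns 0 for 200 bytes and EMSGSIZE for 201. Whether EMSGSIZE is the right error and how the caller should respond to the ring in the real driver is something I cannot judge.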

So my questions are:
1) Would it be possible to add some function before the PANIC line (or before mbuf = mbuf->m_next) that dumps the offending packet to the error log, or something similar? The goal is to find a way to reproduce this case reliably and to understand its cause. If there is no such function, which variables would be relevant and helpful to log in this case?
2) How could this code be modified so that it does not panic in this case, but just drops the offending packet instead?
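
For question 1, this is roughly the kind of helper I have in mind, written as a userland sketch so it can be tried outside the kernel. Everything here is an assumption for illustration: dbg_mbuf and dbg_dump_chain are hypothetical names, the struct mirrors only the two fields used by the macro above, and pkt_size stands for whatever total size the packet claims to have. In the kernel the output would presumably go through printf(9) or DPRINTF instead of a FILE pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct mbuf, for illustration only. */
struct dbg_mbuf {
	struct dbg_mbuf *m_next;
	size_t m_len;
	size_t m_maxlen;
};

/*
 * Print one line per buffer in the chain plus a summary line, and
 * return the total free space in the chain.  Comparing that total
 * with the packet size shows whether the loop would run off the end.
 */
static size_t
dbg_dump_chain(FILE *out, const struct dbg_mbuf *m, size_t pkt_size)
{
	size_t capacity = 0;
	unsigned int idx = 0;

	assert(out != NULL);

	for (; m != NULL; m = m->m_next, idx++) {
		fprintf(out, "mbuf[%u]: m_len=%zu m_maxlen=%zu\n",
		    idx, m->m_len, m->m_maxlen);
		capacity += m->m_maxlen - m->m_len;
	}
	fprintf(out, "chain: %u mbufs, capacity=%zu, pkt_size=%zu%s\n",
	    idx, capacity, pkt_size,
	    capacity < pkt_size ? " (chain too small)" : "");
	return (capacity);
}
```

Called just before the loop, something like this would log the chain layout next to size_remaining, r_idx and m_ofs, which are the values I would guess matter most for reproducing the crash.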

A code snippet in xnb_recv has caught my eye:
  if (*mbufc == NULL) {
    /*
     * Couldn't allocate mbufs.  Respond and drop the packet.  Do
     * not consume the requests
     */
    xnb_txpkt2rsp(&pkt, txb, 1);
    DPRINTF("xnb_intr: Couldn't allocate mbufs, num_consumed=%d\n",
        num_consumed);
    if_inc_counter(ifnet, IFCOUNTER_IQDROPS, 1);
    return ENOMEM;
  }

Could something similar be used in xnb_txpkt2gnttab to avoid the panic in this particular case as well?


Thank you!
Janis Abens