Kernel panic due to netback.c

From: Janis Abens <abj.dev_at_gmx.com>
Date: Mon, 20 Mar 2023 20:10:38 UTC
<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>Hello,</div>

<div>&nbsp;</div>

<div>A kernel panic occurs from time to time. Environment: Xen-kernel-4.15, dom0, FreeBSD 13.0-RELEASE.</div>

<div>&nbsp;</div>

<div>&quot;Fatal trap 12: page fault while in kernel mode&quot;</div>

<div>&nbsp;</div>

<div>I cannot reproduce it reliably, but it eventually happens. I have captured a stack trace (it is the same on every crash); the relevant part is:<br/>
..<br/>
#9&nbsp; xnb_txpkt2gnttab (pkt=&lt;optimized out&gt;, pkt@entry=0xfffffe00c49fdac8, mbufc=&lt;optimized out&gt;, mbufc@entry=0xfffff8002f958500, gnttab=gnttab@entry=0xfffffe019ae94a70,<br/>
&nbsp;&nbsp;&nbsp; txb=txb@entry=0xfffffe019ae95480, otherend_id=6) at /usr/src/sys/dev/xen/netback/netback.c:1715<br/>
#10 0xffffffff80a8d72a in xnb_recv (txb=0xfffffe019ae95480, otherend=6, mbufc=&lt;optimized out&gt;, ifnet=0xfffff80170f81000, gnttab=0xfffffe019ae94a70)<br/>
&nbsp;&nbsp;&nbsp; at /usr/src/sys/dev/xen/netback/netback.c:1851<br/>
#11 xnb_intr (arg=0xfffffe019ae94000) at /usr/src/sys/dev/xen/netback/netback.c:1446<br/>
..</div>

<div>&nbsp;</div>

<div>It seems that netback.c has not changed in a long time: the same line numbers still apply in 13.2-RC3.</div>

<div>&nbsp;</div>

<div>Relevant code around /usr/src/sys/dev/xen/netback/netback.c:1715:<br/>
..<br/>
xnb_txpkt2gnttab(const struct xnb_pkt *pkt, struct mbuf *mbufc,<br/>
..<br/>
&nbsp; while (size_remaining &gt; 0) {<br/>
&nbsp;&nbsp;&nbsp; const netif_tx_request_t *txq = RING_GET_REQUEST(txb, r_idx);<br/>
&nbsp;&nbsp;&nbsp; const size_t mbuf_space = M_TRAILINGSPACE(mbuf) - m_ofs; /* PANIC happens here! */<br/>
..</div>

<div>&nbsp;</div>

<div>By analyzing the trace, I have concluded that mbuf is NULL, so the macro<br/>
#define M_TRAILINGSPACE(m) ((m)-&gt;m_maxlen - (m)-&gt;m_len)<br/>
dereferences a NULL pointer and causes the panic.</div>

<div>&nbsp;</div>

<div>The only place mbuf can become NULL is inside this same loop, at line 1751: mbuf = mbuf-&gt;m_next;<br/>
It cannot be NULL when the function is called, because xnb_recv checks that it is not NULL before the call.</div>

<div>&nbsp;</div>

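<div>The problem is definitely that the while condition tests size_remaining, but the loop body walks the mbuf chain via mbuf-&gt;m_next. If the chain runs out while size_remaining is still greater than zero, the next iteration dereferences a NULL mbuf.</div>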
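<div>To make the mismatch concrete, here is a small user-space model of the loop (a sketch, not the kernel code). struct model_mbuf and MODEL_TRAILINGSPACE are hypothetical stand-ins that keep only the fields used by the M_TRAILINGSPACE definition quoted in this message; the per-request bookkeeping (r_idx, RING_GET_REQUEST) is left out. The function returns how many bytes were still unplaced when mbuf became NULL, which is the situation in which the real loop would evaluate M_TRAILINGSPACE(NULL).</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mbuf: only the fields used
 * by the quoted M_TRAILINGSPACE definition, plus the chain link. */
struct model_mbuf {
	size_t m_len;
	size_t m_maxlen;
	struct model_mbuf *m_next;
};

#define MODEL_TRAILINGSPACE(m) ((m)->m_maxlen - (m)->m_len)

/*
 * Same shape as the loop in xnb_txpkt2gnttab: the condition tests only
 * size_remaining, while the body advances along m_next. Returns the number
 * of bytes still unplaced when mbuf became NULL (0 if the chain was big
 * enough). A nonzero result is the case where the kernel loop would
 * evaluate M_TRAILINGSPACE(NULL) on its next iteration.
 */
static size_t
bytes_left_when_chain_ends(const struct model_mbuf *mbuf, size_t size_remaining)
{
	size_t m_ofs = 0;

	assert(mbuf != NULL); /* per the report, xnb_recv guarantees this */
	while (size_remaining > 0) {
		if (mbuf == NULL)
			return size_remaining; /* the kernel loop has no check and faults */
		size_t mbuf_space = MODEL_TRAILINGSPACE(mbuf) - m_ofs;
		size_t space = size_remaining < mbuf_space ? size_remaining : mbuf_space;

		m_ofs += space;
		size_remaining -= space;
		if (MODEL_TRAILINGSPACE(mbuf) - m_ofs == 0) {
			/* Current mbuf is full: move to the next one. */
			m_ofs = 0;
			mbuf = mbuf->m_next;
		}
	}
	return 0;
}
```

<div>With a chain of a 60-byte and a 40-byte mbuf (both empty), a 100-byte packet returns 0, while a 150-byte packet returns 50: the chain ends with 50 bytes still to place, and the real loop would take one more iteration with mbuf == NULL.</div>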

<div>&nbsp;</div>

<div>So my questions are:<br/>
1) Would it be possible to add a call before the PANIC line (or before mbuf = mbuf-&gt;m_next) that dumps the offending packet to the error log, or something similar? The goal would be to find a way to reproduce this case reliably and understand its cause. If no such function exists, which variables would be relevant and helpful to log in this case?<br/>
2) How could this code be modified so that it does not panic in this case, but just drops the offending packet instead?</div>
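<div>One possible shape for both 1) and 2), again as a user-space sketch under a simplified, hypothetical model: check mbuf before dereferencing it, log the state that would help reproduce the problem, and return an error instead of faulting. struct model_mbuf, model_dump_chain and model_fill_chunks are made-up names. In the kernel, the log line would go through printf(9) or the driver&#39;s existing DPRINTF; plausible values to log are the packet size, size_remaining, m_ofs, r_idx, and the length and total trailing space of the mbuf chain. A -1 return would also need a matching check in xnb_recv, which this sketch does not model.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct mbuf. */
struct model_mbuf {
	size_t m_len;
	size_t m_maxlen;
	struct model_mbuf *m_next;
};

#define MODEL_TRAILINGSPACE(m) ((m)->m_maxlen - (m)->m_len)

/* Diagnostic: number of mbufs and total trailing space in the chain. */
static void
model_dump_chain(const struct model_mbuf *head, size_t pkt_size,
    size_t size_remaining)
{
	size_t n = 0, total = 0;

	for (const struct model_mbuf *m = head; m != NULL; m = m->m_next) {
		n++;
		total += MODEL_TRAILINGSPACE(m);
	}
	fprintf(stderr,
	    "chain exhausted: pkt size=%zu, unplaced=%zu, mbufs=%zu, space=%zu\n",
	    pkt_size, size_remaining, n, total);
}

/*
 * Same loop shape, but mbuf is checked before it is dereferenced.
 * Returns the number of copy chunks produced, or -1 if the chain ran out,
 * so that a caller could respond to the request and drop the packet.
 */
static int
model_fill_chunks(const struct model_mbuf *head, size_t pkt_size)
{
	const struct model_mbuf *mbuf = head;
	size_t size_remaining = pkt_size;
	size_t m_ofs = 0;
	int chunks = 0;

	assert(head != NULL);
	while (size_remaining > 0) {
		if (mbuf == NULL) {
			model_dump_chain(head, pkt_size, size_remaining);
			return -1;
		}
		size_t mbuf_space = MODEL_TRAILINGSPACE(mbuf) - m_ofs;
		size_t space = size_remaining < mbuf_space ? size_remaining : mbuf_space;

		if (space > 0)
			chunks++;
		m_ofs += space;
		size_remaining -= space;
		if (MODEL_TRAILINGSPACE(mbuf) - m_ofs == 0) {
			/* Current mbuf is full: move to the next one. */
			m_ofs = 0;
			mbuf = mbuf->m_next;
		}
	}
	return chunks;
}
```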

<div>A code snippet in xnb_recv has caught my eye:<br/>
&nbsp; if (*mbufc == NULL) {<br/>
&nbsp;&nbsp;&nbsp; /*<br/>
&nbsp;&nbsp;&nbsp;&nbsp; * Couldn&#39;t allocate mbufs.&nbsp; Respond and drop the packet.&nbsp; Do<br/>
&nbsp;&nbsp;&nbsp;&nbsp; * not consume the requests<br/>
&nbsp;&nbsp;&nbsp;&nbsp; */<br/>
&nbsp;&nbsp;&nbsp; xnb_txpkt2rsp(&amp;pkt, txb, 1);<br/>
&nbsp;&nbsp;&nbsp; DPRINTF(&quot;xnb_intr: Couldn&#39;t allocate mbufs, num_consumed=%d&#92;n&quot;,<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; num_consumed);<br/>
&nbsp;&nbsp;&nbsp; if_inc_counter(ifnet, IFCOUNTER_IQDROPS, 1);<br/>
&nbsp;&nbsp;&nbsp; return ENOMEM;<br/>
&nbsp; }</div>

<div>Could the same approach be used in xnb_txpkt2gnttab to avoid the panic in this particular case as well?</div>
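<div>Along the lines of the xnb_recv snippet, another option is to check up front, before xnb_txpkt2gnttab is called, that the mbuf chain has enough trailing space for the whole packet, and otherwise take the same respond-and-drop path. The sketch below models only that decision; struct model_mbuf, model_chain_fits, model_recv and model_iqdrops are hypothetical stand-ins (the counter plays the role of IFCOUNTER_IQDROPS). Unlike the allocation-failure case, the chain here has already been allocated, so real code would also have to free it (for example with m_freem(9)).</div>

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mbuf. */
struct model_mbuf {
	size_t m_len;
	size_t m_maxlen;
	struct model_mbuf *m_next;
};

#define MODEL_TRAILINGSPACE(m) ((m)->m_maxlen - (m)->m_len)

/* Stands in for the IFCOUNTER_IQDROPS counter bumped via if_inc_counter(). */
static unsigned long model_iqdrops;

/* True if the chain's total trailing space can hold pkt_size bytes, i.e. the
 * copy loop can place all data without walking past the last mbuf. */
static bool
model_chain_fits(const struct model_mbuf *head, size_t pkt_size)
{
	size_t total = 0;

	assert(head != NULL); /* model_recv checks for NULL first */
	for (const struct model_mbuf *m = head; m != NULL; m = m->m_next)
		total += MODEL_TRAILINGSPACE(m);
	return total >= pkt_size;
}

/* Decision logic only: drop the packet rather than start a copy that
 * cannot complete. Mirrors the ENOMEM path quoted from xnb_recv. */
static int
model_recv(const struct model_mbuf *mbufc, size_t pkt_size)
{
	if (mbufc == NULL || !model_chain_fits(mbufc, pkt_size)) {
		/* Real code: xnb_txpkt2rsp(&pkt, txb, 1), free the chain if it
		 * was allocated, then count the drop. */
		model_iqdrops++;
		return ENOMEM;
	}
	/* Real code would call xnb_txpkt2gnttab() and do the copy here. */
	return 0;
}
```

<div>Checking the whole chain before copying leaves the loop in xnb_txpkt2gnttab unchanged, at the cost of one extra walk over the chain per packet.</div>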

<div><br/>
Thank you!<br/>
Janis Abens</div>

<div>&nbsp;</div></div></body></html>