Kernel panic due to netback.c
Date: Mon, 20 Mar 2023 20:10:38 UTC
<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>Hello,</div>
<div> </div>
<div>From time to time a kernel panic occurs on my system: Xen-kernel-4.15, dom0, FreeBSD 13.0-RELEASE.</div>
<div> </div>
<div>"Fatal trap 12: page fault while in kernel mode"</div>
<div> </div>
<div>I cannot reproduce it reliably, but it eventually happens. I have captured a stack trace (it is the same on every crash); the relevant part is:<br/>
..<br/>
#9 xnb_txpkt2gnttab (pkt=<optimized out>, pkt@entry=0xfffffe00c49fdac8, mbufc=<optimized out>, mbufc@entry=0xfffff8002f958500, gnttab=gnttab@entry=0xfffffe019ae94a70,<br/>
txb=txb@entry=0xfffffe019ae95480, otherend_id=6) at /usr/src/sys/dev/xen/netback/netback.c:1715<br/>
#10 0xffffffff80a8d72a in xnb_recv (txb=0xfffffe019ae95480, otherend=6, mbufc=<optimized out>, ifnet=0xfffff80170f81000, gnttab=0xfffffe019ae94a70)<br/>
at /usr/src/sys/dev/xen/netback/netback.c:1851<br/>
#11 xnb_intr (arg=0xfffffe019ae94000) at /usr/src/sys/dev/xen/netback/netback.c:1446<br/>
..</div>
<div> </div>
<div>It seems netback.c has not changed in a long time; the same lines are present in 13.2 RC3 as well.</div>
<div> </div>
<div>Relevant code around /usr/src/sys/dev/xen/netback/netback.c:1715:<br/>
..<br/>
xnb_txpkt2gnttab(const struct xnb_pkt *pkt, struct mbuf *mbufc,<br/>
..<br/>
while (size_remaining > 0) {<br/>
const netif_tx_request_t *txq = RING_GET_REQUEST(txb, r_idx);<br/>
const size_t mbuf_space = M_TRAILINGSPACE(mbuf) - m_ofs; /* PANIC happens here! */<br/>
..</div>
<div> </div>
<div>From analyzing the trace I have come to the conclusion that mbuf is NULL, so the macro (shown here in simplified form):<br/>
#define M_TRAILINGSPACE(m) ((m)->m_maxlen - (m)->m_len)<br/>
dereferences a NULL pointer and causes the panic.</div>
<div> </div>
<div>The only place where mbuf can become NULL is inside this same loop, at line 1751: mbuf = mbuf->m_next;<br/>
It cannot be NULL on entry to the function, because xnb_recv ensures that it is not NULL before the call.</div>
<div> </div>
<div>The problem is definitely that the loop condition tests size_remaining, while the data is accessed through mbuf->m_next, so the mbuf chain can run out before size_remaining reaches zero.</div>
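<div> </div>
<div>The mismatch between the loop condition and the chain walk can be sketched outside the kernel. The following is a minimal userland model, not the actual netback.c code: struct model_mbuf, its space field and model_walk are hypothetical names, and a single tx request is assumed to cover the whole packet. It only illustrates how a NULL check on the chain pointer turns the crash into an error return.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mbuf: only what the walk needs. */
struct model_mbuf {
	size_t space;			/* plays the role of M_TRAILINGSPACE(m) */
	struct model_mbuf *next;	/* plays the role of m_next */
};

/*
 * Walk the chain like the loop in xnb_txpkt2gnttab, but stop instead of
 * dereferencing NULL. Returns the number of copy entries that would be
 * produced, or -1 if the chain runs out while bytes remain.
 * Assumption: a single tx request covers the whole packet.
 */
static int
model_walk(const struct model_mbuf *head, uint16_t pkt_size)
{
	const struct model_mbuf *m = head;
	size_t m_ofs = 0;		/* offset inside the current mbuf */
	size_t size_remaining = pkt_size;
	size_t mbuf_space, space;
	int entries = 0;

	assert(head != NULL);		/* the caller guarantees a chain */

	while (size_remaining > 0) {
		if (m == NULL)
			return (-1);	/* chain exhausted: drop the packet */

		mbuf_space = m->space - m_ofs;
		space = size_remaining < mbuf_space ?
		    size_remaining : mbuf_space;

		entries++;
		m_ofs += space;
		size_remaining -= space;

		if (m_ofs >= m->space) {
			/* Current mbuf is full, move to the next one. */
			m_ofs = 0;
			m = m->next;
		}
	}
	return (entries);
}
```

<div>In this model, a chain with 150 bytes of space and a 200-byte packet makes model_walk return -1 instead of dereferencing NULL. A real change would additionally have to decide how the caller responds to the frontend.</div>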
<div> </div>
<div>So my questions are:<br/>
1) Would it be possible to add a function call before the PANIC line (or before mbuf = mbuf->m_next) that dumps the offending packet to the error log or something similar? The goal would be to find a way to reproduce this case reliably and understand its cause. If there is no such function, which variables would be relevant and helpful in this case?<br/>
2) How could this code be modified so that it does not panic in this case, but simply drops the offending packet instead?</div>
<div>A code snippet in xnb_recv has caught my eye:<br/>
if (*mbufc == NULL) {<br/>
/*<br/>
* Couldn't allocate mbufs. Respond and drop the packet. Do<br/>
* not consume the requests<br/>
*/<br/>
xnb_txpkt2rsp(&pkt, txb, 1);<br/>
DPRINTF("xnb_intr: Couldn't allocate mbufs, num_consumed=%d\n",<br/>
num_consumed);<br/>
if_inc_counter(ifnet, IFCOUNTER_IQDROPS, 1);<br/>
return ENOMEM;<br/>
}</div>
<div>Could it be used in function xnb_txpkt2gnttab to avoid panic in this particular case as well?</div>
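<div> </div>
<div>As a rough illustration of both questions, here is a userland sketch of what such a drop path might record and do. Everything in it is an assumption for illustration: struct walk_state, struct model_ifnet, format_state and drop_bad_packet are hypothetical names, the iqdrops field merely models the IFCOUNTER_IQDROPS counter from the snippet above, and EINVAL is an arbitrary choice of error code. In the kernel the formatted line would go to printf(9) or DPRINTF rather than stderr, and an error response to the frontend (as xnb_txpkt2rsp does above) would presumably still be needed.</div>

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the loop state when the chain ran out. */
struct walk_state {
	uint16_t pkt_size;		/* pkt->size as parsed from the ring */
	uint16_t size_remaining;	/* bytes still not mapped */
	uint32_t r_idx;			/* tx ring index being processed */
	int r_ofs;			/* offset inside the current tx request */
	int m_ofs;			/* offset inside the current mbuf */
	int gnt_idx;			/* copy entries produced so far */
};

/* Hypothetical stand-in for the interface drop counter. */
struct model_ifnet {
	unsigned long iqdrops;
};

/* Render the snapshot on one line so crashes can be correlated later. */
static int
format_state(char *buf, size_t len, const struct walk_state *st)
{
	return (snprintf(buf, len,
	    "netback: mbuf chain exhausted: size=%u remaining=%u "
	    "r_idx=%u r_ofs=%d m_ofs=%d gnt_idx=%d",
	    (unsigned)st->pkt_size, (unsigned)st->size_remaining,
	    (unsigned)st->r_idx, st->r_ofs, st->m_ofs, st->gnt_idx));
}

/* Log the state, count the drop, and report an error to the caller. */
static int
drop_bad_packet(struct model_ifnet *ifp, const struct walk_state *st)
{
	char line[160];

	assert(ifp != NULL && st != NULL);

	format_state(line, sizeof(line), st);
	fprintf(stderr, "%s\n", line);
	ifp->iqdrops++;
	return (EINVAL);	/* assumed error code, not from netback.c */
}
```

<div>The fields chosen here (pkt->size, size_remaining, r_idx, r_ofs, m_ofs, gnt_idx) are the loop variables that determine when the walk steps past the end of the chain, so they seem the most useful ones to log.</div>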
<div><br/>
Thank you!<br/>
Janis Abens</div>
<div> </div></div></body></html>