Possible UDP related deadlock in 7.1-PRERELEASE

Wed Sep 17 15:12:07 UTC 2008

On September 17, 2008, Robert Watson wrote:
> On Mon, 15 Sep 2008, Norbert Papke wrote:
> > With WITNESS enabled, I now experience panics and could not follow your
> > instructions.  There is no core dump.  The following gets logged to
> > /var/log/messages:
> >
> > shared lock of (rw) udpinp @ /usr/src/sys/netinet/udp_usrreq.c:864
> > while exclusively locked from /usr/src/sys/netinet6/udp6_usrreq.c:940
> > panic: share->excl
> > KDB: stack backtrace:
> > db_trace_self_wrapper(c06fda7c,f6b96978,c052046a,c06fbb5d,c07695c0,...)
> > at db_trace_self_wrapper+0x26
> > kdb_backtrace(c06fbb5d,c07695c0,c06febd1,f6b96984,f6b96984,...) at
> > kdb_backtrace+0x29
> > panic(c06febd1,c070c409,3ac,c0709eee,360,...) at panic+0xaa
> > witness_checkorder(ccd5209c,1,c0709eee,360,8,...) at
> > witness_checkorder+0x17c
> > _rw_rlock(ccd5209c,c0709eee,360,c07780e0,cd4652c8,...) at _rw_rlock+0x2a
> > udp_send(d3942000,0,c580f400,c68faa00,0,...) at udp_send+0x197
> > udp6_send(d3942000,0,c580f400,c68faa00,0,...) at udp6_send+0x140
> > sosend_generic(d3942000,c68faa00,f6b96be8,0,0,...) at
> > sosend_generic+0x50d sosend(d3942000,c68faa00,f6b96be8,0,0,...) at
> > sosend+0x3f
> > kern_sendit(cd465230,f,f6b96c64,0,0,...) at kern_sendit+0x106
> > sendit(0,871b9fe,0,c68faa00,1c,...) at sendit+0x182
> > sendto(cd465230,f6b96cfc,18,cd465230,c072bab8,...) at sendto+0x4f
> > syscall(f6b96d38) at syscall+0x293
> >
> > Note that I do not use IPv6, none of my network interfaces is configured
> > for it.

> To clarify what you're seeing a bit: some applications that are adapted to
> use both IPv4 and IPv6 open combined v4/v6 sockets.  This is possible
> because there is a section of the IPv6 address space that "contains" the v4
> address space.  When an application sends to a v4 address using a v6 socket
> (wave hands here) the kernel actually calls the v4 UDP code from within the
> v6 socket code, and it turns out there's a locking bug in that path.  So
> likely some application you are running is using this compatibility mode,
> and hence triggering this bug.

Thank you for this explanation.  It helps my peace of mind to understand the 
context.

> I need to think for a bit about the best way to fix it (it's easy to hack
> around, but obviously "hacking around" is not the desired solution), and
> I'll get back to you later this week with a patch.

I am certainly happy to try a patch when it becomes available.

> For my reference, it would probably be helpful to know what the application
> is, since apparently this didn't arise in our testing.  You can type "show
> pcpu" at the DDB prompt after this panic to show what thread is currently
> running.

This may be difficult.  I was not entirely clear in my description of the 
panic.  I experience spontaneous reboots when the panic is occurs.  DDB is 
not invoked, nor is a core generated.  

My suspicion is that "ktorrent", the KDE3 torrent client, is triggering this 
condition.  When I broke into DDB with a non-WITNESS kernel, I observed that 
one of the "ktorrent" threads was locked on "*udpinp".  
Additionally, "hald", "ntpd" and the NIC interrupt thread had "*udp" locked.  
Not sure if this is information is helpful.

Cheers,

-- Norbert.