RELENG_7: something is very wrong with UDP?
Robert Watson
rwatson at FreeBSD.org
Fri Sep 19 11:48:49 UTC 2008
On Fri, 19 Sep 2008, Oleg V. Nauman wrote:
>> (1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
>> Confirm that you can still reproduce the problem.
>
> For various reasons my laptop runs a local caching DNS server (named)
> without any forwarders assigned. My /etc/resolv.conf contains:
> nameserver 127.0.0.1
This simplifies things in some ways but complicates them in others. In
particular, it raises the question of whether the problem is in the DNS
resolver or in the name server. A tcpdump of the DNS traffic on lo0 would be
quite interesting, since we could compare timestamps and place the blame
more precisely.
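Short of a tcpdump, another way to separate the resolver from the name server
would be to bypass the resolver entirely: hand-build a single query, send it
straight to 127.0.0.1:53 and time the reply. What follows is a minimal sketch
under stated assumptions, not an existing tool: dns_build_query() and
dns_query_ms() are hypothetical names, it only handles an IPv4 server and an
A-record query with a fixed ID, and it relies on SO_RCVTIMEO so that the probe
itself cannot hang forever.

```c
/* Hypothetical diagnostic, not a FreeBSD tool: send one raw DNS query
 * over UDP and time the reply, bypassing the resolver library. */
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Encode a recursive A-record query for `name` into buf (RFC 1035 layout).
 * Returns the query length, or -1 on an empty/over-long label or short buf. */
static int dns_build_query(unsigned char *buf, size_t cap,
                           unsigned short id, const char *name)
{
    size_t pos = 12;
    if (cap < 12)
        return -1;
    memset(buf, 0, 12);
    buf[0] = (unsigned char)(id >> 8);   /* query ID, high byte */
    buf[1] = (unsigned char)(id & 0xff); /* query ID, low byte */
    buf[2] = 0x01;                       /* flags: RD (recursion desired) */
    buf[5] = 1;                          /* QDCOUNT = 1 */
    while (*name != '\0') {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t)(dot - name) : strlen(name);
        if (len == 0 || len > 63 || pos + 1 + len > cap)
            return -1;
        buf[pos++] = (unsigned char)len; /* length-prefixed label */
        memcpy(buf + pos, name, len);
        pos += len;
        name += len + (dot ? 1 : 0);
    }
    if (pos + 5 > cap)
        return -1;
    buf[pos++] = 0;                      /* root label terminates QNAME */
    buf[pos++] = 0; buf[pos++] = 1;      /* QTYPE  = A  */
    buf[pos++] = 0; buf[pos++] = 1;      /* QCLASS = IN */
    return (int)pos;
}

/* Send the query to server:port and wait up to timeout_ms (> 0) for a reply
 * with the same ID. Returns elapsed ms, or -1.0 on error/refusal/timeout. */
static double dns_query_ms(const char *server, unsigned short port,
                           const char *name, int timeout_ms)
{
    unsigned char q[512], r[512];
    struct sockaddr_in sa;
    struct timeval tv;
    struct timespec t0, t1;
    double ms = -1.0;
    int fd, qlen;

    if (timeout_ms <= 0)   /* a zero SO_RCVTIMEO would mean "wait forever" */
        return -1.0;
    qlen = dns_build_query(q, sizeof(q), 0x1234, name); /* fixed ID */
    if (qlen < 0)
        return -1.0;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, server, &sa.sin_addr) != 1)
        return -1.0;
    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1.0;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
        connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (send(fd, q, (size_t)qlen, 0) == qlen) {
            ssize_t n = recv(fd, r, sizeof(r), 0);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            if (n >= 12 && r[0] == q[0] && r[1] == q[1])
                ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                     (t1.tv_nsec - t0.tv_nsec) / 1e6;
        }
    }
    close(fd);
    return ms;
}
```

Something like dns_query_ms("127.0.0.1", 53, "www.FreeBSD.org", 5000) would
return the round-trip time in milliseconds, or -1 if nothing came back. If that
is prompt while dig or the boot-time services stall, the resolver side looks
more suspect; if it stalls as well, named or the UDP path is the better
candidate. It does not parse the answer: any reply with a matching ID counts.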
>> Could you
>> also use procstat -k on the dig process to generate a kernel stack trace
>> for it?
Let's add to this list: when the problem happens, could you also run
procstat -k on the name server process(es)?
> And here is the procstat -kk output for the waiting logger process:
>
> PID TID COMM TDNAME KSTACK
> 1421 100095 logger - mi_switch+0x2c8
> sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14
> _sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f
> syscall+0x2b3 Xint0x80_syscall+0x20
Interesting: logger is blocked reading from a pipe, most likely its standard
input. So it sounds like something else, presumably whatever writes to that
pipe, is failing to complete in a timely manner, perhaps because of DNS.
>> This is approximately the date of my last UDP MFC. Could you try backing
>> out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7 and see if that
>> helps? (specifically, restore the use of sosend_generic instead of
>> sosend_dgram)
If you can show that the problem is definitely caused by the change to
sosend_dgram for UDPv6 socket sends, that might suggest it is the same
problem, related to the UDPv46 code there. In that case, I will propose that
we back out that portion of the change in the 7-stable branch until it is
known to be resolved; I don't want other people tripping over this.
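A related check that takes named out of the picture entirely: time a single
datagram over loopback through an IPv4 socket, an IPv6 socket to ::1, and an
IPv6 socket sending to the IPv4-mapped address ::ffff:127.0.0.1, which
presumably goes through the UDPv46 handling in the udp6 code. This is a
hypothetical sketch (udp_loopback_ms() and the PATH_* names are made up for
illustration); it assumes nothing beyond standard sockets and a receive
timeout, and there is no guarantee it reproduces the wedge.

```c
/* Hypothetical loopback probe, not a FreeBSD tool: time one UDP datagram
 * through a chosen socket family so a stalled send path stands out. */
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

enum udp_path {
    PATH_V4,  /* AF_INET socket  -> 127.0.0.1                    */
    PATH_V6,  /* AF_INET6 socket -> ::1                          */
    PATH_V46  /* AF_INET6 socket -> ::ffff:127.0.0.1 (v4-mapped) */
};

/* Returns the send-to-receive time in ms, or -1.0 if the path is not
 * available here (e.g. no IPv6) or nothing arrived within timeout_ms. */
static double udp_loopback_ms(enum udp_path p, int timeout_ms)
{
    struct sockaddr_storage rx, dst;
    socklen_t rxlen, dstlen;
    struct timeval tv;
    struct timespec t0, t1;
    char buf[16];
    double ms = -1.0;
    int rfd, sfd = -1, off = 0;

    if (timeout_ms <= 0)  /* a zero SO_RCVTIMEO would mean "wait forever" */
        return -1.0;

    /* Receiver: IPv6 ::1 for PATH_V6, IPv4 127.0.0.1 otherwise. */
    memset(&rx, 0, sizeof(rx));
    if (p == PATH_V6) {
        struct sockaddr_in6 *s6 = (struct sockaddr_in6 *)&rx;
        s6->sin6_family = AF_INET6;
        s6->sin6_addr = in6addr_loopback;
        rxlen = sizeof(*s6);
    } else {
        struct sockaddr_in *s4 = (struct sockaddr_in *)&rx;
        s4->sin_family = AF_INET;
        s4->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        rxlen = sizeof(*s4);
    }
    if ((rfd = socket(rx.ss_family, SOCK_DGRAM, 0)) < 0)
        return -1.0;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    /* Port 0 lets the kernel pick a free port; read it back afterwards. */
    if (bind(rfd, (struct sockaddr *)&rx, rxlen) < 0 ||
        getsockname(rfd, (struct sockaddr *)&rx, &rxlen) < 0 ||
        setsockopt(rfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        goto done;

    dst = rx;
    dstlen = rxlen;
    if (p == PATH_V46) {
        /* Rewrite 127.0.0.1:port as [::ffff:127.0.0.1]:port. */
        struct sockaddr_in s4;
        struct sockaddr_in6 *m = (struct sockaddr_in6 *)&dst;
        memcpy(&s4, &rx, sizeof(s4));
        memset(&dst, 0, sizeof(dst));
        m->sin6_family = AF_INET6;
        m->sin6_port = s4.sin_port;
        m->sin6_addr.s6_addr[10] = 0xff;
        m->sin6_addr.s6_addr[11] = 0xff;
        memcpy(&m->sin6_addr.s6_addr[12], &s4.sin_addr, 4);
        dstlen = sizeof(*m);
    }
    if ((sfd = socket(dst.ss_family, SOCK_DGRAM, 0)) < 0)
        goto done;
    /* A v4-mapped destination needs a dual-stack socket (IPV6_V6ONLY off). */
    if (p == PATH_V46 &&
        setsockopt(sfd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) < 0)
        goto done;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (sendto(sfd, "probe", 5, 0, (struct sockaddr *)&dst, dstlen) == 5 &&
        recv(rfd, buf, sizeof(buf), 0) == 5) {
        clock_gettime(CLOCK_MONOTONIC, &t1);
        ms = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    }
done:
    if (sfd >= 0)
        close(sfd);
    close(rfd);
    return ms;
}
```

If PATH_V6 or PATH_V46 times out on an affected kernel while PATH_V4
completes, that would point at the UDPv6 send path; bear in mind that -1 is
also returned when IPv6 is simply not configured.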
>> Could you try compiling your kernel with WITNESS to see if we get any
>> extended debugging information?
>
> I have added the WITNESS option (and STACK, which procstat requires), but it
> is not producing any output (so no LORs or anything like that).
OK. Could you try adding INVARIANT_SUPPORT and INVARIANTS if they aren't
there? Be aware: this may convert the wedging you are experiencing into a
kernel panic.
>>> Is anybody else experiencing the same issues with a fresh RELENG_7? I'm
>>> not sure whether it is a local issue on my side, though.
>>
>> I'm not experiencing them, but these sorts of things can be quite subtle
>> and workload-dependent.
>
> Well, I experience this issue even during system boot.
OK. So there must be something a bit different about your setup -- perhaps
there's something specific about the way things are interacting over the
loopback address for the name server. Is this the stock system BIND9 or
something else? Are you able to temporarily switch to an external name server
and see if that changes things?
Robert N M Watson
Computer Laboratory
University of Cambridge