RELENG_7: something is very wrong with UDP?

Oleg V. Nauman oleg at opentransfer.com
Fri Sep 19 11:36:38 UTC 2008


Quoting Robert Watson <rwatson at FreeBSD.org>:

> On Thu, 18 Sep 2008, Oleg V. Nauman wrote:
>
>> It seems something is very wrong with UDP on the latest RELENG_7.
>>
>> Here are some symptoms I saw today when trying to boot a
>> newly compiled RELENG_7 on my laptop:
>>
>> a) rc scripts wait indefinitely for logger to complete during
>>    boot (devd and ifconfig are good examples)
>
> If you hit "ctrl-t" while these are waiting, what is the output?

load: 0.00 cmd: logger [nanslp] 0.00u 0.07s 0% 832k

>
>> b) Sporadic DNS request failures
>
> I don't know what your comfort level with debugging tools is, but
> if you're happy using tcpdump, etc, I think I'd recommend diagnosing
> this directly that way.  I'd probably do something like this:
>
> (1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
>     Confirm that you can still reproduce the problem.

  For various reasons my laptop runs a local caching DNS server
(named) without any forwarders assigned. My /etc/resolv.conf contains
nameserver 127.0.0.1


>
> (2) Use dig(1) and tcpdump(1) to watch wire-level DNS behavior -- do you see
>     queries go out?  Do you see replies come back?  Is dig "waking up" and
>     seeing the replies when they arrive, or is there a delay or hang in dig?
>     If dig hangs, what does ctrl-t show the sleep state (wmesg) is?

  Will try to dig into it when it occurs again.
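
  A minimal sketch of the wire-level check from (2), for the case where
dig itself is not at hand: it sends a single DNS A query over UDP and
times the reply. It assumes the resolver listens on 127.0.0.1 port 53
as in the resolv.conf above; the function names, the query ID and the
5 second timeout are illustrative choices, not anything taken from the
thread.

```python
import socket
import struct
import time


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def timed_query(name, server="127.0.0.1", port=53, timeout=5.0):
    """Send one UDP query; return (seconds, reply length) or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        start = time.monotonic()
        s.sendto(build_query(name), (server, port))
        try:
            reply, _ = s.recvfrom(4096)
        except socket.timeout:
            # No reply within the timeout: the sporadic failure case.
            return None
        return time.monotonic() - start, len(reply)
```

  Calling timed_query() in a loop while tcpdump watches lo0 would show
whether a failure is a missing reply on the wire or a reply that the
process never wakes up for.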

> Could you
>     also use procstat -k on the dig process to generate a kernel stack trace
>     for it?
>
>> c) traceroute prints 0.00-like response times for every host
>>
>> d) I was unable to reboot my laptop with shutdown -r (due to
>>    logger/syslog related issues, I think)
>
> Could you try killing syslogd by hand and see if it dies?  If not, can
> you use procstat -kk to generate a stack trace for it?

  Killing syslogd does not help.
Here is the procstat -kk output for the "shutdown -r now" process, which
is waiting on something:

   PID    TID COMM             TDNAME           KSTACK
  1447 100098 shutdown         -                mi_switch+0x2c8 sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_timedwait_sig+0x17 _sleep+0x339 kern_nanosleep+0xc1 nanosleep+0x6f syscall+0x2b3 Xint0x80_syscall+0x20

And procstat -kk output for logger process waiting:

   PID    TID COMM             TDNAME           KSTACK
  1421 100095 logger           -                mi_switch+0x2c8 sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14 _sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f syscall+0x2b3 Xint0x80_syscall+0x20
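
  The two stacks above differ in what they sleep on: shutdown is in
kern_nanosleep, while logger is in pipe_read. A small sketch that pulls
that frame out of a KSTACK column, assuming frames are listed innermost
first as "symbol+0xoffset" and that mi_switch, sleepq_* and _sleep are
only generic sleep machinery (that prefix list is my own choice, not
something procstat defines):

```python
# Prefixes treated as generic scheduler/sleep frames (an assumption).
SLEEP_PREFIXES = ("mi_switch", "sleepq_", "_sleep")

LOGGER_KSTACK = (
    "mi_switch+0x2c8 sleepq_switch+0xd9 sleepq_catch_signals+0x239 "
    "sleepq_wait_sig+0x14 _sleep+0x35f pipe_read+0x389 dofileread+0x96 "
    "kern_readv+0x58 read+0x4f syscall+0x2b3 Xint0x80_syscall+0x20"
)

SHUTDOWN_KSTACK = (
    "mi_switch+0x2c8 sleepq_switch+0xd9 sleepq_catch_signals+0x239 "
    "sleepq_timedwait_sig+0x17 _sleep+0x339 kern_nanosleep+0xc1 "
    "nanosleep+0x6f syscall+0x2b3 Xint0x80_syscall+0x20"
)


def parse_kstack(kstack):
    """Split a KSTACK string into (symbol, offset) pairs, innermost first."""
    frames = []
    for token in kstack.split():
        symbol, _, offset = token.partition("+")
        frames.append((symbol, int(offset, 16) if offset else 0))
    return frames


def blocking_frame(frames):
    """Return the first symbol that is not generic sleep machinery."""
    for symbol, _ in frames:
        if not symbol.startswith(SLEEP_PREFIXES):
            return symbol
    return None


print(blocking_frame(parse_kstack(LOGGER_KSTACK)))    # pipe_read
print(blocking_frame(parse_kstack(SHUTDOWN_KSTACK)))  # kern_nanosleep
```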

>
>> e) I was unable to start an X session (it seems to freeze the laptop,
>>    because I was unable to even switch to another virtual console)
>>
>> A csup "backout" to date=2008.09.15.12.00.00 and recompiling the
>> kernel fixes this issue for me.
>
> This is approximately the date of my last UDP MFC.  Could you try
> backing out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7 and
> see if that helps? (specifically, restore the use of sosend_generic
> instead of sosend_dgram)
>
> Could you confirm that either you're not using any kernel modules from
> ports, or that if you are, you have recompiled them with your most
> recent update?

  I'm not using any third party kernel modules at this moment.

>
> Could you try compiling your kernel with WITNESS to see if we get any
> extended debugging information?

  I have added the WITNESS option (and STACK, required by procstat), but
it is not producing any output (so no LORs or anything like that).

>
>> Is anybody experiencing the same issues with a fresh RELENG_7? I am
>> unsure whether it is a local issue, though.
>
> I'm not experiencing them, but these sorts of things can be quite
> subtle and workload-dependent.

  Well, I am experiencing this issue even during system boot.

>
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge



