RELENG_7: something is very wrong with UDP?
Oleg V. Nauman
oleg at opentransfer.com
Fri Sep 19 11:36:38 UTC 2008
Quoting Robert Watson <rwatson at FreeBSD.org>:
> On Thu, 18 Sep 2008, Oleg V. Nauman wrote:
>
>> It seems that something is very wrong with UDP on the latest RELENG_7.
>>
>> Here are some symptoms I saw today when trying to boot a newly
>> compiled RELENG_7 on my laptop:
>>
>> a) rc scripts wait indefinitely for logger to complete during boot
>> (devd and ifconfig are good examples)
>
> If you hit "ctrl-t" while these are waiting, what is the output?
load: 0.00 cmd: logger [nanslp] 0.00u 0.07s 0% 832k
>
>> b) Sporadic DNS request failures
>
> I don't know what your comfort level with debugging tools is, but
> if you're happy using tcpdump, etc, I think I'd recommend diagnosing
> this directly that way. I'd probably do something like this:
>
> (1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
> Confirm that you can still reproduce the problem.
For various reasons my laptop runs a local caching DNS server (named)
without any forwarders configured. My /etc/resolv.conf contains:
nameserver 127.0.0.1
>
> (2) Use dig(1) and tcpdump(1) to watch wire-level DNS behavior -- do you see
> queries go out? Do you see replies come back? Is dig "waking up" and
> seeing the replies when they arrive, or is there a delay or hang in dig?
> If dig hangs, what does ctrl-t show the sleep state (wmesg) is?
I will try to dig into this when it occurs again.
> Could you
> also use procstat -k on the dig process to generate a kernel stack trace
> for it?
>
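As a rough illustration of the check suggested in step (2), the sketch below sends one hand-built DNS query over UDP and times the reply. It is not a replacement for dig(1) and tcpdump(1); the function names build_query and probe, the default server 127.0.0.1 (taken from the resolv.conf above), and the 2 second timeout are assumptions made for the example.

```python
import socket
import struct
import time


def build_query(name, qid=0x1234):
    """Encode a minimal DNS query (type A, class IN, recursion desired)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(name, server="127.0.0.1", port=53, timeout=2.0):
    """Return the round-trip time in seconds, or None if no reply arrived."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:
            # Covers timeouts as well as errors such as "connection refused"
            return None
        elapsed = time.monotonic() - start
    # A genuine reply echoes the 16-bit query ID in its first two bytes
    return elapsed if reply[:2] == query[:2] else None
```

Calling probe("freebsd.org") in a loop while tcpdump watches the loopback interface would show whether a None result corresponds to a query that never left, a reply that never came back, or a reply that arrived but was not delivered to the socket.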
>> c) traceroute prints a response time like 0.00 for every host
>>
>> d) I was unable to reboot my laptop with shutdown -r (due to
>> logger/syslog related issues, I think)
>
> Could you try killing syslogd by hand and see if it dies? If not, can
> you use procstat -kk to generate a stack trace for it?
Killing syslogd does not help.
Here is the procstat -kk output for the "shutdown -r now" process, which
is waiting on something:
PID TID COMM TDNAME KSTACK
1447 100098 shutdown - mi_switch+0x2c8
sleepq_switch+0xd9 sleepq_catch_signals+0x239
sleepq_timedwait_sig+0x17 _sleep+0x339 kern_nanosleep+0xc1
nanosleep+0x6f syscall+0x2b3 Xint0x80_syscall+0x20
And the procstat -kk output for the waiting logger process:
PID TID COMM TDNAME KSTACK
1421 100095 logger - mi_switch+0x2c8
sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14
_sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f
syscall+0x2b3 Xint0x80_syscall+0x20
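To make the two traces above easier to compare, here is a small sketch that parses mail-wrapped procstat -kk output and reports the first frame below the generic sleep machinery. The helper names parse_kstacks and blocking_frame, and the choice of which frames count as "generic" (mi_switch, _sleep, and anything starting with sleepq_), are assumptions for illustration only.

```python
SAMPLE = """\
PID TID COMM TDNAME KSTACK
1447 100098 shutdown - mi_switch+0x2c8
sleepq_switch+0xd9 sleepq_catch_signals+0x239
sleepq_timedwait_sig+0x17 _sleep+0x339 kern_nanosleep+0xc1
nanosleep+0x6f syscall+0x2b3 Xint0x80_syscall+0x20
PID TID COMM TDNAME KSTACK
1421 100095 logger - mi_switch+0x2c8
sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14
_sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f
syscall+0x2b3 Xint0x80_syscall+0x20
"""

# Frames that only say "the thread is asleep", not why (an assumption).
SLEEP_FRAMES = ("mi_switch", "_sleep")


def parse_kstacks(text):
    """Map (pid, tid, comm) to its kernel frames, innermost first."""
    stacks = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "PID":
            continue  # blank line or column header
        if len(fields) >= 4 and fields[0].isdigit() and fields[1].isdigit():
            current = (int(fields[0]), int(fields[1]), fields[2])
            stacks[current] = []
            fields = fields[4:]  # skip PID, TID, COMM, TDNAME
        if current is not None:
            # Drop the "+0x..." offset and keep only the function name
            stacks[current].extend(f.split("+")[0] for f in fields)
    return stacks


def blocking_frame(frames):
    """Return the first frame that is not part of the sleep machinery."""
    for frame in frames:
        if frame in SLEEP_FRAMES or frame.startswith("sleepq_"):
            continue
        return frame
    return None


for (pid, tid, comm), frames in parse_kstacks(SAMPLE).items():
    print(pid, comm, blocking_frame(frames))
```

Read this way, shutdown is sleeping in kern_nanosleep, while logger is blocked in pipe_read, that is, waiting for data on a pipe.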
>
>> e) I was unable to start an X session (it seems to freeze the laptop,
>> because I was unable even to switch to another virtual console)
>>
>> Using csup to back out to date=2008.09.15.12.00.00 and recompiling
>> the kernel fixes this issue for me.
>
> This is approximately the date of my last UDP MFC. Could you try
> backing out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7 and
> see if that helps? (specifically, restore the use of sosend_generic
> instead of sosend_dgram)
>
> Could you confirm that either you're not using any kernel modules from
> ports, or that if you are, you have recompiled them with your most
> recent update?
I'm not using any third party kernel modules at this moment.
>
> Could you try compiling your kernel with WITNESS to see if we get any
> extended debugging information?
I have added the WITNESS option (and STACK, required by procstat), but
it is not producing any output (so no LORs or anything like that).
>
>> Is anybody experiencing the same issues with a fresh RELENG_7? I am
>> unsure whether this is just a local issue, though.
>
> I'm not experiencing them, but these sorts of things can be quite
> subtle and workload-dependent.
Well, I am experiencing this issue even during system boot.
>
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge