What's up with the IP stack?
Kevin Oberman
oberman at es.net
Mon Oct 13 09:28:42 PDT 2003
> From: Sam Leffler <sam at errno.com>
> Date: Sun, 12 Oct 2003 11:56:53 -0700
> Sender: owner-freebsd-current at freebsd.org
>
> On Sunday 12 October 2003 11:03 am, Andre Guibert de Bruet wrote:
> > On Sun, 12 Oct 2003, Josef Karthauser wrote:
> > > On Sun, Oct 12, 2003 at 02:48:01PM +0200, Soren Schmidt wrote:
> > > > It seems Josef Karthauser wrote:
> > > > > I've just built and installed a new kernel, the first since Aug 6th.
> > > > > There appears to be a problem with the IP stack. What happens is
> > > > > that everything is fine for a few hours, and then the IP stack stops
> > > > > working. I can no longer ping anything on the local network, my
> > > > > default route drops out (which is probably dhclient's doing).
> > > > > Perhaps it is ARP that is broken, it's hard to tell. All I know is
> > > > > that I need to reboot to make it work again.
> > > > >
> > > > > Is anyone else experiencing this kind of problem?
> > > >
> > > > Do you have dummynet included in the kernel ?
> > > > That has been broken for me since sam's latest commit as a backout
> > > > of ip_dummynet.c fixes the problem for me...
> > >
> > > No, I've not got dummynet in there. My current kernel config is:
> >
> > I experienced this a week ago. I found that ifconfig'ing the interface
> > down and back up again "fixed" the problem. I've since reverted to a
> > kernel compiled on September 25th.
>
> It would be good to know more details; I still don't have much to go on. Try
> to identify, for example, if the problem is specific to a particular
> device/interface or feature you're using (e.g dummynet). If you have ddb in
> your system, then when the system gets into a bad state break into the
> debugger and look for threads that are blocked on locks. If you have witness
> in your kernel then show locks would also be useful. If you don't have
> witness in your system then rebuild your kernel with it.
>
> The most recent round of changes were to lock the routing table. These went
> in 10/3 and were extensive. They could easily be the problem but w/o more
> info I can't really help.
Just a few more data points. I am seeing the problem on my ThinkPad
T30, but only on the wireless interface. I have never seen it when
connected by 10/100 Ethernet via fxp0.

When it happens I can reach some LAN hosts, but not others. I can
always seem to reach the access point. I can usually, but not always,
reach most other systems on the LAN, but not the gateway router, a
Sonic Wall firewall. I have logged onto another system and connected
to the firewall from there, so it looks like the physical path is OK.

The problem is intermittent and I have only scattered data. I've been
seeing it since about the beginning of October. I was blaming it on
hardware, but now that I see these reports, maybe it's not. (I just
replaced my Apple Airport AP with a D-Link, so there is something to
suspect.)

In my case things just start working again on their own. The pause can
vary from a few seconds to about 10 minutes. The output of
netstat -rnf inet and arp -a both look fine during the outage.
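
Since the data so far is scattered, one way to get firmer numbers on the
pauses would be to probe a few LAN hosts on a timer and record when each
one stops and starts answering. The following is only a rough sketch, not
something that was run for this report: the addresses in HOSTS are
placeholders, the 5 second interval is arbitrary, and only "ping -c 1" is
assumed to be available.

```python
import subprocess
import time
from datetime import datetime

# Placeholder addresses (assumptions): substitute the AP, a LAN host
# and the gateway on the network being tested.
HOSTS = ["192.0.2.1", "192.0.2.10", "192.0.2.254"]


def probe(host, timeout=3):
    """Send one ping; return True if the host answered in time."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def outage_intervals(samples):
    """Turn time-ordered (datetime, reachable) samples into a list of
    (start, end, duration) outages. An outage that is still in
    progress at the last sample is not reported."""
    outages = []
    start = None
    for when, ok in samples:
        if not ok and start is None:
            start = when                      # first failed probe
        elif ok and start is not None:
            outages.append((start, when, when - start))
            start = None                      # host answered again
    return outages


if __name__ == "__main__":
    history = {host: [] for host in HOSTS}
    try:
        while True:
            for host in HOSTS:
                history[host].append((datetime.now(), probe(host)))
            time.sleep(5)
    except KeyboardInterrupt:
        # Print a per-host summary when interrupted with Ctrl-C.
        for host, samples in history.items():
            for start, end, duration in outage_intervals(samples):
                print(host, start.isoformat(), end.isoformat(), duration)
```

Comparing the per-host outage windows would show whether the gateway
really drops out while the access point stays reachable, and how long
each pause lasts.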
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman at es.net Phone: +1 510 486-8634