FreeBSD 8.0 retires into itself

Derek Ragona derek at computinginnovations.com
Thu Dec 3 13:30:10 UTC 2009


At 06:48 AM 12/3/2009, Igor V. Ruzanov wrote:
>On Thu, 3 Dec 2009, Derek Ragona wrote:
>
>|At 04:28 AM 12/3/2009, Igor V. Ruzanov wrote:
>|> Hello!
>|>
>|> I updated the FreeBSD 8.0 sources via cvsup and compiled the system.
>|> uname -a shows:
>|>
>|> FreeBSD localhost 8.0-RELEASE FreeBSD 8.0-RELEASE #2: Mon Nov 30 20:15:12 MSD
>|> 2009  root at localhost:/usr/src/sys/i386/compile/HOME-PAE  i386
>|>
>|> Machine has 3 physical interfaces:
>|> - em0 (PCI/Intel PWLA 8390 MT)
>|> - em1 (PCI/Intel PWLA 8390 MT)
>|> - fxp0 (PCI/Intel EtherExpress PRO/100)
>|>
>|> and 2 VLANs: vlan317 and vlan320.
>|>
>|> There is also one interface built into the motherboard:
>|> - ale0 (PCI-E/Atheros AR8121)
>|>
>|> One physical interface (em0) is in trunk mode (802.1Q) to carry these two
>|> VLAN interfaces (vlan317 and vlan320). The machine acts as a BGP router. It
>|> has 3 uplinks:
>|> - vlan317
>|> - vlan320
>|> - fxp0
>|>
>|> and one backbone interface:
>|> - em1.
>|>
>|> Next, I recompiled all of userland and made all the necessary configuration
>|> changes, after which the machine went into production as a BGP router in
>|> the server room. The issue looks like this:
>|>
>|> After 20-30 minutes of stable work, the system starts to "retire into
>|> itself": user processes (bgpd, zebra, named) stop responding. For example,
>|> I can't telnet to the bgpd control terminal; telnet just dies, showing:
>|> Trying 127.0.0.1...
>|> Connected to localhost.
>|> Escape character is '^]'
>|>
>|> I even tried to log in to the system from the local console, but when I
>|> pressed Enter after typing the username, the console just hung. The power
>|> button also doesn't respond (normally, pressing the power button makes the
>|> machine power off). One interesting thing: after the system booted, the top
>|> command showed:
>|>
>|> system eats about 28-30% of CPU time
>|> interrupts eat only about 6-7% of CPU time
>|> all user processes eat less than 1% of CPU time
>|>
>|> On another working machine (the same BGP router setup, but the system is
>|> FreeBSD 7.0-STABLE p4) the picture is different:
>|>
>|> system eats 9-10% of CPU time
>|> interrupts eat 15-16% of CPU time
>|>
>|> So my question is what causes this system behavior. I read UPDATING, and
>|> the kernel in FreeBSD 8.0-RELEASE was largely reworked, in particular SMPng,
>|> in order to remove all non-MPSAFE driver locks (netperf project). Are there
>|> new kernel config options that give better performance of the network
>|> subsystem? Or should I set some sysctl variables?
>|>
>|> My hardware:
>|> - Motherboard: ASUS P5P43TD (with built in Gigabit LAN Atheros AR8121)
>|> - Core 2 Quad CPU
>|> - 4G RAM (2x2048)
>|>
>|> The kernel is compiled with PAE support, the ULE scheduler, and the
>|> PREEMPTION option. If you need the whole kernel config, please let me know
>|> and I will post it ASAP.
>|>
>|>
>
>|You need to check your network setups:
>|ifconfig -a
>|
>|You can really only have one NIC on a single network.  With multiple NICs on
>|the same network, you will have ARP issues that cause routing issues.
>|You can easily check the ARP table before and after you see this behavior
>|by running:
>|arp -a
>|after a reboot, and again once the system becomes unresponsive after 30-40
>|minutes.
>|
>|Multiple NICs are necessary if you are using this system as a firewall or
>|packet filter.
>|
>|To narrow down your problem you may want to disable any NICs that are not
>|necessary and see if the problem persists.
>|
>
>Thank you for the reply, Derek!
>
>I have different, non-overlapping subnets on the network interfaces in use.
>Actually, my machine acts as a border router rather than just a router, and
>it needs several network interface cards (NICs): one of them faces my
>network (my Autonomous System with my internal routing), and the others
>face different ISPs with their own ASs. This makes it possible to choose
>the cheaper route to any Internet resource.
>
>By the way, when I tested the freshly installed system under traffic load
>generated with the iperf tool, the system worked fine for several days.
>The configuration was the same except that only one NIC was under traffic
>load. Similar tests with each NIC installed in my machine gave the same good
>results.

Since it seems tied to load, which NIC is causing the trouble?  I'd suspect
the motherboard NIC.  I have used many Intel NICs without problems.  In
multi-NIC servers I set up, I usually add a quad-port Intel card and don't
use the motherboard NICs.

You may want to try using a different NIC in place of the onboard one and
see if the problem persists.
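
To help work out which NIC is involved, it may be useful to save the output
of a few standard commands right after boot and again shortly before the
point where the box usually stops responding, then compare the two.  What
follows is only a rough sketch for /bin/sh.  The SNAPDIR path and the two
function names are made up for illustration, and vmstat -i and netstat -i
are included on the assumption that per-device interrupt counts and
per-interface error counters are what you want to compare.

```shell
#!/bin/sh
# Sketch: save network state under a label so two points in time can be diffed.
# SNAPDIR, take_snapshot and compare_snapshots are illustrative names only.

SNAPDIR="${SNAPDIR:-/var/tmp/netsnap}"

# Save the output of each diagnostic command to $SNAPDIR/<command>.<label>.
# Errors are captured in the same file so a missing tool is visible later.
take_snapshot() {
    label="$1"
    mkdir -p "$SNAPDIR" || return 1
    ifconfig -a > "$SNAPDIR/ifconfig.$label" 2>&1   # interface setup
    arp -a      > "$SNAPDIR/arp.$label"      2>&1   # ARP table
    netstat -i  > "$SNAPDIR/netstat.$label"  2>&1   # per-interface counters
    vmstat -i   > "$SNAPDIR/vmstat.$label"   2>&1   # per-device interrupts
}

# Show the lines that differ between two snapshots of one command.
# Like diff itself, this returns non-zero when the snapshots differ.
compare_snapshots() {
    cmd="$1"; before="$2"; after="$3"
    diff "$SNAPDIR/$cmd.$before" "$SNAPDIR/$cmd.$after"
}
```

For example, run "take_snapshot boot" after a reboot and "take_snapshot later"
some 15 minutes in, then "compare_snapshots arp boot later" and
"compare_snapshots vmstat boot later" to see what changed.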

         -Derek
