lge fiber-optic loose connection for 1-6s
pyunyh at gmail.com
Wed Jun 6 06:13:55 UTC 2007
On Tue, Jun 05, 2007 at 06:03:20PM +0100, Paul Bielecki wrote:
> Hello All
> I have network connection problems with my small database/samba server.
> The machine is a small Shuttle box with an lge fiber-optic 1000baseSX
> interface on the LAN and rl0 for the VPN connection.
> The server was set up by somebody else about 4 years ago and has not
> been updated since.
> I have 6x FreeBSD + 2x linux + 4x M$ servers, but this is the only
> server I have connection problems with.
> It is FreeBSD 4.8 stable, Mysql 4.0.12, Samba 2.2.8
> Network: 330 machines + network printers; 60 machines including this
> server on 10.0.0.0/24, printers are on 10.0.0.0/22 and the rest of the LAN is
> 10.0.1.0/22, 10.0.2.0/22, 10.0.3.0/22.
> Default gateway is set to host in 10.0.0.0/24.
> The rl link is connected to a second FreeBSD box which acts only as a
> VPN, network 172.16.12.0/24.
> There is one main switch which connects servers and uplinks from all
> rooms and buildings.
> Almost all Windows machines in the network are up to date and all have
> antivirus software installed.
> What happens is that occasionally, 6 to 20 times a day, all
> machines seem to lose connection with this server for 1-6 seconds.
> When it happens:
> - I can ping google.com or another host on the same network from the
> server itself and I get replies (?)
> - I lose my ssh connection to this server
> - there are no errors or warnings in messages apart from smbd errors
> - samba gives me lots of "smbd read_data: read failure for 4. Error =
> Operation time out" or smbd_oplock/oplock break.
> - tcpdump shows lots of ACK packets to and from the server on port 139
> I think that having 10.0.0.0/24 and 10.0.0.0/22 as one big network
> doesn't help. I believe it should be set up with VLANs, but I can't
> change it just like that.
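To make the overlap concrete, here is a minimal sketch using Python's
standard ipaddress module. It only normalizes the prefixes exactly as
they are written in the message; it assumes nothing about how the hosts
are actually configured, and the variable names are mine.

```python
import ipaddress

# Prefixes exactly as written in the message.
written = ["10.0.0.0/22", "10.0.1.0/22", "10.0.2.0/22", "10.0.3.0/22"]

# strict=False masks off the host bits instead of raising ValueError,
# so each entry is reduced to the network it really belongs to.
normalized = {ipaddress.ip_network(p, strict=False) for p in written}

server_net = ipaddress.ip_network("10.0.0.0/24")
big_net = ipaddress.ip_network("10.0.0.0/22")

print(normalized)                      # the distinct networks that remain
print(server_net.subnet_of(big_net))   # is the /24 inside the /22?
print(big_net.num_addresses)           # size of the /22
```

With a /22 mask all four prefixes collapse to the same network, and the
server's /24 sits inside it, so hosts with different masks will disagree
about which addresses are directly reachable.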
> The second thing is that the M$ network is not configured properly;
> there should be one WINS server or PDC and no broadcasts.
> So far I have just been blindly watching the output of
>   tcpdump -v -s 255 -i lge0 port not 22 and port not 139 and not icmp
> but I don't know what I should look for.
> Let me know your thoughts, and please give me some tips on how I can
> diagnose what is causing my problems.
> Some help with tcpdump would be much appreciated too,
> for instance:
> 17:05:49.644256 0.00:01:e6:9d:07:16.452 >
> 0.ff:ff:ff:ff:ff:ff.452:ipx-sap-resp 30c '0001E69D071680DDNPI9D0716'
> addr 0.00:01:e6:9d:07:16
> 17:33:04.521449 802.1d config 8000.00:05:5d:1f:00:80.8002 root
> 8000.00:05:5d:1f:00:80 pathcost 0 age 0 max 20 hello 2 fdelay 15
> # printers
> 17:33:07.370377 10.0.0.225.svrloc > HP-DEVICE-DISC.MCAST.NET.svrloc:
> [udp sum ok] udp 151 (ttl 4, id 51568, len 179)
> 17:05:18.409507 10.0.0.237.netbios-dgm > 255.255.255.255.netbios-dgm:
> [udp sum ok] NBT UDP PACKET(138) (ttl 60, id 14452, len 229)
> 17:05:18.757053 10.0.0.218.netbios-dgm > 255.255.255.255.netbios-dgm:
> [udp sum ok] NBT UDP PACKET(138) (ttl 60, id 20727, len 229)
> # another samba server to bcast
> 17:05:29.708120 10.0.0.127.33191 > 10.0.3.255.netbios-ns: [udp sum ok]
> NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST (DF) (ttl 64, id 0, len
I'm not sure what caused this issue, but it seems that lge(4) lacks some
protection against overly fragmented packets.
Did you see "watchdog timeout" messages on the console?
I don't have lge(4) hardware, so it's hard for me to fix it.
It seems that lge(4) needs the following work:
- endian cleanup
- bus_dma(9) conversion
- fragment handling, as the hardware can't handle more than 10 fragments.
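To illustrate the last item, here is a rough user-space sketch in Python,
not lge(4) code. It assumes the usual approach of coalescing a chain that
has too many pieces into one contiguous buffer before handing it to the
hardware. The names MAX_TX_FRAGS and prepare_for_tx are made up for this
example; in a real driver the work would be done on mbuf chains (FreeBSD
has m_defrag(9) for this kind of coalescing).

```python
MAX_TX_FRAGS = 10  # limit mentioned above; the constant name is hypothetical


def prepare_for_tx(fragments, max_frags=MAX_TX_FRAGS):
    """Return a fragment list short enough for the hardware.

    `fragments` is a list of bytes objects standing in for an mbuf chain.
    """
    if len(fragments) <= max_frags:
        # Already within the limit: pass the chain through unchanged.
        return list(fragments)
    # Too many pieces: copy everything into one contiguous buffer.
    return [b"".join(fragments)]


# A 14-piece chain is over the limit and gets collapsed into one buffer.
chain = [bytes([i]) * 8 for i in range(14)]
ready = prepare_for_tx(chain)
print(len(chain), "->", len(ready), "fragment(s)")
```

Copying costs some CPU time, but it only happens for chains that the
hardware could not have transmitted anyway.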