sshd / tcp packet corruption ?

Wed Jun 23 04:01:19 UTC 2010

It seems the issue I reported earlier (quoted below) may actually be
related to some kind of TCP packet corruption.

Still the same box. I’ve noticed that my SSH connections into the box
die randomly, with errors.

Sshd logs the following on the box itself:

Jun 18 11:15:32 kinetic sshd[1406]: Received disconnect from 10.64.10.251: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.
Jun 18 11:15:41 kinetic sshd[15746]: Accepted publickey for martinm from 10.64.10.251 port 56469 ssh2
Jun 18 11:15:58 kinetic su: nss_ldap: could not get LDAP result - Can't contact LDAP server
Jun 18 11:15:58 kinetic su: martinm to root on /dev/pts/0
Jun 18 11:16:06 kinetic su: martinm to root on /dev/pts/1
Jun 18 11:16:29 kinetic sshd[15748]: Received disconnect from 10.64.10.251: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:30 kinetic sshd[15746]: syslogin_perform_logout: logout() returned an error
Jun 18 11:16:34 kinetic sshd[16511]: Accepted publickey for martinm from 10.64.10.251 port 56470 ssh2
Jun 18 11:16:41 kinetic sshd[16513]: Received disconnect from 10.64.10.251: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:41 kinetic sshd[16511]: syslogin_perform_logout: logout() returned an error

Jun 23 15:52:59 kinetic sshd[56974]: Received disconnect from 10.64.10.209: 5: Message Authentication Code did not verify (packet #75658). Data integrity has been compromised.
Jun 23 15:53:12 kinetic sshd[57109]: Accepted publickey for martinm from 10.64.10.209 port 9494 ssh2
Jun 23 15:53:38 kinetic su: martinm to root on /dev/pts/3
Jun 23 15:56:36 kinetic sshd[57111]: Received disconnect from 10.64.10.209: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.
Jun 23 15:56:44 kinetic sshd[57151]: Accepted publickey for martinm from 10.64.10.209 port 9534 ssh2

My Google-fu has failed me on this.

Any ideas what on earth this could be?

Could it be the Ethernet card?

em0: <Intel(R) PRO/1000 Legacy Network Connection 1.0.1> port 0xcc00-0xcc3f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff irq 17 at device 7.0 on pci1
em0: [FILTER]
em0: Ethernet address: 00:0e:0c:6b:d6:d3

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:0e:0c:6b:d6:d3
        inet 10.64.10.10 netmask 0xffffff00 broadcast 10.64.10.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
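
For what it's worth, here is a minimal sketch of why I think corrupted data could, in principle, get past TCP and only be caught by the SSH MAC. It is my own illustration of the RFC 1071 Internet checksum that TCP uses, not code from sshd or the em driver. The function name inet_checksum is made up for this example, and the TCP pseudo-header is left out. The checksum is a plain ones' complement sum of 16-bit words, so some kinds of damage (for example two 16-bit words swapped) leave it unchanged. Whether anything like that is actually happening on this box, I don't know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over a byte buffer.
 * Illustrative only: real TCP also sums a pseudo-header, and this
 * sketch assumes buffers small enough that the 32-bit sum cannot wrap.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL);

    /* Add the buffer as big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is padded with a zero byte on the right. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Feeding it the bytes 12 34 56 78 and then the same words swapped (56 78 12 34) gives the same checksum, while a single changed byte gives a different one. So a checksum that verifies does not prove the payload arrived intact, which would fit with sshd being the first thing to complain.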

Thanks,

Martin.

From: Martin Minkus 
Sent: Monday, 14 June 2010 11:21
To: freebsd-questions at freebsd.org
Subject: FreeBSD+ZFS+Samba: open_socket_in: Protocol not supported - after a few days?

Samba 3.4 on FreeBSD 8-STABLE branch.

After a few days I start getting weird errors: Windows PCs can't access
the Samba share or have trouble accessing files, and Samba becomes
totally unusable.

Restarting Samba doesn't fix it; only a reboot does.

Accessing files on the ZFS pool locally is fine. Other services on the
box (dhcpd, the OpenLDAP server) continue to work fine. Only Samba dies,
and by dies I mean it can no longer service clients and Windows brings
up bizarre errors. Windows can access our other Samba servers (on
Linux, etc.) just fine.

Kernel:

FreeBSD kinetic.pulse.local 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #4:
Wed May 26 18:09:14 NZST 2010
martinm at kinetic.pulse.local:/usr/obj/usr/src/sys/PULSE amd64

Zpool status:

kinetic:~$ zpool status
  pool: pulse
 state: ONLINE
 scrub: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        pulse                                         ONLINE       0     0     0
          raidz1                                      ONLINE       0     0     0
            gptid/3baa4ef3-3ef8-0ac0-f110-f61ea23352  ONLINE       0     0     0
            gptid/0eaa8131-828e-6449-b9ba-89ac63729d  ONLINE       0     0     0
            gptid/77a8da7c-8e3c-184c-9893-e0b12b2c60  ONLINE       0     0     0
            gptid/dddb2b48-a498-c1cd-82f2-a2d2feea01  ONLINE       0     0     0

errors: No known data errors
kinetic:~$

log.smb:

[2010/06/10 17:22:39, 0] lib/util_sock.c:902(open_socket_in)
open_socket_in(): socket() call failed: Protocol not supported
[2010/06/10 17:22:39, 0] smbd/server.c:457(smbd_open_one_socket)
smbd_open_once_socket: open_socket_in: Protocol not supported
[2010/06/10 17:22:39, 2] smbd/server.c:676(smbd_parent_loop)
waiting for connections

log.ANYPC:

[2010/06/08 19:55:55, 0] lib/util_sock.c:1491(get_peer_addr_internal)
getpeername failed. Error was Socket is not connected
read_fd_with_timeout: client 0.0.0.0 read error = Socket is not connected.

The code in lib/util_sock.c, around line 902:

/****************************************************************************
 Open a socket of the specified type, port, and address for incoming data.
****************************************************************************/

int open_socket_in(int type,
                   uint16_t port,
                   int dlevel,
                   const struct sockaddr_storage *psock,
                   bool rebind)
{
    struct sockaddr_storage sock;
    int res;
    socklen_t slen = sizeof(struct sockaddr_in);

    sock = *psock;

#if defined(HAVE_IPV6)
    if (sock.ss_family == AF_INET6) {
        ((struct sockaddr_in6 *)&sock)->sin6_port = htons(port);
        slen = sizeof(struct sockaddr_in6);
    }
#endif
    if (sock.ss_family == AF_INET) {
        ((struct sockaddr_in *)&sock)->sin_port = htons(port);
    }

    res = socket(sock.ss_family, type, 0 );
    if( res == -1 ) {
        if( DEBUGLVL(0) ) {
            dbgtext( "open_socket_in(): socket() call failed: " );
            dbgtext( "%s\n", strerror( errno ) );
        }

In other words, the socket() call itself is failing, so it looks like
something in the kernel is exhausted (but what?). I don’t know whether
tuning is required or whether this is some kind of bug.
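
To separate Samba from the kernel, the same socket() call can be tried from a tiny standalone helper once the box is in the broken state. This is only a sketch: probe_socket is a name I made up, it is not part of Samba, and the assert restricting it to stream and datagram sockets is just to keep the example small.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try to create a socket the same way open_socket_in() does
 * (protocol argument 0). Returns 0 on success, otherwise the errno
 * value that socket() set.
 */
int probe_socket(int family, int type)
{
    int fd;

    /* This sketch only probes stream and datagram sockets. */
    assert(type == SOCK_STREAM || type == SOCK_DGRAM);

    fd = socket(family, type, 0);
    if (fd == -1)
        return errno;

    close(fd);
    return 0;
}
```

Calling probe_socket(AF_INET, SOCK_STREAM) and probe_socket(AF_INET6, SOCK_STREAM) and passing any non-zero result to strerror() should show whether a process outside smbd gets the same error. EPROTONOSUPPORT is the errno behind the "Protocol not supported" text in log.smb.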

/boot/loader.conf:

mvs_load="YES"
zfs_load="YES"
vm.kmem_size="20G"

#vfs.zfs.arc_min="512M"
#vfs.zfs.arc_max="1536M"

vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="3072M"

I’ve played with a few sysctl settings (recommendations I found online),
but they make no difference.

/etc/sysctl.conf:

kern.ipc.maxsockbuf=2097152

net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
net.inet.tcp.mssdflt=1452

net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=65535

net.local.stream.recvspace=65535
net.local.stream.sendspace=65535
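
One way to check whether the TCP buffer settings above are actually being picked up is to ask a freshly created socket for its buffer sizes. This is a rough sketch, and it rests on an assumption I have not verified on this box: that a new TCP socket on FreeBSD starts out with the sizes from net.inet.tcp.sendspace and net.inet.tcp.recvspace. The helper name default_tcp_bufsize is made up.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Return the buffer size of a brand-new TCP socket, or -1 on error.
 * 'which' must be SO_SNDBUF or SO_RCVBUF.
 */
int default_tcp_bufsize(int which)
{
    int fd;
    int size = 0;
    socklen_t len = sizeof(size);

    assert(which == SO_SNDBUF || which == SO_RCVBUF);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    /* Ask the kernel what it gave this socket by default. */
    if (getsockopt(fd, SOL_SOCKET, which, &size, &len) == -1)
        size = -1;

    close(fd);
    return size;
}
```

If the assumption holds, default_tcp_bufsize(SO_SNDBUF) and default_tcp_bufsize(SO_RCVBUF) would report something close to 262144 with the sysctl.conf above. A return of -1 would mean even this plain socket() or getsockopt() call failed.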

Any ideas on what could possibly be going wrong?

Any help would be greatly appreciated!

Thanks,

Martin