sshd / tcp packet corruption ? ZFS & Samba?

Martin Minkus martin.minkus at punz.co.nz
Tue Jun 29 03:42:22 UTC 2010


Okay guys,

 

Just thought I’d post that a resolution has been found.

 

People suggested it could be hardware and that I try memtest, which never
found anything.

 

It seems, though, that in the end the issue is the motherboard: possibly
the southbridge, or something to do with the PCI bus.

 

The SATA drives, which hang off a Marvell controller in a PCIe slot, were
unaffected. No amount of ZFS scrubs or rsync runs with checksumming found
anything wrong.

 

Only network traffic on the Intel PRO/1000 (PCI card) or the onboard
NVIDIA nfe NIC had issues. It was worst when using Samba off ZFS,
though god knows why that exposed the issue more.

 

I never had any kernel panics, just silent data corruption on the PCI
bus.

 

I moved the HDDs and cards to a different motherboard, and everything is 100%
fine.

 

So that was a couple of weeks of looking at this on and off (and slowly losing
my mind), and it was nothing more than flaky hardware.

 

Thanks for your help to those who took the time to reply.

 

Martin.

 

From: Martin Minkus 
Sent: Monday, 28 June 2010 09:22
To: freebsd-questions at freebsd.org
Subject: RE: sshd / tcp packet corruption ? ZFS & Samba?

 

Hey all,

 

It was suggested I do a memtest, but that checked out fine. (I wish it
was as simple as just the ram!)

 

I’ve realised the issue manifests itself almost immediately when
accessing an underlying ZFS filesystem using Samba. But if it is UFS, it
is fine.

 

Does this mean anything to anyone?

 

I.e., md5’ing the same file over SMB, one copy on UFS (/tmp) and one on ZFS:

 

cd5d0011c28fb335d57a83b3751831e7
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

bb433ae7e4c3c70c49b3c8c1590e8aa5
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

8eeaf672f6742ae4f900b16ec3cb190a
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

bc327dc715516b5ba2e8478036112bd2
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

0cde0cf7ec036cedc8f3294153209b4c
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

71e705470a4af5533eb019e00df3a946
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

ba7041e4cad852d00c8da1a461e3b5f9
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

7ce9ea8b9a4d8858899da23472a24c76
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

8f0eff7cb6069ff39aa46e2affc27a4b
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

c23fceb0302fd59b49e22bce61eabe8d
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

46c9d538c99be3947b92f9ec47bb900a
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

2a2a94c94a167a8e525e368aceb07875
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

d303861d09b0584f6c6621e9881e3f63
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

ad8f8cef1829de206460b947687909f0
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

9a866d9602a9df92b6acb6f1182b05ab
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

5552491a9e295890ad48064440d8d05b
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

ceee04c26b03132db48d67c076526c82
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

7aa666918d73e40a25ccdb1c104f8476
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

561aa772884c0b7ef139f556355adffb
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

30540ecb4bfb8533969f4a4137a77e79
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

c0f315f00be76a4e15dec68de2bba49b
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

9de4864a97ed4ad9c495c221fe1b932f
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

47c8ad183dbe0d4637229af08cc2cd89
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

c9bfe8c7073940acbcdb31430eb4a061
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

327605a6ddb89f7a3e2bd056c5f28b2a
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

6008526a44790297110f4361fe1a5292
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

3f6444cf9b7482df5b6aee577906821c
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

23a3fea1c1c79df4cdc30544f2af1b2d
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

1591ac3f2e730a1a47792241bb708a1c
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

fa7c62b330717a66b5442c7df2bdce3e
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

996cbec57e67a14f69bb288e43eb81b2
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

074fe31d93ed0ccf42867bfe34502c1a
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

4d69eb69423fd8e373978c068003021c
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

82cc83d5af8f0217f8d196882ddf5d90
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

b610773b74ec85511548dfe6d3d12b74
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

a4a694f353175ef774a25a92bc35badc
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

c7a23899df5987bd65a8c7e0cf0dfcd3
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

5a4fdf3f3d74562eec83491236a168a4
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

65fbd57a0ebaa3e94ab78ea3b3ec8497
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

476208d260b18c724e77e43fe79c6960
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

3880b71d78a22422b8299c66f7192cb0
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

f2a1540d9833f1faab312026164d271d
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

cde5e5dcf53e0eb93e6af64b70e7961f
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

d584260380b2800e85dc2c877534378a
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

77d36239bd1728219196461f57d2b859
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

4fbad72dde8d79e6103dac67fad852be
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

91f59e575e6cca8f402e228d8a72ad1a
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

9ddac79b29819dbe88dd7583ee6df4b9
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

1264d7634c329125bf87d9d9ab40a128
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

00e004a4a491377b965c8bc5515a9e6f
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

774ec65d5a04f4482bb99a8c05aebff7
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

ea36f8719932894229911a5a958c778b
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

113d3573f119dedf2a09c27e52957a5d
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

0c35c60ab988e140d0a6ff8e52027576
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

536a6b09753c661f7029a0bb983a6e93
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

53a16a6860544c3dba85ca46f97c1865
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

4c5a453abc1c72da148bd1a5e4addafa
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

61a1920af7e0250eecc91afeb79e3ce9
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

774b7dd4b9cc894dda000a572e7dfed5
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

11745efe136291bed3f3db64e12449b5
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

d4915a8b84fd35650d0aaf537119977a
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

62f386bce4f1193a1aca73283164d6f9
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

4664c7b894ffd928c55cc9089b64bdf3
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

3973b22753d411d3bf736537fc20ae20
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

a4b96ffd965667fdd2d73d44afcfbdcb
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

eb2bb2e51a7439e1ed3d17b1280cf760
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

21fc6cff11e9d8e22595b6af19d69e67
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

b88eab9b32e58c072258a38b037a9a25
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

2730294929de8e517c83041cb7233291
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

cddbe019b780dcf056525c48703d7b1e
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

2652bb1a766d002e633a73d10321edfe
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

52136867ac3d477e9643aa15a2c0e957
*//kinetic/pulse/shares/cti/bin/Desktop.exe

2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe

 

The //kinetic/temp/ share is on UFS (it is /tmp), while
//kinetic/pulse/ is on the /pulse ZFS pool.
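The repeated-checksum comparison above could be scripted roughly as follows. This is only a sketch in Python, assuming the share is reachable as an ordinary path on the client; the names md5_of and repeated_digests are made up for illustration. A healthy read path should produce exactly one distinct digest, whereas the ZFS-backed share above produced a different one on every read.

```python
import hashlib
from collections import Counter


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path, runs=10):
    """Hash the same file `runs` times and count each distinct digest seen."""
    return Counter(md5_of(path) for _ in range(runs))


if __name__ == "__main__":
    import sys

    # Usage: python check_md5.py <path-to-file-on-share> [runs]
    target = sys.argv[1]
    runs = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    for digest, count in repeated_digests(target, runs).items():
        print(count, digest)
```

Client-side caching may hide corruption on repeated reads, so a single digest here is only indicative, not proof that the path is clean.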

 

For the record, locally on kinetic:

 

kinetic:~# md5 /tmp/Desktop.exe

MD5 (/tmp/Desktop.exe) = 2447bdb56c5fa8efa761ffa100908022

kinetic:~# md5 /pulse/shares/cti/bin/Desktop.exe 

MD5 (/pulse/shares/cti/bin/Desktop.exe) = 2447bdb56c5fa8efa761ffa100908022

kinetic:~#

 

So accessing the filesystem locally is okay? Is it only a Samba-off-ZFS
issue?

 

Following that, eventually (after a few days) network traffic in general
starts to be corrupted (hence SSH connections dropping out, the netcat
sessions below, etc.).
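A generic way to check a TCP path for payload corruption, independent of ssh or Samba, is to push random data through an echo peer and compare hashes. The sketch below is an illustration in Python, not the netcat procedure used here; recv_exact, echo_once and round_trip_ok are hypothetical names, and the demo runs over loopback only to show the mechanics.

```python
import hashlib
import os
import socket
import threading


def recv_exact(conn, n):
    """Read exactly n bytes from a connected socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(min(65536, n - len(buf)))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def echo_once(server_sock, n):
    """Accept one connection, read n bytes, and send them straight back."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(recv_exact(conn, n))


def round_trip_ok(host, port, payload):
    """Send payload to an echo peer; True if the echoed bytes hash the same."""
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(payload)
        echoed = recv_exact(sock, len(payload))
    return hashlib.sha256(echoed).digest() == hashlib.sha256(payload).digest()


if __name__ == "__main__":
    payload = os.urandom(1 << 20)  # 1 MiB of random test data
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # loopback demo; port chosen by the OS
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(server, len(payload)))
    worker.start()
    print("round trip intact:", round_trip_ok("127.0.0.1", port, payload))
    worker.join()
    server.close()
```

To exercise a real NIC, echo_once would have to run on the suspect box and round_trip_ok on another machine; loopback traffic never touches the card or the PCI bus.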

 

I’ve tried testing a GENERIC kernel without ZFS, and everything is
fine. It is only once ZFS is loaded into the kernel and we try to access
it using Samba that this happens.

 

smb.conf:

 

#======================= Global Settings =====================================

[global]

 workgroup = PULSE

 server string = Kinetic ZFS Fileserver

 netbios name = KINETIC

 security = user

 load printers = no

 log file = /var/log/samba/log.%m

#log level = 10

 max log size = 50

 encrypt passwords = yes

 

#smb ports = 139

socket options = TCP_NODELAY SO_SNDBUF=65536 SO_RCVBUF=65536

#socket options = TCP_NODELAY SO_SNDBUF=8192 SO_RCVBUF=8192

 read raw = yes

 use sendfile = yes

directory name cache size = 0

 

 preserve case = yes

 short preserve case = yes

 case sensitive = no

 

 guest account = nobody

 

 wins support = yes

#passdb backend = ldapsam:"ldap://gold.pulse.local"

 passdb backend = ldapsam:"ldap://kinetic.pulse.local ldap://gold.pulse.local"

 ldap ssl = no

 ldap admin dn = cn=Manager,dc=pulse,dc=local

 ldap suffix = dc=pulse,dc=local

 ldap group suffix = ou=Groups

 ldap user suffix = ou=Users

 ldap machine suffix = ou=Computers

 

#nt acl support = yes

#acl compatibility = auto

#acl group control = yes

#acl map full control = true

 

#============================ Share Definitions ==============================

 

[Temp]

 comment = Temp Space

 guest ok = yes

 browseable = Yes

 path = /tmp

 

etc...

 

dmesg:

 

Copyright (c) 1992-2010 The FreeBSD Project.

Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

        The Regents of the University of California. All rights reserved.

FreeBSD is a registered trademark of The FreeBSD Foundation.

FreeBSD 8.1-RC1 #4: Thu Jun 24 16:09:27 NZST 2010

    martinm at kinetic.pulse.local:/usr/obj/usr/src/sys/PULSE amd64

Timecounter "i8254" frequency 1193182 Hz quality 0

CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ (2712.36-MHz K8-class CPU)

  Origin = "AuthenticAMD"  Id = 0x60fb2  Family = f  Model = 6b  Stepping = 2

  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>

  Features2=0x2001<SSE3,CX16>

  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>

  AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch>

  TSC: P-state invariant

real memory  = 4294967296 (4096 MB)

avail memory = 4044939264 (3857 MB)

ACPI APIC Table: <GBT    NVDAACPI>

FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs

FreeBSD/SMP: 1 package(s) x 2 core(s)

 cpu0 (BSP): APIC ID:  0

 cpu1 (AP): APIC ID:  1

ioapic0: Changing APIC ID to 2

ioapic0 <Version 1.1> irqs 0-23 on motherboard

kbd1 at kbdmux0

acpi0: <GBT NVDAACPI> on motherboard

acpi0: [ITHREAD]

acpi0: Power Button (fixed)

acpi0: reservation of 0, a0000 (3) failed

acpi0: reservation of 100000, cbdf0000 (3) failed

Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000

acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0

cpu0: <ACPI CPU> on acpi0

cpu1: <ACPI CPU> on acpi0

acpi_hpet0: <High Precision Event Timer> iomem 0xfeff0000-0xfeff03ff on acpi0

Timecounter "HPET" frequency 25000000 Hz quality 900

acpi_button0: <Power Button> on acpi0

pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0

pci0: <ACPI PCI bus> on pcib0

pci0: <memory, RAM> at device 0.0 (no driver attached)

isab0: <PCI-ISA bridge> at device 1.0 on pci0

isa0: <ISA bus> on isab0

pci0: <serial bus, SMBus> at device 1.1 (no driver attached)

pci0: <memory, RAM> at device 1.2 (no driver attached)

ohci0: <nVidia nForce MCP61 USB Controller> mem 0xfe02f000-0xfe02ffff irq 21 at device 2.0 on pci0

ohci0: [ITHREAD]

usbus0: <nVidia nForce MCP61 USB Controller> on ohci0

ehci0: <NVIDIA nForce MCP61 USB 2.0 controller> mem 0xfe02e000-0xfe02e0ff irq 22 at device 2.1 on pci0

ehci0: [ITHREAD]

usbus1: EHCI version 1.0

usbus1: <NVIDIA nForce MCP61 USB 2.0 controller> on ehci0

pcib1: <ACPI PCI-PCI bridge> at device 4.0 on pci0

pci1: <ACPI PCI bus> on pcib1

em0: <Intel(R) PRO/1000 Legacy Network Connection 1.0.1> port 0xcc00-0xcc3f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff irq 17 at device 7.0 on pci1

em0: [FILTER]

em0: Ethernet address: 00:0e:0c:6b:d6:d3

pci0: <multimedia, HDA> at device 5.0 (no driver attached)

atapci0: <nVidia nForce MCP61 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 6.0 on pci0

ata0: <ATA channel 0> on atapci0

ata0: [ITHREAD]

ata1: <ATA channel 1> on atapci0

ata1: [ITHREAD]

nfe0: <NVIDIA nForce MCP61 Networking Adapter> port 0xec00-0xec07 mem 0xfe02d000-0xfe02dfff irq 20 at device 7.0 on pci0

miibus0: <MII bus> on nfe0

rlphy0: <RTL8201L 10/100 media interface> PHY 1 on miibus0

rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

nfe0: Ethernet address: 00:24:1d:15:11:48

nfe0: [FILTER]

nfe0: [FILTER]

nfe0: [FILTER]

nfe0: [FILTER]

nfe0: [FILTER]

nfe0: [FILTER]

nfe0: [FILTER]

nfe0: [FILTER]

atapci1: <nVidia nForce MCP61 SATA300 controller> port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xd800-0xd80f mem 0xfe02c000-0xfe02cfff irq 21 at device 8.0 on pci0

atapci1: [ITHREAD]

ata2: <ATA channel 0> on atapci1

ata2: [ITHREAD]

ata3: <ATA channel 1> on atapci1

ata3: [ITHREAD]

pcib2: <ACPI PCI-PCI bridge> at device 9.0 on pci0

pci2: <ACPI PCI bus> on pcib2

mvs0: <Marvell 88SX7042 SATA controller> port 0xbc00-0xbcff mem 0xfde00000-0xfdefffff irq 16 at device 0.0 on pci2

mvs0: Gen-IIe, 4 3Gbps ports, Port Multiplier supported with FBS

mvs0: [ITHREAD]

mvsch0: <Marvell SATA channel> at channel 0 on mvs0

mvsch0: [ITHREAD]

mvsch1: <Marvell SATA channel> at channel 1 on mvs0

mvsch1: [ITHREAD]

mvsch2: <Marvell SATA channel> at channel 2 on mvs0

mvsch2: [ITHREAD]

mvsch3: <Marvell SATA channel> at channel 3 on mvs0

mvsch3: [ITHREAD]

vgapci0: <VGA-compatible display> mem 0xfb000000-0xfbffffff,0xd0000000-0xdfffffff,0xfc000000-0xfcffffff irq 22 at device 13.0 on pci0

atrtc0: <AT realtime clock> port 0x70-0x73 on acpi0

uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0

uart0: [FILTER]

ppc0: <Parallel port> port 0x378-0x37f irq 7 on acpi0

ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode

ppc0: [ITHREAD]

ppbus0: <Parallel port bus> on ppc0

plip0: <PLIP network interface> on ppbus0

plip0: [ITHREAD]

lpt0: <Printer> on ppbus0

lpt0: [ITHREAD]

lpt0: Interrupt-driven port

ppi0: <Parallel I/O> on ppbus0

orm0: <ISA Option ROMs> at iomem 0xd0000-0xd3fff,0xdb000-0xdbfff on isa0

sc0: <System console> at flags 0x100 on isa0

sc0: VGA <16 virtual consoles, flags=0x300>

vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0

atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0

atkbd0: <AT Keyboard> irq 1 on atkbdc0

kbd0 at atkbd0

atkbd0: [GIANT-LOCKED]

atkbd0: [ITHREAD]

acpi_throttle0: <ACPI CPU Throttling> on cpu0

powernow0: <PowerNow! K8> on cpu0

device_attach: powernow0 attach returned 6

acpi_throttle1: <ACPI CPU Throttling> on cpu1

acpi_throttle1: failed to attach P_CNT

device_attach: acpi_throttle1 attach returned 6

powernow1: <PowerNow! K8> on cpu1

device_attach: powernow1 attach returned 6

Timecounters tick every 1.000 msec

usbus0: 12Mbps Full Speed USB v1.0

usbus1: 480Mbps High Speed USB v2.0

acd0: DVDR <HL-DT-STDVD-RAM GH22NP20/1.02> at ata0-slave UDMA66 

ad4: 76319MB <WDC WD800JD-60LSA0 07.01D07> at ata2-master UDMA100 SATA 3Gb/s

ugen0.1: <nVidia> at usbus0

uhub0: <nVidia OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0

ugen1.1: <nVidia> at usbus1

uhub1: <nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1

uhub0: 10 ports with 10 removable, self powered

ada0 at mvsch0 bus 0 scbus0 target 0 lun 0

ada0: <GB0500C4413 HPG1> ATA-7 SATA 1.x device

ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)

ada0: Command Queueing enabled

ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)

ada1 at mvsch1 bus 0 scbus1 target 0 lun 0

ada1: <GB0500C4413 HPG1> ATA-7 SATA 1.x device

ada1: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)

ada1: Command Queueing enabled

ada1: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)

ada2 at mvsch2 bus 0 scbus2 target 0 lun 0

ada2: <GB0500C4413 HPG3> ATA-7 SATA 1.x device

ada2: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)

ada2: Command Queueing enabled

ada2: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)

ada3 at mvsch3 bus 0 scbus3 target 0 lun 0

ada3: <GB0500C4413 HPG1> ATA-7 SATA 1.x device

ada3: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)

ada3: Command Queueing enabled

ada3: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)

SMP: AP CPU #1 Launched!

Root mount waiting for: usbus1

Root mount waiting for: usbus1

Root mount waiting for: usbus1

uhub1: 10 ports with 10 removable, self powered

Trying to mount root from ufs:/dev/ad4s1a

ugen0.2: <CHICONY> at usbus0

ukbd0: <CHICONY Compaq USB Keyboard, class 0/0, rev 1.10/1.05, addr 2> on usbus0

kbd2 at ukbd0

uhid0: <CHICONY Compaq USB Keyboard, class 0/0, rev 1.10/1.05, addr 2> on usbus0

em0: link state changed to UP

ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;

            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.

ZFS filesystem version 3

ZFS storage pool version 14

kinetic:~#

 

 

I’ve since removed everything from /etc/sysctl.conf and /boot/loader.conf,
so no tuning is used. I’ve also been fiddling and trying all sorts of
different things in smb.conf.

 

It makes no difference.

 

I am at a complete loss as to what is going on here.

 

Should I just give up? Is there some obscure ZFS+Samba issue on FreeBSD?

 

Thanks,

Martin.

 

 

From: Martin Minkus 
Sent: Wednesday, 23 June 2010 16:01
To: freebsd-questions at freebsd.org
Subject: sshd / tcp packet corruption ?

 

It seems this issue I reported below may actually be related to some
kind of TCP packet corruption?

 

Still the same box. I’ve noticed my SSH connections into the box will die
randomly, with errors.

 

sshd logs the following on the box itself:

 

Jun 18 11:15:32 kinetic sshd[1406]: Received disconnect from 10.64.10.251: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.

Jun 18 11:15:41 kinetic sshd[15746]: Accepted publickey for martinm from 10.64.10.251 port 56469 ssh2

Jun 18 11:15:58 kinetic su: nss_ldap: could not get LDAP result - Can't contact LDAP server

Jun 18 11:15:58 kinetic su: martinm to root on /dev/pts/0

Jun 18 11:16:06 kinetic su: martinm to root on /dev/pts/1

Jun 18 11:16:29 kinetic sshd[15748]: Received disconnect from 10.64.10.251: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.

Jun 18 11:16:30 kinetic sshd[15746]: syslogin_perform_logout: logout() returned an error

Jun 18 11:16:34 kinetic sshd[16511]: Accepted publickey for martinm from 10.64.10.251 port 56470 ssh2

Jun 18 11:16:41 kinetic sshd[16513]: Received disconnect from 10.64.10.251: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.

Jun 18 11:16:41 kinetic sshd[16511]: syslogin_perform_logout: logout() returned an error

 

Jun 23 15:52:59 kinetic sshd[56974]: Received disconnect from 10.64.10.209: 5: Message Authentication Code did not verify (packet #75658). Data integrity has been compromised.

Jun 23 15:53:12 kinetic sshd[57109]: Accepted publickey for martinm from 10.64.10.209 port 9494 ssh2

Jun 23 15:53:38 kinetic su: martinm to root on /dev/pts/3

Jun 23 15:56:36 kinetic sshd[57111]: Received disconnect from 10.64.10.209: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.

Jun 23 15:56:44 kinetic sshd[57151]: Accepted publickey for martinm from 10.64.10.209 port 9534 ssh2

 

My googlefu has failed me on this.

 

Any ideas what on earth this could be?

 

Ethernet card?

 

em0: <Intel(R) PRO/1000 Legacy Network Connection 1.0.1> port 0xcc00-0xcc3f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff irq 17 at device 7.0 on pci1

em0: [FILTER]

em0: Ethernet address: 00:0e:0c:6b:d6:d3

 

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>

        ether 00:0e:0c:6b:d6:d3

        inet 10.64.10.10 netmask 0xffffff00 broadcast 10.64.10.255

        media: Ethernet autoselect (1000baseT <full-duplex>)

        status: active

 

Thanks,

Martin.

 

 

From: Martin Minkus 
Sent: Monday, 14 June 2010 11:21
To: freebsd-questions at freebsd.org
Subject: FreeBSD+ZFS+Samba: open_socket_in: Protocol not supported -
after a few days?

 

Samba 3.4 on FreeBSD 8-STABLE branch.

After a few days I start getting weird errors: Windows PCs can't
access the Samba share or have trouble accessing files, and Samba
becomes totally unusable.

Restarting samba doesn't fix it – only a reboot does.

 

Accessing files on the ZFS pool locally is fine. Other services (like
dhcpd and the OpenLDAP server) on the box continue to work fine. Only Samba
dies, and by dies I mean it can no longer service clients and Windows
brings up bizarre errors. Windows can access our other Samba servers (on
Linux, etc.) just fine.

Kernel:

 

FreeBSD kinetic.pulse.local 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #4: Wed May 26 18:09:14 NZST 2010 martinm at kinetic.pulse.local:/usr/obj/usr/src/sys/PULSE amd64

 

Zpool status:

 

kinetic:~$ zpool status

  pool: pulse

 state: ONLINE

 scrub: none requested

config:

 

        NAME                                          STATE     READ WRITE CKSUM
        pulse                                         ONLINE       0     0     0
          raidz1                                      ONLINE       0     0     0
            gptid/3baa4ef3-3ef8-0ac0-f110-f61ea23352  ONLINE       0     0     0
            gptid/0eaa8131-828e-6449-b9ba-89ac63729d  ONLINE       0     0     0
            gptid/77a8da7c-8e3c-184c-9893-e0b12b2c60  ONLINE       0     0     0
            gptid/dddb2b48-a498-c1cd-82f2-a2d2feea01  ONLINE       0     0     0

 

errors: No known data errors

kinetic:~$


log.smb:

[2010/06/10 17:22:39, 0] lib/util_sock.c:902(open_socket_in)
open_socket_in(): socket() call failed: Protocol not supported
[2010/06/10 17:22:39, 0] smbd/server.c:457(smbd_open_one_socket)
smbd_open_once_socket: open_socket_in: Protocol not supported
[2010/06/10 17:22:39, 2] smbd/server.c:676(smbd_parent_loop)
waiting for connections

log.ANYPC:

[2010/06/08 19:55:55, 0] lib/util_sock.c:1491(get_peer_addr_internal)
getpeername failed. Error was Socket is not connected
read_fd_with_timeout: client 0.0.0.0 read error = Socket is not connected.


The code in lib/util_sock.c, around line 902:

/****************************************************************************
 Open a socket of the specified type, port, and address for incoming data.
****************************************************************************/

int open_socket_in(int type,
                   uint16_t port,
                   int dlevel,
                   const struct sockaddr_storage *psock,
                   bool rebind)
{
    struct sockaddr_storage sock;
    int res;
    socklen_t slen = sizeof(struct sockaddr_in);

    sock = *psock;

#if defined(HAVE_IPV6)
    if (sock.ss_family == AF_INET6) {
        ((struct sockaddr_in6 *)&sock)->sin6_port = htons(port);
        slen = sizeof(struct sockaddr_in6);
    }
#endif
    if (sock.ss_family == AF_INET) {
        ((struct sockaddr_in *)&sock)->sin_port = htons(port);
    }

    res = socket(sock.ss_family, type, 0 );
    if( res == -1 ) {
        if( DEBUGLVL(0) ) {
            dbgtext( "open_socket_in(): socket() call failed: " );
            dbgtext( "%s\n", strerror( errno ) );
        }

In other words, it looks like something in the kernel is exhausted
(what?). I don’t know if tuning is required, or this is some kind of
bug?
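One way to check whether the failing socket() call is specific to smbd would be to make the same call from a small standalone program while the box is in the broken state. The following is a hedged sketch in Python rather than anything from Samba; probe_socket is a hypothetical helper. "Protocol not supported" is the message for errno EPROTONOSUPPORT, which the helper would report by name.

```python
import errno
import socket


def probe_socket(family=socket.AF_INET, kind=socket.SOCK_STREAM):
    """Try to create (and immediately close) a socket.

    Returns None on success, or the symbolic errno name on failure,
    e.g. 'EPROTONOSUPPORT' for "Protocol not supported".
    """
    try:
        sock = socket.socket(family, kind, 0)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    sock.close()
    return None


if __name__ == "__main__":
    # Probe the two address families that open_socket_in() can be asked for.
    for name in ("AF_INET", "AF_INET6"):
        result = probe_socket(getattr(socket, name))
        print(name, "ok" if result is None else "failed: " + result)
```

If this also fails once Samba has stopped working, the problem sits below Samba; if it succeeds, the address family smbd is passing in becomes the more interesting suspect.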

/boot/loader.conf:

mvs_load="YES"
zfs_load="YES"
vm.kmem_size="20G"

#vfs.zfs.arc_min="512M"
#vfs.zfs.arc_max="1536M"

vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="3072M"

I’ve played with a few sysctl settings (I found these recommendations
online, but they make no difference):


/etc/sysctl.conf:

kern.ipc.maxsockbuf=2097152

net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
net.inet.tcp.mssdflt=1452

net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=65535

net.local.stream.recvspace=65535
net.local.stream.sendspace=65535

Any ideas on what could possibly be going wrong?

 

Any help would be greatly appreciated!

 

Thanks,

Martin



