6-CURRENT Network stack issues w/SMP? (Was: Re: TreeList failed: Network write failure: ChannelMux.ProtocolError)
Andre Guibert de Bruet
andy at siliconlandmark.com
Sun Sep 12 09:25:54 PDT 2004
On Sun, 12 Sep 2004, Robert Watson wrote:
> On Sun, 12 Sep 2004, Andre Guibert de Bruet wrote:
>> On Sun, 12 Sep 2004, Kris Kennaway wrote:
>>> On Sun, Sep 12, 2004 at 02:42:03AM -0400, Andre Guibert de Bruet wrote:
>>>>> I've also noticed data corruption in the form of failed CRCs (and hence
>>>>> dropped SSH connections) while transferring large amounts of data via SSH
>>>>> over GigE to a machine on its subnet. These problems started occurring
>>>>> after the Giant-less networking megacommit. Older kernels check out
>>>>> without any such issues.
>>> Does it go away if you turn off debug.mpsafenet? If not, it's
>>> probably not related to that commit.
>> Setting debug.mpsafenet to 0 allows the SSH transfers to complete. The
>> MD5 checksums and sizes match. Where do we go from here?
> I think I'd look at the following next:
> - Does your network interface driver support checksum offload? If so,
> what happens if you disable that?
It appears that it does, based on the options field reported by ifconfig:
nge0: flags=108843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
I can still reproduce the problem after passing -rxcsum and -txcsum while
bringing the interface up.
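For context on what `rxcsum`/`txcsum` hand off to the NIC: with checksum offload enabled, the hardware computes or verifies the 16-bit one's-complement Internet checksum (RFC 1071) used by IPv4, TCP and UDP, instead of the kernel doing it in software. The Python function below is only an illustrative sketch of that algorithm, not FreeBSD's `in_cksum()` or the nge(4) hardware logic; the function name and the sample header are assumptions chosen for the example. It also hints at why this checksum is a weak guard: it is a plain sum of 16-bit words, so reordered words go undetected, which is one way corrupted data can get past TCP and still be caught by an end-to-end integrity check such as SSH's.

```python
def internet_checksum(data: bytes) -> int:
    """Sketch of the RFC 1071 Internet checksum: one's-complement sum of
    16-bit big-endian words, folded to 16 bits, then complemented."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"   # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                 # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example: a 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(hdr)))  # 0xb861

# Swapping two 16-bit words leaves the sum unchanged, so this kind of
# corruption is invisible to the checksum.
print(internet_checksum(b"\x12\x34\x56\x78") ==
      internet_checksum(b"\x56\x78\x12\x34"))  # True
```

Recomputing the checksum over a header that already contains the correct checksum yields 0, which is how a receiver (or an `rxcsum`-capable NIC) validates it.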
> - Is the network interface driver marked as INTR_MPSAFE and/or not
> IFF_NEEDSGIANT. If either, try setting the driver to run with Giant by
> removing INTR_MPSAFE and adding IFF_NEEDSGIANT.
dev/nge/if_nge.c has the interface marked as IFF_NEEDSGIANT, with no
trace of INTR_MPSAFE. My dmesg confirms this: "nge0: [GIANT-LOCKED]"
> After that I think we want to try and produce a non-SSH reproduction
> scenario using a very simple test program...
Attempting to bring a local FreeBSD repo up-to-date causes the issue to
manifest itself. If portupgrade is run and execs a fetch for a large
tarball from a fast mirror (100KB/s+), the problem manifests itself as
I cannot yet make any conclusive determination, but preliminary analysis
suggests that large bursts of traffic on this GigE interface make the
condition easier to reproduce. The machine in question acts as a DNS
resolver for my small home network and appears to handle light traffic
without any issues.
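Robert's suggestion of a very simple, non-SSH reproduction program could look something like the following. This is a hypothetical sketch, not a tool from this thread: the block size, the SHA-256-derived payload, and the `serve`/`send` command-line interface are all assumptions. One host listens and checks a deterministic, index-stamped byte stream; the other sends that stream across the suspect interface. Running it once with debug.mpsafenet=1 and once with debug.mpsafenet=0 would show whether plain TCP delivers corrupted data independently of SSH.

```python
import hashlib
import socket
import struct
import sys

CHUNK = 64 * 1024  # bytes per block (arbitrary choice for this sketch)


def pattern_block(index: int) -> bytes:
    """Deterministic CHUNK-byte block whose contents depend on its index,
    so a corrupted, truncated, or misplaced block fails comparison."""
    header = struct.pack("!Q", index)        # 8-byte big-endian block number
    seed = hashlib.sha256(header).digest()   # 32 bytes derived from the index
    body = (seed * (CHUNK // len(seed) + 1))[: CHUNK - len(header)]
    return header + body


def send_stream(sock: socket.socket, nblocks: int) -> None:
    """Send nblocks pattern blocks, then half-close so the peer sees EOF."""
    for i in range(nblocks):
        sock.sendall(pattern_block(i))
    sock.shutdown(socket.SHUT_WR)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read up to n bytes, returning fewer only if the peer hits EOF."""
    buf = bytearray()
    while len(buf) < n:
        data = sock.recv(n - len(buf))
        if not data:
            break
        buf.extend(data)
    return bytes(buf)


def verify_stream(sock: socket.socket, nblocks: int):
    """Return None if every block matched, else the index of the first bad one."""
    for i in range(nblocks):
        if recv_exact(sock, CHUNK) != pattern_block(i):
            return i
    return None


def main(argv) -> int:
    if len(argv) == 4 and argv[1] == "serve":
        port, nblocks = int(argv[2]), int(argv[3])
        with socket.create_server(("", port)) as srv:
            conn, _peer = srv.accept()
            with conn:
                bad = verify_stream(conn, nblocks)
        if bad is None:
            print(f"OK: {nblocks} blocks verified")
            return 0
        print(f"CORRUPT: first bad block {bad} (byte offset {bad * CHUNK})")
        return 1
    if len(argv) == 5 and argv[1] == "send":
        host, port, nblocks = argv[2], int(argv[3]), int(argv[4])
        with socket.create_connection((host, port)) as sock:
            send_stream(sock, nblocks)
        return 0
    print(f"usage: {argv[0]} serve PORT NBLOCKS | send HOST PORT NBLOCKS",
          file=sys.stderr)
    return 2


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

As a usage example (again hypothetical), saving this as `pattern_check.py`, running `python3 pattern_check.py serve 5001 20000` on the receiver and `python3 pattern_check.py send RECEIVER_HOST 5001 20000` on the sender pushes about 1.3 GB through the link. Tying each block's bytes to its index means a dropped, duplicated, or reordered block is reported just like a flipped bit.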
Thanks for the help,
| Andre Guibert de Bruet | Enterprise Software Consultant
| Silicon Landmark, LLC. | http://siliconlandmark.com/
More information about the freebsd-current