bge data corruption bug (was: 1168 octets payload and bad TCP checksums)

Kelly Yancey kelly at nttmcl.com
Tue Jan 13 13:36:56 PST 2004


On Fri, 2 Jan 2004, Kelly Yancey wrote:

>
>   We've got Broadcom BCM5701 cards configured for vlan tagging on a
> FreeBSD 4.7 based router; a vlan switch then terminates the trunked
> segment and splits it into separate physical subnets.  It turns out that
> hosts on those segments cannot receive TCP packets with precisely 1168
> octets of payload (ethernet frame size 1222 octets) as the checksum is
> always incorrect.  We've manually backported all of the bge driver updates
> from 4-stable, but to no avail.
>   What is particularly odd is that the checksums are always wrong by the
> same amount: 0xAC48 (the dump below only shows retries of the same
> packet, but the difference is the same even for other packets).
> Furthermore, it appears only TCP packets with 1168 octets of data are
> affected.  I cannot easily create an environment without the vlans to
> determine whether or not tagging is related.  Note also, that the IP
> checksum is correct.
>

  First, one slight clarification to my original posting: the received
frame, after vlan untagging, is 1222 octets; the sent frame includes a tag
so it is 1226 octets.
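
  As a sanity check on those sizes, here is a minimal Python sketch of the
arithmetic.  It assumes a 14-octet ethernet header, a 4-octet 802.1Q tag,
and IP and TCP headers of 20 octets each (that is, no IP or TCP options,
which the post does not state), and it does not count the trailing FCS.
The function name is hypothetical.

```python
# Assumed header sizes; the original post does not list them.
ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType
VLAN_TAG = 4          # 802.1Q tag inserted on the trunked segment
IP_HEADER = 20        # assumes no IP options
TCP_HEADER = 20       # assumes no TCP options


def frame_size(tcp_payload, tagged):
    """Return the ethernet frame size in octets, excluding the FCS."""
    size = ETHERNET_HEADER + IP_HEADER + TCP_HEADER + tcp_payload
    if tagged:
        size += VLAN_TAG
    return size


print(frame_size(1168, tagged=False))  # 1222, as received after untagging
print(frame_size(1168, tagged=True))   # 1226, as sent on the trunk
```

  Under those assumptions the numbers agree with the 1222 and 1226 octet
frame sizes quoted above.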

  Anyway, it appears that the cause of the bad checksums is that the last
dword of the transmitted frame is getting corrupted in hardware.  I've
attached tcpdumps taken from both the sender and the client receiving the
bogus frames.  Viewed with:
  tcpdump -n -vvv -s 2000 -Xx -r dump.server and
  tcpdump -n -vvv -s 2000 -Xx -r dump.client respectively, you can see
that the last 4 octets differ from what the server sent, hence the
checksum is invalid.  I've tested with hardware checksumming disabled on
transmit with no difference.
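
  To illustrate why a corrupted trailing dword invalidates the checksum,
the following Python sketch computes the 16-bit one's-complement Internet
checksum (as described in RFC 1071) over an arbitrary 1168-octet buffer,
then overwrites the last 4 octets.  This is a simplification: a real TCP
checksum also covers the pseudo-header and the TCP header, and the buffer
contents and the replacement octets here are made up, not taken from the
attached dumps.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero octet
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Arbitrary 1168-octet payload (illustrative data only).
payload = bytes(range(256)) * 4 + bytes(144)
sent_checksum = internet_checksum(payload)

# Simulate corruption of the last dword in transit (made-up octets).
corrupted = payload[:-4] + b"\xde\xad\xbe\xef"


def verifies(data: bytes, checksum: int) -> bool:
    """Receiver-side check: data plus checksum must sum to all ones."""
    return internet_checksum(data + struct.pack("!H", checksum)) == 0


print(verifies(payload, sent_checksum))    # True
print(verifies(corrupted, sent_checksum))  # False
```

  Because the checksum is a plain sum of 16-bit words, any change to the
last 4 octets that does not cancel out shifts the result, which matches
the receiver rejecting every retransmission of the same segment.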

  As I mentioned in my original e-mail, while we are using 4.7p23, we are
using the bge driver from 4.9:

[13:15:24] kbyanc:/usr/src/sys/dev/bge $ ident *
if_bge.c:
     $FreeBSD: src/sys/dev/bge/if_bge.c,v 1.3.2.28 2003/09/26 16:02:04 ps Exp $
     $FreeBSD: src/sys/dev/bge/if_bge.c,v 1.3.2.28 2003/09/26 16:02:04 ps Exp $

if_bgereg.h:
     $FreeBSD: src/sys/dev/bge/if_bgereg.h,v 1.1.2.12 2003/08/17 19:15:10 ps Exp $

[13:26:29] kbyanc:/usr/src/sys/dev/mii $ ident brgphy*
brgphy.c:
     $FreeBSD: src/sys/dev/mii/brgphy.c,v 1.1.2.8 2003/07/22 02:12:55 ps Exp $
     $FreeBSD: src/sys/dev/mii/brgphy.c,v 1.1.2.8 2003/07/22 02:12:55 ps Exp $

brgphyreg.h:
     $FreeBSD: src/sys/dev/mii/brgphyreg.h,v 1.1.2.3 2003/06/17 16:07:31 ps Exp $

  So far, we have only been able to reproduce the problem with TCP packets
with 1168 octets of payload, using vlan tagging on the bge interface.  To
reproduce the problem, configure a vlan with a NIC with the Broadcom 5701
chipset as the parent interface, and connect the NIC to a vlan switch that
removes the tag.  One site that we have found which pretty consistently
serves content of the error-generating size is http://www.vanguard.com/;
visit it and click "log on to your accounts".  The page should never load
because the HTTP reply (and all retransmissions, of course) has a bad
checksum.

  If anyone has any theory as to the cause, I would appreciate hearing it
as it would save me a lot of debugging hours.  Thanks,

  Kelly
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump.client
Type: application/octet-stream
Size: 33450 bytes
Desc: 
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20040113/49ed1af6/dump.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump.server
Type: application/octet-stream
Size: 34800 bytes
Desc: 
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20040113/49ed1af6/dump-0001.obj

