broken ip checksum after frag reassemble of nfs READDIR?

Sun Apr 2 19:33:50 UTC 2006

On Sun, Apr 02, 2006 at 11:56:09AM -0400, Adam McDougall wrote:

  On Sun, Apr 02, 2006 at 05:34:00PM +0200, Max Laier wrote:

    On Sunday 02 April 2006 07:45, Adam McDougall wrote:
    > I have been using 'ls' on a directory to test my ruleset and effects
    > of scrubbing rules.  My latest discovery is if I use 'scrub .... fragment
    > reassemble',  the packet on the outgoing interface will have a wildly
    > incorrect IP checksum (ethereal says 0x7b49 should be 0x688d for example).

    Is this ethereal on the sending or on the receiving side?  Note that with 
    hardware checksums (as em(4) usually does) you will see corrupted checksums 
    in ethereal as it is computed by the hardware later on.  Please verify that 
    you are seeing corruption on the receiving side or turn off the hardware 
    checksum calculation (ifconfig em0 -txcsum)

  I have been using tcpdump -w to write to a file then I scp it to another
  machine to inspect using ethereal.  em0 shows the incoming fragments with
  correct checksums for each.  em1 shows the outgoing reassembled packet with
  a bad checksum.  I tried -txcsum and -rxcsum on em1 last night, but it
  made no difference.  Off seems to be the default now according to
  ifconfig; the only change I saw was when I ran ifconfig em1 txcsum rxcsum.
  I will post a copy of that soon.  I am using 6-STABLE from Mar 23 2006.

    > I am using pf over a bridge with two 'em' interfaces, and encountered
    > other code paths in the recent past in pf_norm.c that did not recalculate
    > the checksum for changes it made, but in essence I think this time pf is
    > generating this packet as a reassembly of 5 fragments (total size 6296)
    > and doesn't seem to be applying a correct ip header checksum.  The
    > header checksum is not even similar to the checksum of the last fragment
    > when entering the firewall (0xbfa4).  Right now, I increased the outgoing
    > em1 interface to mtu 8000 just so the outgoing nic will not get wedged in
    > OACTIVE with 100% reproducability (more on that later).

    Can you give us a more detailed overview of your scenario and testcase?  I am 
    not quite sure what you are trying to do and how it fails.  Also, which type 
    of bridge are you using?

  I will follow up with this.  I am using if_bridge.  

    > Can someone take a look and help me out, or let me know how I can help?
    > Thanks.

NFS Server:
------------
NetApp v.7.0.3  10.0.37.112

Firewall:
---------
FreeBSD 6.1 Mar 23 if_bridge firewall with pf, pf_norm.c patched manually to
rev 1.11.2.4, ext_if=em0 (10.0.44.100), int_if=em1.  The em1 MTU is temporarily
8000 to avoid wedging it with a large reassembled packet from pf (6296 bytes).

bridge0: flags=8043<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether ac:de:48:10:28:7b
        priority 32768 hellotime 2 fwddelay 15 maxage 20
        member: em1 flags=3<LEARNING,DISCOVER>
        member: em0 flags=3<LEARNING,DISCOVER>
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet 10.0.44.100 netmask 0xffffff00 broadcast 10.0.44.255
        ether 00:40:d0:43:dd:d4
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 8000
        options=8<VLAN_MTU>
        ether 00:40:d0:43:dd:d5
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active

# ifconfig em1 -rxcsum -txcsum
# ifconfig em1
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 8000
        options=8<VLAN_MTU>
        inet 10.0.1.80 netmask 0xffffff00 broadcast 10.0.1.255
        ether 00:40:d0:43:dd:d5
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active

# pfctl -sr
No ALTQ support in kernel
ALTQ related functions disabled
scrub in on em0 all fragment reassemble
scrub out on em0 all fragment reassemble
pass quick on lo0 all
pass quick on em1 all
block drop log-all on em0 all
pass in quick on em0 inet proto tcp from any to 10.0.44.100 port = ssh flags S/SA keep state (if-bound)
pass in quick on em0 inet proto tcp from any to 10.0.44.18 port = ssh flags S/SA keep state (if-bound)
pass in quick on em0 inet proto tcp from any to 10.0.44.18 port = 3128 flags S/SA keep state (if-bound)
pass in quick on em0 inet proto udp from 10.0.37.112 to any port = sunrpc keep state (if-bound)
pass in quick on em0 inet proto udp from 10.0.37.112 to any port = nfsd keep state (if-bound)
pass in quick on em0 inet proto udp from 10.0.37.112 port = sunrpc to any
pass in quick on em0 inet proto udp from 10.0.37.112 port = nfsd to any
pass in quick on em0 inet proto udp from 10.0.37.112 port = 4046 to any
pass out on em0 inet proto icmp all icmp-type echoreq keep state (if-bound)
pass in on em0 inet proto icmp all icmp-type echoreq keep state (if-bound)
pass out on em0 proto tcp all keep state (if-bound)
pass out on em0 proto udp all keep state (if-bound)

NFS Client System:
------------------
FreeBSD 6.1 Feb 16 nfs client system with an em interface (10.0.44.18), attached to the em1 interface of the firewall

Scenario:

I am mounting 10.0.37.112 from the client:
mount -o intr 10.0.37.112:/vol/scratch /mnt
This succeeds.  I then do an ls on the mount which has about 150 files:
ls /mnt
nfs server 10.0.37.112:/vol/scratch: not responding
^C

The command hangs because it never receives a valid reply: the nfs READDIR reply
arrives with a bad IP checksum.  This is tcpdump on the client system (I made sure
to run ifconfig em0 -txcsum -rxcsum there, and the client mtu is also 8000 now for testing):
15:10:16.437881 IP (tos 0x0, ttl  64, id 33816, offset 0, flags [none], proto: UDP (17), length: 152) 10.0.44.18.1978945475 > 10.0.37.112.nfs: 124 readdir [|nfs]
15:10:16.438360 IP (tos 0x0, ttl  63, id 10076, offset 0, flags [none], proto: UDP (17), length: 6328, bad cksum b721 (->a445)!) 10.0.37.112.nfs > 10.0.44.18.1978945475: reply ok 6300 readdir POST: DIR 1777 ids 0/0 [|nfs]

I verified that the checksum is the same leaving em1 on the firewall as it is when
received by the nfs client.  Checksums on incoming fragments on em0 on the firewall
are correct, but not comparable to the output on em1 since pf makes a new packet
out of the reassembled frags.  
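
For anyone who wants to check a captured header independently of ethereal or
tcpdump, here is a minimal Python sketch of the standard one's-complement IPv4
header checksum (the RFC 1071 method).  The function names and the sample
header are illustrative assumptions; the sample is not taken from my capture,
and the sketch assumes you have already extracted the raw IP header bytes.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Return the checksum an IPv4 header should carry (hypothetical helper)."""
    if len(header) % 2:
        raise ValueError("IPv4 header length must be an even number of bytes")
    # The checksum field (bytes 10-11) is treated as zero while summing.
    data = header[:10] + b"\x00\x00" + header[12:]
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    # Fold any carries back into the low 16 bits (one's-complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def stored_checksum(header: bytes) -> int:
    """Return the checksum value actually present in the header."""
    return struct.unpack("!H", header[10:12])[0]


# Illustrative 20-byte header without options (NOT from the capture above).
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(hex(stored_checksum(sample)), hex(ipv4_header_checksum(sample)))
```

If the two printed values differ for a header pulled from the em1 capture, the
header left the firewall with a stale or never-updated checksum.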

If I use a scrub rule with 'fragment crop' or 'fragment drop-ovl', then pf rejects the fragments
on em0.  My theory, which I believe is correct, is that without reassembly into a single packet the
later fragments carry no port numbers that would allow the sunrpc, nfsd, 4046 (mountd) rules to match.
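
To illustrate why only the first fragment can match a port-based rule, here is
a rough Python sketch of how a 6328-byte UDP datagram would be split.  It
assumes the server fragments for a 1500-byte MTU with a 20-byte IP header and
no options; the real fragment sizes from the NetApp are not shown here, and
fragment_layout is a hypothetical helper name.

```python
def fragment_layout(total_length, mtu=1500, header_len=20):
    """Return (offset_bytes, payload_len, more_fragments) per fragment.

    Assumes a fixed IP header length and that every fragment except the
    last is filled as far as the MTU allows.
    """
    payload = total_length - header_len
    # Fragment data (except in the last fragment) must be a multiple of 8 bytes.
    per_fragment = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        size = min(per_fragment, payload - offset)
        fragments.append((offset, size, offset + size < payload))
        offset += size
    return fragments


for offset, size, more in fragment_layout(6328):
    # The UDP header (and so the port numbers) sits at offset 0 only.
    ports = "has UDP ports" if offset == 0 else "no UDP header"
    print("offset %4d  len %4d  MF=%d  %s" % (offset, size, more, ports))
```

Under those assumptions this gives five fragments, and only the one at offset 0
contains the UDP header, so a rule that matches on port cannot be evaluated
against the other four unless they are reassembled first.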

If I replace the nfs server rules with:
pass  in  quick on $ext_if proto udp from 10.0.37.112 to any keep state
pass  in  quick on $ext_if proto udp from any port { 111, 2049 } to any keep state
And replace the scrub 'fragment reassemble' with 'fragment drop-ovl', then the fragments
are passed successfully without reassembly to the nfs client, but I do not want the
rule to be that open.  While I trust 10.0.37.112, I have other servers that I do not
trust but must allow through my firewall in the other direction later, so I want a
tighter rule.  It is my impression that I can have pf reassemble the NFS fragments,
match a rule with port numbers, and then the packet should leave my firewall in a
useful form.  It doesn't seem to make a difference if I use no-df random-id or not.

Please let me know what other details you need or tests you would like me to run.
I hope I have not left out any pertinent details; I may have glossed over some
because I am so familiar with this scenario.