nfs-server silent data corruption

pluknet pluknet at gmail.com
Tue Apr 22 20:14:08 UTC 2008


On 22/04/2008, Mike Tancsa <mike at sentex.net> wrote:
> At 02:00 PM 4/22/2008, Arno J. Klaassen wrote:
>
> > >
> > > Are you using the latest RELENG_7, or at least the latest version of
> > > nfe thats in RELENG_7 ?
> >
> >
> > Think so :
> >
>
>  OK, and it is the latest RELENG_7 ? Or just the if_nfe.c file has been
> manually updated ? Also, you are using ULE or the 4BSD scheduler ?  I still
> have 4BSD on the box I am testing on.

Hi, I see the same kind of data corruption (with nfe on the NFS server side),
particularly when transferring large files.
It may be related to this thread.

My simple test case:
truncate -s 1000m bigfile
^^ this gives me a zero-filled file
cp bigfile /nfs/mounted
^^ after uploading to the NFS server, the file there is no longer zero-filled
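
The check after the copy can also be scripted. Below is a minimal sketch in
Python, not the tool used for the listing in this post: it creates a
zero-filled file, copies it, and scans the copy for runs of non-zero 4-byte
words. The function names are made up for illustration, and decoding the
words as little-endian is an assumption.

```python
import shutil
import struct

WORD = 4  # size of the values being checked, in bytes


def make_zero_file(path, size):
    """Create a zero-filled file, similar in effect to `truncate -s`."""
    with open(path, "wb") as f:
        f.truncate(size)


def nonzero_ranges(path, chunk_size=1 << 20):
    """Yield (first_pos, first_val, last_pos, last_val) for every run of
    consecutive non-zero 4-byte words in the file."""
    assert chunk_size % WORD == 0
    run = None   # the run currently being extended, if any
    base = 0     # file offset of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if len(chunk) % WORD:
                # Pad a short tail so it still decodes as whole words.
                chunk += b"\0" * (WORD - len(chunk) % WORD)
            if run is None and chunk == bytes(len(chunk)):
                base += len(chunk)   # fast path: chunk is all zeros
                continue
            for i, (value,) in enumerate(struct.iter_unpack("<I", chunk)):
                pos = base + i * WORD
                if value:
                    if run is None:
                        run = [pos, value, pos, value]
                    else:
                        run[2], run[3] = pos, value
                elif run is not None:
                    yield tuple(run)
                    run = None
            base += len(chunk)
    if run is not None:
        yield tuple(run)


def copy_and_check(src, dst):
    """Copy src to dst and return the non-zero runs found in dst."""
    shutil.copyfile(src, dst)
    return list(nonzero_ranges(dst))

# Example (paths taken from the test case above):
#   make_zero_file("bigfile", 1000 * 1024 * 1024)
#   print(copy_and_check("bigfile", "/nfs/mounted/bigfile"))
```

An empty list means the copy is still all zeros. Reading the copy back on the
client may be served from the client's cache, so it is probably more telling
to run nonzero_ranges() on the file as stored on the server.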

I looked at the corrupted file. It contains a few ranges filled with
non-zero bytes:
equal to zero?  actual 4-byte value  offset
===========================================
not equal       1200355616     at pos=38797316
... <-- the 4-byte words in this range are garbage, omitted
not equal       3879749905     at pos=38813696

not equal       161160732      at pos=45613060
... <-- ditto
not equal       575257183      at pos=45629440

not equal       1943682165     at pos=59768836
... <-- ditto
not equal       2843639625     at pos=59785216

not equal       2653910121     at pos=60293124
... <-- ditto
not equal       3462830780     at pos=60309504
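
The offsets above show a regular pattern that is easy to check with a little
arithmetic. The sketch below assumes that each pair of listed positions marks
the first and the last non-zero 4-byte word of one range (both inclusive);
the 16 KiB block size is simply a value tried against the numbers, not
something taken from the driver or from NFS settings.

```python
# (first_pos, last_pos) pairs copied from the listing above
RANGES = [
    (38797316, 38813696),
    (45613060, 45629440),
    (59768836, 59785216),
    (60293124, 60309504),
]

WORD = 4            # each listed value is 4 bytes wide
BLOCK = 16 * 1024   # assumed block size to test alignment against


def describe(first, last):
    """Return the length of a range and how its ends sit relative to BLOCK."""
    return {
        "bytes": last - first + WORD,        # inclusive of the last word
        "first_mod_block": first % BLOCK,
        "last_mod_block": last % BLOCK,
    }


for first, last in RANGES:
    print(first, last, describe(first, last))
```

For all four pairs this gives a length of 16384 bytes, with the range starting
4 bytes past a 16 KiB boundary and the last word sitting exactly on the next
one. Whether that matches a particular buffer or transfer size is not
established here.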

Some info:

nfs server on 8-CURRENT as of Apr 17
nfs client on 7.0-STABLE as of Apr 12

dmesg | grep nfe
nfe0: <NVIDIA nForce2 MCP2 Networking Adapter> port 0xe000-0xe007 mem 0xe2001000-0xe2001fff irq 20 at device 4.0 on pci0
miibus0: <MII bus> on nfe0
nfe0: Ethernet address: 00:04:61:6c:76:b1
nfe0: [FILTER]
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
nfe0: tx v1 error 0x6001
^^^
These errors appear while cp'ing the file to the server.
(By the way, they do not appear with polling disabled, so this is probably a
separate issue.)

vmstat -i | grep nfe
irq20: nfe0 ohci0                      1          0

nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=48<VLAN_MTU,POLLING>
        ether 00:04:61:6c:76:b1
        inet 192.168.200.137 netmask 0xffffff00 broadcast 192.168.200.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
I can reproduce the corruption with or without polling enabled.

nfe0 at pci0:0:4:0:        class=0x020000 card=0x10001695 chip=0x006610de rev=0xa1 hdr=0x00

wbr,
pluknet


More information about the freebsd-stable mailing list