nfs-server silent data corruption
Pyun YongHyeon
pyunyh at gmail.com
Wed Apr 23 05:20:01 UTC 2008
On Wed, Apr 23, 2008 at 12:13:44AM +0400, pluknet wrote:
> On 22/04/2008, Mike Tancsa <mike at sentex.net> wrote:
> > At 02:00 PM 4/22/2008, Arno J. Klaassen wrote:
> >
> > > >
> > > > Are you using the latest RELENG_7, or at least the latest version of
> > > > nfe thats in RELENG_7 ?
> > >
> > >
> > > I think so:
> > >
> >
> > OK, but is it the latest RELENG_7? Or has only the if_nfe.c file been
> > manually updated? Also, are you using the ULE or the 4BSD scheduler? I still
> > have 4BSD on the box I am testing on.
>
> Hi, I have the same data corruption problem (with nfe on the NFS server side),
> particularly when transferring large files.
> Maybe it is related to this thread.
>
> My simple test case:
> truncate -s 1000m bigfile
> ^^ here I get a zero-filled file
> cp bigfile /nfs/mounted
> ^^ here, after uploading to the NFS server, I get a file that is not entirely zero-filled
>
> I looked at the corrupted file. It contains a few ranges filled with
> non-zero bytes:
> equal to zero?  actual 4-byte value  offset
> ======================================
> not equal 1200355616 at pos=38797316
> ... <-- this range contains per-4bytes garbage, omit
> not equal 3879749905 at pos=38813696
>
> not equal 161160732 at pos=45613060
> ... <-- ditto
> not equal 575257183 at pos=45629440
>
> not equal 1943682165 at pos=59768836
> ... <-- ditto
> not equal 2843639625 at pos=59785216
>
> not equal 2653910121 at pos=60293124
> ... <-- ditto
> not equal 3462830780 at pos=60309504
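The scan described above (reading the copied file as 4-byte words and noting where runs of non-zero values start and end) can be sketched roughly as follows. This is a minimal sketch under stated assumptions. The original poster's actual tool is not shown. The function name nonzero_ranges and the script name scan_nonzero.py are hypothetical. For each run, the sketch reports the offset of the first and the last non-zero word.

```python
import sys

WORD = 4  # scan granularity in bytes, matching the 4-byte values in the report


def nonzero_ranges(path, word=WORD, chunk=1 << 20):
    """Return (first, last) byte offsets for each run of non-zero words.

    `first` and `last` are the offsets of the first and last non-zero word
    in a run. A trailing partial word (file size not a multiple of `word`)
    is checked as-is. This is a hypothetical helper, not the poster's tool.
    """
    if chunk % word:
        raise ValueError("chunk must be a multiple of word")
    ranges = []
    start = last = None  # start is None while we are outside a run
    pos = 0  # file offset of the current buffer
    with open(path, "rb") as f:
        while True:
            # BufferedReader.read returns `chunk` bytes unless at EOF,
            # so `pos` stays word-aligned between reads.
            buf = f.read(chunk)
            if not buf:
                break
            if buf.count(0) == len(buf):
                # Fast path: an all-zero buffer ends any open run.
                if start is not None:
                    ranges.append((start, last))
                    start = None
            else:
                for i in range(0, len(buf), word):
                    if any(buf[i:i + word]):
                        if start is None:
                            start = pos + i
                        last = pos + i
                    elif start is not None:
                        ranges.append((start, last))
                        start = None
            pos += len(buf)
    if start is not None:
        ranges.append((start, last))
    return ranges


def main(argv):
    for path in argv:
        for first, last in nonzero_ranges(path):
            print(f"{path}: non-zero words from pos={first} to pos={last}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python3 scan_nonzero.py /nfs/mounted/bigfile` prints one line per corrupted range of the copy. This assumes /nfs/mounted is the mount-point directory. On an intact copy of the zero-filled file, it prints nothing. The all-zero fast path keeps a 1000 MB scan from looping in Python over every word of the file.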
>
> Some info:
>
> nfs server on 8-CURRENT as of Apr 17
> nfs client on 7.0-STABLE as of Apr 12
>
> dmesg | grep nfe
> nfe0: <NVIDIA nForce2 MCP2 Networking Adapter> port 0xe000-0xe007 mem
> 0xe2001000-0xe2001fff irq 20 at device 4.0 on pci0
> miibus0: <MII bus> on nfe0
> nfe0: Ethernet address: 00:04:61:6c:76:b1
> nfe0: [FILTER]
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> nfe0: tx v1 error 0x6001
> ^^^
I'm not sure it's related to the data corruption issue, but 0x6001
would indicate a Tx underflow error. I recall these Tx errors being
seen on nfe(4) when the negotiated speed/duplex did not match that
of the link partner or the MAC.
Does the link partner also agree with nfe(4)'s speed/duplex settings?
Which PHY driver does nfe(4) use?
> These appear while cp'ing the file to the server.
> (BTW, they do not appear with polling disabled; that is probably another issue.)
>
> vmstat -i | grep nfe
> irq20: nfe0 ohci0 1 0
>
> nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=48<VLAN_MTU,POLLING>
> ether 00:04:61:6c:76:b1
> inet 192.168.200.137 netmask 0xffffff00 broadcast 192.168.200.255
> media: Ethernet autoselect (100baseTX <full-duplex>)
> status: active
> I can reproduce it regardless of whether polling is enabled.
>
> nfe0 at pci0:0:4:0: class=0x020000 card=0x10001695 chip=0x006610de
> rev=0xa1 hdr=0x00
>
--
Regards,
Pyun YongHyeon
More information about the freebsd-stable mailing list