nfs-server silent data corruption

Arno J. Klaassen arno at heho.snv.jussieu.fr
Sat Apr 26 15:14:56 UTC 2008



Hello,


Mike Tancsa <mike at sentex.net> writes:

> At 02:35 PM 4/22/2008, Arno J. Klaassen wrote:
> 
> > > Also, you are using ULE or the 4BSD scheduler ?  I
> > > still have 4BSD on the box I am testing on.
> >
> >Interesting, this is with ULE. I didn't really test 4BSD on this
> >box (I believed those who said SMP needs ULE *and* am quite
> >satisfied with overall performance). I'll try 4BSD though time
> >is getting short; I promised to deliver this box next thursday but will
> >still have some days for on-site testing.
> 
> 
> I have recompiled the kernel with ULE, and it seems fine as well.  I
> ran 160 iterations of a 300MB file and there was no corruption.  Same
> process - copy a junk random file over nfs mount, unmount the nfs
> mount, remount it copy it back, compare the files.
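
For anyone who wants to repeat the round trip Mike describes, here is a
minimal Python sketch of one iteration. The names `make_junk` and
`round_trip` and the `remount` hook are made up for illustration. The
actual unmount/remount has to be supplied by the caller, since it needs
root and depends on the setup; without it the copy back may simply be
served from the client cache.

```python
import filecmp
import os
import shutil


def make_junk(path, size, chunk=1 << 20):
    """Write `size` bytes of random data to `path`."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n


def round_trip(src, mount_dir, remount=None):
    """Copy src onto the mount, optionally remount, copy it back, compare.

    Returns True when the copy that came back is byte-identical to src.
    """
    remote = os.path.join(mount_dir, os.path.basename(src))
    back = src + ".back"
    shutil.copyfile(src, remote)
    if remount is not None:
        remount()  # hypothetical hook: unmount and remount the NFS share
    shutil.copyfile(remote, back)
    return filecmp.cmp(src, back, shallow=False)
```

Calling `round_trip` in a loop, some 160 times on a 300MB file, would
roughly mirror the run quoted above.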


Let me summarise my investigations till now :


- in all failing cases just *one* byte is corrupted: 4 or all 8 of
  its bits are set to zero, *and* the hex digits that get zeroed all
  come from the limited subset {1, 8, 9} ....

  here is the output of `cmp -x $i/BIG $i/BIG2` for some failing
  cases I saved :


  03869a48 09 00
  05209d88 09 00
  01777148 09 00
  00f10f88 09 00
  01f4c4c8 11 00
  06c3d6c8 11 00
  0725ca48 18 00
  01608008 09 00
  00f3b888 18 00

  07aa45c8 29 20


- it does *not* seem to depend on :

   - the interface : I could reproduce it using nfe0, nfe1 and
     re0 (some netgear pci-card)

   - the distribution of the 4Gig memory : installing 4G at 
     CPU1 or 1G at CPU1 and 2G at CPU2 produces same results
     (NB, all memory passed a complete memtest.iso run in both
      situations)

   - the frequency control method : easier to produce with
     cpufreq/powerd, but in the end I can reproduce the corruption
     using acpi_ppc as well

   - the nfs-client and options (not exhaustively tested, but the different
     tests include i386-releng6, amd64-releng6 and linux, and quite
     a set of different try-and-see mount_nfs options)
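
To look at the saved `cmp -x` lines a bit more systematically, a small
Python sketch like the following can help. It assumes all three columns
are hexadecimal (offset, byte in BIG, byte in BIG2) and reports, per
line, which bits were cleared and which were set. The offset modulo 64
is an extra column chosen arbitrarily here, just in case the failing
positions share some alignment; `analyse` and `block` are made-up names.

```python
CMP_LINES = """\
03869a48 09 00
05209d88 09 00
01777148 09 00
00f10f88 09 00
01f4c4c8 11 00
06c3d6c8 11 00
0725ca48 18 00
01608008 09 00
00f3b888 18 00
07aa45c8 29 20
"""


def analyse(text, block=64):
    """Return (offset, offset % block, cleared_bits, set_bits) per line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        offset, good, bad = (int(f, 16) for f in fields)
        cleared = good & ~bad & 0xFF  # 1 in the original, 0 in the copy
        newly_set = bad & ~good & 0xFF  # 0 in the original, 1 in the copy
        rows.append((offset, offset % block, cleared, newly_set))
    return rows


if __name__ == "__main__":
    for offset, rem, cleared, newly_set in analyse(CMP_LINES):
        print(f"{offset:08x} mod64={rem:2d} "
              f"cleared={cleared:08b} set={newly_set:08b}")
```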

I am testing right now with a fixed frequency of 1 GHz.

I am not so inclined to test 4BSD, since reboot possibilities are
limited for me now on this box, but next week I will set up a similar
board (S3992e) (iff I can find quad-core socket F over here ...)
and in a certain sense hope I can reproduce it on that board as well.

Best, Arno


More information about the freebsd-stable mailing list