nfs-server silent data corruption
Arno J. Klaassen
arno at heho.snv.jussieu.fr
Sat Apr 26 15:14:56 UTC 2008
Hello,
Mike Tancsa <mike at sentex.net> writes:
> At 02:35 PM 4/22/2008, Arno J. Klaassen wrote:
>
> > > Also, you are using ULE or the 4BSD scheduler ? I
> > > still have 4BSD on the box I am testing on.
> >
> >Interesting, this is with ULE. I didn't really test 4BSD on this
> >box (I believed those who said SMP needs ULE *and* am quite
> >satisfied with overall performance). I'll try 4BSD though time
> >is getting short; I promised to deliver this box next thursday but will
> >still have some days for on-site testing.
>
>
> I have recompiled the kernel with ULE, and it seems fine as well. I
> ran 160 iterations of a 300MB file and there was no corruption. Same
> process - copy a junk random file over nfs mount, unmount the nfs
> mount, remount it copy it back, compare the files.
Let me summarise my investigations so far:
- in all failing cases just *one* byte is corrupted, with 4 or all 8 bits
  set to zero, *and* the original value is one out of the limited
  subset {1, 8, 9} ....
  here is the output of `cmp -x $i/BIG $i/BIG2` for some failing
  cases I saved:
    03869a48 09 00
    05209d88 09 00
    01777148 09 00
    00f10f88 09 00
    01f4c4c8 11 00
    06c3d6c8 11 00
    0725ca48 18 00
    01608008 09 00
    00f3b888 18 00
    07aa45c8 29 20
- it does *not* seem to depend on:
  - the interface: I could reproduce it using nfe0, nfe1 and
    re0 using some Netgear PCI card
  - the distribution of the 4Gig memory: installing 4G at
    CPU1 or 1G at CPU1 and 2G at CPU2 produces the same results
    (NB: all memory passed memtest.iso in both situations
    for a complete run)
  - the frequency control method: easier to produce with
    cpufreq/powerd, but in the end I can reproduce the corruption
    as well using acpi_ppc
  - the nfs-client and options (not exhaustively tested, but the different
    tests include i386-releng6, amd64-releng6 and linux, and quite
    a set of different try-and-see mount_nfs options)
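
The saved `cmp -x` lines can be decoded mechanically. What follows is only a minimal Python sketch, assuming the three columns are offset, byte in BIG and byte in BIG2, all in hex, and that offsets count from zero as FreeBSD's `cmp -x` prints them. The names SAMPLES and analyse are illustrative and not part of the original tests.

```python
# Sketch only: decode saved `cmp -x` lines (offset, original byte, corrupted byte).
# Assumption: all three columns are hexadecimal and offsets are zero-based.
SAMPLES = """\
03869a48 09 00
05209d88 09 00
01777148 09 00
00f10f88 09 00
01f4c4c8 11 00
06c3d6c8 11 00
0725ca48 18 00
01608008 09 00
00f3b888 18 00
07aa45c8 29 20
"""


def analyse(line):
    """Return (offset, orig, bad, cleared_bits, set_bits) for one cmp -x line."""
    off_s, orig_s, bad_s = line.split()
    offset, orig, bad = int(off_s, 16), int(orig_s, 16), int(bad_s, 16)
    cleared = orig & ~bad & 0xFF   # bits that went from 1 to 0
    set_bits = bad & ~orig & 0xFF  # bits that went from 0 to 1
    return offset, orig, bad, cleared, set_bits


rows = [analyse(line) for line in SAMPLES.splitlines() if line.strip()]

for offset, orig, bad, cleared, set_bits in rows:
    print(f"{offset:08x}  {orig:02x} -> {bad:02x}  "
          f"cleared={cleared:08b}  set={set_bits:08b}  "
          f"offset mod 64 = {offset % 64}")
```

On these ten lines the sketch shows that bits are only ever cleared, never set, and that every offset is 8 modulo 64. Whether that alignment means anything is not established by this sample.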
I am testing right now with a fixed frequency of 1 GHz.
I am not so inclined to test 4BSD, since reboot possibilities are
limited for me now on this box, but next week I will set up a similar
board (S3992e) (if I can find quad-core socket F over here ...)
and in a certain sense hope I can reproduce it on that board as well.
Best, Arno