thr3ads.net - freebsd stable - nfs-server silent data corruption [Apr 2008]

Arno J. Klaassen

2008-Apr-20 23:34 UTC

nfs-server silent data corruption

Hello,

I've a strange problem with a box I'm setting up as nfs-server
under 7-stable :

 - tyan S2895 MB, 2*285Dualcore Opteron, 4G-ECC, ahd-scsi, nfe-network
 - stripped GENERIC as kernel
 - sources as of last saturday afternoon (European time)

I removed everything from /boot/loader.conf and /etc/sysctl.conf, still
I get "easily" data corruption when exporting ahd-scsi over nfs
(NB exporting geom_raid5 gives same data corruption)

Testing with the following pseudo code :

  while checksum1 == checksum2 do
   create random file of $1 MBytes
   calculate md5 checksum1 on original
   copy
   calculate md5 checksum2 on copy
  done
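
The loop above could be sketched in Python roughly as follows. This is only an illustration of the test, not the script actually used: the function names, the source file name BIG1 and the max_rounds parameter are assumptions (BIG2 is the copy's name as it appears in the cp errors further down), and dst_dir is assumed to be a directory on the NFS mount.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_random_file(path, size_mb):
    """Fill a file with size_mb MiB of random bytes."""
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(os.urandom(1 << 20))


def copy_until_mismatch(src_dir, dst_dir, size_mb, max_rounds=None):
    """Repeat create/checksum/copy/checksum until the checksums differ.

    Returns the round number of the first mismatch, or None if
    max_rounds (hypothetical safety limit) is reached without one.
    """
    src = os.path.join(src_dir, "BIG1")  # hypothetical source name
    dst = os.path.join(dst_dir, "BIG2")  # copy, e.g. on the NFS mount
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        write_random_file(src, size_mb)
        checksum1 = md5_of(src)
        shutil.copyfile(src, dst)
        checksum2 = md5_of(dst)
        rounds += 1
        if checksum1 != checksum2:
            return rounds
    return None
```

Called as copy_until_mismatch("/tmp", "/mnt/nfs", 100) (both paths are placeholders), it runs until the first differing copy. Note that the read-back of the copy may be served from the client's cache rather than from the server, so a mismatch does not by itself say on which side the data went bad.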


Tested on both (as nfs-client) a 6-stable-i386 from a couple of weeks
ago as well as a linux 2.6.15-gentoo-r1 of about two years ago :
within half an hour the copy will be different .... ;(

I played with nfs-options on client side (nfs[23], conn, intr, [udp|tcp],
-r=, -w= ) but none seem to matter.

Starting/stopping rpc.lockd/rpc.statd on server/client just provoked some errors:

 cp: utimes: BIG2: No such file or directory
 cp: chown: BIG2: Stale NFS file handle
 cp: chmod: BIG2: Stale NFS file handle
 cp: chflags: BIG2: Operation not supported
 cp: BIG2: Stale NFS file handle
 cp: setting permissions for `BIG2': Stale NFS file handle
 cp: closing `BIG2': Stale NFS file handle

[and then the while loop continued ... as if the NFS handle were not
 that stale ..]

Anyway, I'll try to nail this down more (e.g. nfs-write performance
is horrible ... (nfsd falling down to 0% cpu and then after while
'wake up' and be at around 3-6% again))

I didn't stress-test this MB for a while, but last time I did was
with 7-PRERELEASE/RC?/CANTremember-exactly-but-close-to-release
and all worked great

I did add 2G ECC to the 2nd CPU since, though I doubt that interferes
with NFS.

In short, does anyone have a suggestion? (I will try downgrading
to RELENG_7_0 iff no one has a new suggestion for RELENG_7, but I'd like
to go forward and test some possibly suspect recent MFC or other
suggestion)

Thanx in advance,

best, Arno

Kris Kennaway

2008-Apr-21 09:47 UTC

nfs-server silent data corruption

On Mon, Apr 21, 2008 at 01:02:33AM +0200, Arno J. Klaassen wrote:
> I didn't stress-test this MB for a while, but last time I did was
> with 7-PRERELEASE/RC?/CANTremember-exactly-but-close-to-release
> and all worked great
> 
> I did add 2G ECC to the 2nd CPU since, though I doubt that interferes
> with NFS.
Uh, you're getting server-side data corruption, it could definitely be
because of the memory you added.

Kris

--
In God we Trust -- all others must submit an X.509 certificate.
    -- Charles Forsythe <forsythe@alum.mit.edu>
