Trouble: NFS via TCP

Oliver Fromme olli at lurza.secnetix.de
Thu Nov 9 17:17:15 UTC 2006


Hi,

I've got a very weird problem with NFS mounts on a RELENG_6
machine (a.k.a. 6.2-PRERELEASE, sources synced yesterday,
November 8th).  It's an HP Proliant DL360 G4 (G4p to be
exact), but that shouldn't matter.  I've been banging my
head on the table for several hours, but I can't find the
source of the problem.  :-(

What I'm trying to do should be very simple:  mounting an
NFS directory via TCP (instead of UDP which is the default),
like this:

# mount_nfs -T -3 -R 3 -i -s -o ro 127.0.0.1:/localdisk /nfs/test

Symptom:  As soon as I use the -T option (TCP) with the
mount command, it simply hangs forever.  If I use the
intr/soft flags, I can Ctrl-C it after a while, and the
mount indeed appears in the output from "mount", but any
command that tries to access it (e.g. ls(1)) also hangs.
Even umount(8) hangs.

More observations:

 - UDP works perfectly fine.  No problems at all.
 - Other TCP connections besides NFS (e.g. ssh) work fine.
 - IPF is present, but disabled (ipf -D).
 - IPFW only contains the default "allow any to any" rule.
 - The interface doesn't matter.  Mounting from localhost
   (via lo0) has the same problem as via a real NIC.
 - I first observed the problem on RELENG_6 of 2006-10-19
   (but it could be much older, because I haven't tried
   NFS-via-TCP on this machine before).  Then I updated
   to 2006-11-08, no change.
 - SMP or UP kernel doesn't make a difference.
 - No special compiler flags, make.conf is empty.
 - Kernel config is GENERIC with a few additions for more
   shared memory and semaphores (so Squid and PostgreSQL
   are happy) and some other unrelated details.
 - No suspicious things in dmesg.  Kernel prints nothing
   during the mount attempts.
 - Output from rpcinfo -p looks good.
 - tcpdump shows that the TCP connection is immediately
   shut down:  After connecting successfully, it sends a
   FIN, then reconnects, etc. ad infinitum.  Meanwhile
   vfs.nfs.reconnects increases slowly.
 - On a different machine (different hardware, but same
   RELENG_6 and very similar kernel config), the problem
   does *NOT* occur.  I compared sysctl variables relevant
   to nfs, rpc and tcp, and they're all the same.  Also,
   rpcinfo -p is the same.
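
The tcpdump observation above (connect, FIN, reconnect) could
also be cross-checked from userland, without the kernel NFS
client involved.  What follows is only a minimal sketch: the
function name probe_tcp is made up for illustration, and the
probe does not speak RPC at all.  It merely connects, waits
briefly, and reports whether the peer closed the connection.

```python
import socket


def probe_tcp(host, port, wait=2.0):
    """Connect to host:port and report what the peer does next.

    Returns "closed" if the peer closes the connection within
    `wait` seconds, "data" if it sends bytes unprompted, and
    "open" if the connection just stays idle.
    """
    with socket.create_connection((host, port), timeout=wait) as s:
        s.settimeout(wait)
        try:
            data = s.recv(1)
        except socket.timeout:
            # Nothing arrived and nobody hung up: still connected.
            return "open"
        # An empty read means the peer sent FIN.
        return "data" if data else "closed"
```

Calling probe_tcp("127.0.0.1", 2049) targets the standard NFS
port.  A result of "closed" would suggest that the server side
drops idle connections; "open" would point more towards the
client side tearing the connection down.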

Now I'm running out of ideas ...  Obviously there must be
something special with that machine, because it works fine
on a different machine, but I'm not able to find out what
it is.
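
One way to hunt for the difference is to widen the sysctl
comparison mentioned above beyond the nfs, rpc and tcp
subtrees.  This is a rough sketch that assumes the usual
"name: value" output format of sysctl; the function names and
the default prefixes are only illustrative, and multi-line
values are not handled properly.

```python
def parse_sysctl(text):
    """Parse sysctl output lines of the form 'name: value'."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines that carry no 'name: value' pair
            values[name.strip()] = value.strip()
    return values


def diff_sysctl(a_text, b_text, prefixes=("vfs.nfs", "net.inet.tcp")):
    """Return {name: (value_a, value_b)} for variables that differ.

    Only names starting with one of `prefixes` are compared;
    a variable missing on one side shows up as None.
    """
    a, b = parse_sysctl(a_text), parse_sysctl(b_text)
    diffs = {}
    for name in sorted(set(a) | set(b)):
        if name.startswith(prefixes) and a.get(name) != b.get(name):
            diffs[name] = (a.get(name), b.get(name))
    return diffs
```

The input would be the saved output of "sysctl -a" from both
machines.  Passing prefixes=("",) compares everything.
Counters such as vfs.nfs.reconnects will differ in any case
and can be ignored.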

I even considered putting a few printf() calls into some
places in sys/nfsclient/nfs_socket.c to find out what's
going on, but I'm not sure if that makes sense and whether
it will give any useful results.

Any hints and ideas will be greatly appreciated.

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"And believe me, as a C++ programmer, I don't hesitate to question
the decisions of language designers.  After a decent amount of C++
exposure, Python's flaws seem ridiculously small." -- Ville Vainio


More information about the freebsd-stable mailing list