NFS-Locking problem with 6.4/7.1-RELEASE

Matthias Schuendehuette msch at snafu.de
Wed Jan 21 11:49:27 PST 2009


Hi,

One of our FreeBSD servers acts as the NFS server for $HOME for
approx. 50 HP-UX workstations, since the workstations themselves and
their disks have become quite old in the meantime.

That works quite well with FreeBSD 6.3-RELEASE-pxx but no longer works
with 6.4/7.1.

I looked at the problem with 'wireshark' and it seems to be a locking
problem, probably related to PR 'kern/130628', but I'm not sure.

Here is what I know so far:

Server-OS:	FreeBSD 6.4-RELEASE/7.1-RELEASE (same problems)
Workstation-OS:	HP-UX 11iv1 (11.11)
NFS-Version:	V3/tcp or V3/udp (NFS-V2 works!)

I found no record of the problem on the client side (HP-UX), whereas
on FreeBSD 'rpc.lockd -d 3' produces the following entries in
/var/log/messages:

Jan 21 12:07:33 bsd1dw kernel: NLM: new host hp13 (sysid 5)
Jan 21 12:07:33 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:07:53 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:13 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:32 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:33 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:43 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:08:53 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:03 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:13 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:13 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:23 bsd1dw kernel: nlm_do_lock(): caller_name = hp13 (sysid = 5)
Jan 21 12:09:33 bsd1dw kernel: nlm_do_cancel(): caller_name = hp13 (sysid = 5)
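
For anyone comparing longer logs, the entries above can be tallied per
host and call type. This is only a rough sketch in Python; the function
name count_nlm_calls and the regular expression are my own assumptions,
derived solely from the line format shown above, not from any FreeBSD
tool.

```python
import re
from collections import Counter

# Pattern derived from the log excerpt above (an assumption, not an official format).
NLM_RE = re.compile(
    r"kernel: (nlm_do_\w+)\(\): caller_name = (\S+)\s+\(sysid = (\d+)\)"
)

def count_nlm_calls(log_text):
    """Count NLM kernel debug entries per (host, call) pair."""
    # Collapse all whitespace so entries wrapped across two lines still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    for call, host, _sysid in NLM_RE.findall(flat):
        counts[(host, call)] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage: python count_nlm.py < /var/log/messages
    for (host, call), n in sorted(count_nlm_calls(sys.stdin.read()).items()):
        print(host, call, n)
```

A run of nlm_do_lock and nlm_do_cancel entries from the same host at
regular intervals, as above, matches the retry behaviour described
below.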


What happens is as follows:

When logging in to an account with its home directory on the NFS
server, the shell reads '.profile' and then tries to get a lock on
'.sh_history'. From a FreeBSD 6.3 server the shell gets the lock,
whereas a 6.4/7.1 server replies with "V4 LOCK_RES Call NLM_FAILED".
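
To reproduce the lock attempt without logging in, something like the
following could be run on a client against a file in the NFS-mounted
home directory. It is a minimal sketch under several assumptions: that
the shell uses a POSIX fcntl-style lock (I have not checked the HP-UX
shell source), that Python is available on the client, and the helper
name try_lock is purely illustrative.

```python
import errno
import fcntl
import os

def try_lock(path):
    """Try a non-blocking exclusive lock on path.

    Returns 'locked', 'busy', or 'error: <errno name>'.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # On an NFS mount this request is forwarded to the server's lock daemon.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return "busy"  # someone else appears to hold the lock
            return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "locked"
    finally:
        os.close(fd)

if __name__ == "__main__":
    import sys
    # Usage: python trylock.py <file on the NFS mount>
    print(try_lock(sys.argv[1]))
```

Running this once against a 6.3 server and once against a 6.4/7.1
server should show whether the difference is visible outside the shell.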

Of course the HP-UX shell assumes the file is already locked, waits
some time and tries again. This cycle leads to the account being
completely locked up... :-( This does not happen if command-line
history is disabled, but it is an error nonetheless.


I have recorded the network traffic for an NFSv2 session, an NFSv3/tcp
session with a 6.3 server and an NFSv3/tcp session with a 7-STABLE
server. If the wireshark dumps are of interest beyond what I have
described here, they are available on request.

I hope this information helps those who are able to fix it...

Matthias

-- 
Ciao/BSD - Matthias

Matthias Schuendehuette    <msch [at] snafu.de>, Berlin (Germany)




