post ino64: lockd no runs?
John Baldwin
jhb at freebsd.org
Mon Jun 12 18:48:17 UTC 2017
On Sunday, June 11, 2017 11:12:25 AM David Wolfskill wrote:
> On Sun, Jun 04, 2017 at 08:57:44AM -0400, Michael Butler wrote:
> > It seems that {rpc.}lockd no longer runs after the ino64 changes on any
> > of my systems after a full rebuild of src and ports. No log entries
> > offer any insight as to why :-(
> >
> > imb
>
> I don't tend to use NFS on my systems that are running head, so I
> haven't had occasion to test this as stated.
>
> However, I just completed my weekly update of the "production" systems
> here at home, running stable/11. And I find that lockd seems to be ...
> claiming that all is well, but declining to run (for long).
>
> To the best of my knowledge, that was not the case until this last
> update, which was from:
>
> FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316 r319566M/319569:1100514: Sun Jun 4 03:54:41 PDT 2017 root at freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT amd64
>
> to
>
> FreeBSD albert.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #322 r319823M/319823:1100514: Sun Jun 11 03:56:10 PDT 2017 root at freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT amd64
>
> The "glaringly obvious" symptom in my case is that I am now unable
> to (directly) save an email message from within mutt(1) by appending
> it to an NFS-resident file. (Saving it to a local file, then using
> cat(1) to append that to the NFS-resident file & removing the local
> copy works....)
>
> After a few variations on a theme of:
>
> albert(11.1)[5] sudo service lockd restart
> lockd not running?
> Starting lockd.
> albert(11.1)[6] echo $?
> 0
> albert(11.1)[7] service lockd status
> lockd is not running.
>
> I finally(!) thought to ask ktrace what's going on (as tailing
> /var/log/messages was completely unproductive, even after enabling
> rc_debug).
>
> So I tried: "sudo ktrace -di service lockd restart"; upon examination of
> the output of kdump(1), I see that the trace ends with:
>
> ...
> 2811 rpc.lockd NAMI "/var/run/logpriv"
> 2786 sh CALL read(0xa,0x627fc0,0x400)
> 2786 sh GIO fd 10 read 0 bytes
> ""
> 2811 rpc.lockd RET connect 0
> 2786 sh RET read 0
> 2811 rpc.lockd CALL sendto(0x3,0x7fffffffe2c0,0x27,0,0,0)
> 2786 sh CALL exit(0)
> 2811 rpc.lockd GIO fd 3 wrote 39 bytes
> "<30>Jun 11 15:43:10 rpc.lockd: Starting"
> 2811 rpc.lockd RET sendto 39/0x27
> 2811 rpc.lockd CALL sigaction(SIGALRM,0x7fffffffec20,0)
> 2811 rpc.lockd RET sigaction 0
> 2811 rpc.lockd CALL nlm_syscall(0,0x1e,0x4,0x801015040)
> 2811 rpc.lockd RET nlm_syscall -1 errno 14 Bad address

This is a really good clue. nlm_syscall is dying with EFAULT. The last
argument is a pointer to an array of char * pointers, and the only way
I can see it dying is if it fails to copyin() one of the strings pointed
to by those pointers. You could try running rpc.lockd under gdb from
ports, setting a breakpoint on 'nlm_syscall', and then printing the
arguments with 'p addr_count' and 'p *addrs@(addr_count * 2)'.

Unfortunately, I'm not able to reproduce the failure on a test machine
I have running head post-ino64.
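
As an aside, in a longer trace the failing call can be picked out
mechanically rather than by eye. The following is only a rough sketch,
not part of kdump(1) or rpc.lockd: the function name failed_syscalls is
made up for illustration, and it assumes the
"<pid> <command> RET <call> -1 errno <n> <text>" layout seen in the
trace above.

```python
import errno
import re

# Matches kdump(1) return records for failed calls, for example:
#   2811 rpc.lockd RET nlm_syscall -1 errno 14 Bad address
# An optional leading "> " is tolerated so quoted mail text also parses.
RET_FAIL = re.compile(
    r"^\s*(?:>\s*)?(?P<pid>\d+)\s+(?P<proc>\S+)\s+RET\s+(?P<call>\S+)"
    r"\s+-1\s+errno\s+(?P<num>\d+)\s*(?P<msg>.*)$"
)


def failed_syscalls(kdump_text):
    """Return one dict per failed system call found in kdump output."""
    failures = []
    for line in kdump_text.splitlines():
        m = RET_FAIL.match(line)
        if m is None:
            continue  # not a failed RET record
        num = int(m.group("num"))
        failures.append({
            "pid": int(m.group("pid")),
            "proc": m.group("proc"),
            "call": m.group("call"),
            "errno": num,
            # Symbolic name as known to the host running this script;
            # 14 is EFAULT on both FreeBSD and Linux.
            "name": errno.errorcode.get(num, "UNKNOWN"),
            "msg": m.group("msg").strip(),
        })
    return failures
```

Run over the text of the trace quoted above, this should report a
single failure: nlm_syscall in rpc.lockd, errno 14 (EFAULT).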
--
John Baldwin