NFS nfs_getpages errors
Eric Crist
ecrist at secure-computing.net
Wed Sep 15 15:44:13 UTC 2010
On Sep 15, 2010, at 10:38:53, Steve Polyack wrote:
> On 09/15/10 11:28, Rick Macklem wrote:
>>> Hey folks,
>>>
>>> We've got 4 servers running FreeBSD 8.1-RELEASE which PXE boot with
>>> NFS root. On these machines, we run proftpd and apache 2.2. Over the
>>> past couple weeks, we've seen a ton of errors as follows:
>>>
>>> Sep 14 20:28:59 lion-3 proftpd[31761]: 0.0.0.0
>>> (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD terminating
>>> (signal 11)
>>> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
>>> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
>>> (proftpd)
>>> Sep 14 20:28:59 lion-3 kernel: Sep 14 20:28:59 lion-3 proftpd[31761]:
>>> 0.0.0.0 (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD
>>> terminating (signal 11)
>>> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
>>> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
>>> (proftpd)
>>> Sep 14 20:28:59 lion-3 kernel: pid 31761 (proftpd), uid 0: exited on
>>> signal 11
>>>
>>> These, in this case, occurred on three of the four machines until
>>> midnight after which all three of the machines had proftpd exit on
>>> signal 11. The message above was for child processes. At midnight, the
>>> logfile rotated, and newsyslog sent signal 1 to the parent process,
>>> which I think finally finished it off. The fourth machine remained
>>> running and did not display these messages.
>>>
>>> The number following 'nfs_getpages: error' changes for each cycle and
>>> I'm not certain if any of them repeat.
>>>
>> Well, at a quick glance, those errors seem to be coming from the NFS
>> server in a read reply. Also, the error values seem bogus, since they
>> should be small positive numbers (1<->70 + a few just above 10000).
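The range check described above can be sketched in a few lines of Python, so that log lines can be triaged automatically. This is only a rough sketch: the bounds (1 to 70, plus a window just above 10000) are taken from the description in this thread, and the exact upper limit `high_max=10099` is an assumption, not a value from the NFS sources. The function names are hypothetical.

```python
import re

# Matches the kernel log lines quoted in this thread.
GETPAGES_RE = re.compile(r"nfs_getpages: error (\d+)")


def is_plausible_nfs_error(value, low_max=70, high_min=10001, high_max=10099):
    """Return True if value falls in the ranges described in the thread.

    low_max follows the "1<->70" remark; high_min/high_max are an
    assumed window for "a few just above 10000".
    """
    return 1 <= value <= low_max or high_min <= value <= high_max


def scan_log(lines):
    """Return a list of (error_value, plausible) for each nfs_getpages line."""
    results = []
    for line in lines:
        match = GETPAGES_RE.search(line)
        if match:
            value = int(match.group(1))
            results.append((value, is_plausible_nfs_error(value)))
    return results


if __name__ == "__main__":
    sample = [
        "Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552",
        "Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761",
    ]
    for value, ok in scan_log(sample):
        status = "plausible" if ok else "out of range"
        print(f"{value} (0x{value:08x}): {status}")
```

Printing the value in hex as well may make it easier to spot whether the bogus numbers share a pattern across occurrences.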
> We see these errors on some 8.1 clients as well:
> nfs_getpages: error 1110586608
> nfs_getpages: error 1108948624
> vm_fault: pager read error, pid 56216 (php)
> nfs_getpages: error 1114969744
> vm_fault: pager read error, pid 54770 (php)
> nfs_getpages: error 1137006224
> vm_fault: pager read error, pid 50578 (php)
>
> They do not show up often, so we haven't spent much time looking into it (no tcpdumps yet). Our NFS server is an 8-STABLE system backed by ZFS, so maybe it's related to that (again :) ).
>
> Eric, is your NFS server backed by ZFS as well?
>
> The NFS server doesn't seem to be logging any errors, but the ret-failed count is always increasing:
> Server Info:
>    Getattr    Setattr     Lookup   Readlink       Read      Write     Create     Remove
>  543523097   14397049 1949982185       6380   17587820   14002952    8980955    8070238
>     Rename       Link    Symlink      Mkdir      Rmdir    Readdir   RdirPlus     Access
>    6966495          9       1668    1117125     904969    5567689      22307  184929325
>      Mknod     Fsstat     Fsinfo   PathConf     Commit
>          0  338500745         57          0    7129262
> Server Ret-Failed
>   29089796
> Server Faults
>          0
> Server Cache Stats:
>     Inprog       Idem   Non-idem     Misses
>          0          0          0          0
> Server Write Gathering:
>   WriteOps   WriteRPC    Opsaved
>   14001235   14002952       1717
>
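Since the Ret-Failed counter only ever grows, a single reading says little; the difference between two readings is more useful when trying to line it up with the client errors. A minimal sketch, assuming the server statistics are laid out as quoted above (a "Server Ret-Failed" heading with the count on the following line). The helper names `ret_failed` and `ret_failed_delta` are hypothetical.

```python
def ret_failed(nfsstat_text):
    """Extract the Server Ret-Failed count from captured nfsstat output.

    Assumes the layout quoted in this thread: a "Server Ret-Failed"
    heading followed by the count on the next non-empty line.
    """
    lines = [line.strip() for line in nfsstat_text.splitlines()]
    for index, line in enumerate(lines):
        if line == "Server Ret-Failed":
            for following in lines[index + 1:]:
                if following:
                    return int(following.split()[0])
    raise ValueError("Server Ret-Failed not found in input")


def ret_failed_delta(before_text, after_text):
    """Return how much the Ret-Failed counter grew between two captures."""
    return ret_failed(after_text) - ret_failed(before_text)


if __name__ == "__main__":
    before = "Server Ret-Failed\n 29089796\nServer Faults\n 0\n"
    after = "Server Ret-Failed\n 29089800\nServer Faults\n 0\n"
    print("Ret-Failed grew by", ret_failed_delta(before, after))
```

Capturing the statistics right before and right after a client logs an nfs_getpages error would show whether the counter moves at the same time.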
>> Could you possibly get a packet capture when one of these happens?
>> ("tcpdump -s 0 -w xxx host <nfs-server>" would suffice, but you need to
>> have it running when the error occurs. If you can reproduce it by
>> talking to the proftpd server, so the tcpdump doesn't run for too
>> long, that would be best.)
>>
>> You can look in the tcpdump via wireshark and see what is being returned
>> for the Read RPCs at that time. (You can email me the "xxx" packet trace
>> as an attachment and I can look at it, if you get that far.)
>>
>> rick
>> ps: Otherwise, I'd go look at your NFS server and see if it's logging
>> errors or if there are indications of problems.
The NFS server is logging nothing at all related to NFS. It *is* running 8.1-RC2, so there is potential for an update. If/when we notice these errors again, we'll try to get a packet capture and forward it to you. Our NFS server is backed by ZFS, as well.
Eric