NFS nfs_getpages errors

Steve Polyack korvus at comcast.net
Wed Sep 15 15:54:42 UTC 2010


  On 09/15/10 11:28, Rick Macklem wrote:
>> Hey folks,
>>
>> We've got 4 servers running FreeBSD 8.1-RELEASE which PXE boot with
>> NFS root. On these machines, we run proftpd and apache 2.2. Over the
>> past couple weeks, we've seen a ton of errors as follows:
>>
>> Sep 14 20:28:59 lion-3 proftpd[31761]: 0.0.0.0
>> (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD terminating
>> (signal 11)
>> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
>> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
>> (proftpd)
>> Sep 14 20:28:59 lion-3 kernel: Sep 14 20:28:59 lion-3 proftpd[31761]:
>> 0.0.0.0 (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD
>> terminating (signal 11)
>> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
>> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
>> (proftpd)
>> Sep 14 20:28:59 lion-3 kernel: pid 31761 (proftpd), uid 0: exited on
>> signal 11
>>
>> These, in this case, occurred on three of the four machines until
>> midnight after which all three of the machines had proftpd exit on
>> signal 11. The message above was for child processes. At midnight, the
>> logfile rotated, and newsyslog sent signal 1 to the parent process,
>> which I think finally finished it off. The fourth machine remained
>> running and did not display these messages.
>>
>> The number following 'nfs_getpages: error' changes for each cycle and
>> I'm not certain if any of them repeat.
>>
> Well, at a quick glance, those errors seem to be coming from the NFS
> server in a read reply. Also, the error values seem bogus, since they
> should be small positive numbers (1<->70 + a few just above 10000).
We see these errors on some 8.1 clients as well:
nfs_getpages: error 1110586608
nfs_getpages: error 1108948624
vm_fault: pager read error, pid 56216 (php)
nfs_getpages: error 1114969744
vm_fault: pager read error, pid 54770 (php)
nfs_getpages: error 1137006224
vm_fault: pager read error, pid 50578 (php)
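
For what it's worth, a quick sanity check of the logged values against the range Rick describes could look like the sketch below. The two ranges are assumptions taken from his description (small positive values, plus a few just above 10000), not from the kernel source, and the function name is made up for illustration. Printing the values in hex may also help when comparing against a packet trace.

```python
# Rough sanity check for the numbers printed by "nfs_getpages: error N".
# ASSUMPTION: the ranges below paraphrase the description in this thread
# (small positive values, plus a few just above 10000); they are not
# taken from kernel source.
SMALL_RANGE = range(1, 72)             # assumed: roughly 1..70
EXTENDED_RANGE = range(10001, 11000)   # assumed: loose bound for "just above 10000"


def looks_like_nfs_status(value: int) -> bool:
    """Return True if value falls in one of the assumed status ranges."""
    return value in SMALL_RANGE or value in EXTENDED_RANGE


# Error values reported in this thread.
reported = [1046353552, 1110586608, 1108948624, 1114969744, 1137006224]

for value in reported:
    verdict = "plausible" if looks_like_nfs_status(value) else "bogus"
    print(f"{value:>12}  0x{value:08x}  {verdict}")
```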

They do not show up often, so we haven't spent much time looking into it
(no tcpdumps yet).  Our NFS server is an 8-STABLE system backed by ZFS,
so maybe it's related to that (again :) ).

Eric, is your NFS server backed by ZFS as well?

The NFS server doesn't seem to be logging any errors, but the ret-failed 
count is always increasing:
Server Info:
    Getattr    Setattr     Lookup   Readlink       Read      Write     Create     Remove
  543523097   14397049 1949982185       6380   17587820   14002952    8980955    8070238
     Rename       Link    Symlink      Mkdir      Rmdir    Readdir   RdirPlus     Access
    6966495          9       1668    1117125     904969    5567689      22307  184929325
      Mknod     Fsstat     Fsinfo   PathConf     Commit
          0  338500745         57          0    7129262
Server Ret-Failed
          29089796
Server Faults
             0
Server Cache Stats:
    Inprog      Idem  Non-idem    Misses
         0         0         0         0
Server Write Gathering:
  WriteOps  WriteRPC   Opsaved
  14001235  14002952      1717
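
To put the Ret-Failed number in context, here is a small sketch that relates it to the total of the per-operation counters above. Treating the sum of those counters as the total RPC count is an assumption on my part, and Ret-Failed likely includes routine failures (for example lookups of names that do not exist), so the ratio by itself does not point at the read errors.

```python
# Per-operation server counters copied from the nfsstat output above.
server_ops = {
    "Getattr": 543523097, "Setattr": 14397049, "Lookup": 1949982185,
    "Readlink": 6380, "Read": 17587820, "Write": 14002952,
    "Create": 8980955, "Remove": 8070238, "Rename": 6966495,
    "Link": 9, "Symlink": 1668, "Mkdir": 1117125, "Rmdir": 904969,
    "Readdir": 5567689, "RdirPlus": 22307, "Access": 184929325,
    "Mknod": 0, "Fsstat": 338500745, "Fsinfo": 57, "PathConf": 0,
    "Commit": 7129262,
}
ret_failed = 29089796  # "Server Ret-Failed" from the same output

# ASSUMPTION: the sum of the per-operation counters approximates the
# total number of RPCs the server has handled.
total = sum(server_ops.values())
failed_fraction = ret_failed / total

print(f"{ret_failed} of {total} RPCs returned an error ({failed_fraction:.2%})")
```

Comparing two such snapshots taken a few minutes apart would show how fast the counter is growing, which is probably more useful than the lifetime ratio.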

> Could you possibly get a packet capture when one of these happens?
> ("tcpdump -s 0 -w xxx host <nfs-server>" would suffice, but you need to
>   have it running when the error occurs. If you can reproduce it by
>   talking to the proftpd server, so the tcpdump doesn't run for too
>   long, that would be best.)
>
> You can look in the tcpdump via wireshark and see what is being returned
> for the Read RPCs at that time. (You can email me the "xxx" packet trace
> as an attachment and I can look at it, if you get that far.)
>
> rick
> ps: Otherwise, I'd go look at your NFS server and see if it's logging
>      errors or if there are indications of problems.
>
>
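
Since the errors are rare here, the capture would have to run for a long time. One way to keep it bounded is tcpdump's file rotation, sketched below as a small helper that only builds the command line. The helper name, the example hostname, and the size and count defaults are all hypothetical; `-C` (rotate after roughly that many megabytes) and `-W` (cap the number of files, overwriting the oldest) are standard tcpdump options, but check the man page for the version on your clients.

```python
import shlex


def build_capture_cmd(server, outfile="xxx", file_mb=100, file_count=10):
    """Build a tcpdump argv for a bounded, rotating full-packet capture.

    Hypothetical helper: the defaults are illustrative, not recommendations.
    """
    return [
        "tcpdump",
        "-s", "0",               # capture whole packets, as suggested above
        "-C", str(file_mb),      # start a new file after about this many MB
        "-W", str(file_count),   # keep at most this many files, then overwrite
        "-w", outfile,           # base name for the capture files
        "host", server,          # only traffic to or from the NFS server
    ]


# "nfs-server.example.org" is a placeholder for the real NFS server.
cmd = build_capture_cmd("nfs-server.example.org")
print(shlex.join(cmd))
```

The printed command can be pasted into a root shell on the client; once the error shows up in the log, stop the capture and keep the file or files covering that timestamp.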



More information about the freebsd-fs mailing list