9-STABLE -> NFS -> NetAPP:

Hub- Marketing marketing at hub.org
Wed Dec 19 05:16:03 UTC 2012


I'm running a few servers on top of a NetApp file server. Everything runs great, but periodically I'm getting:

nfs_getpages: error 13
vm_fault: pager read error, pid 11355 (https)

errors on my screen. It's not always the same pid, but the annoying part is that it always seems to affect the same jail. If I shut down all the jails on that physical server, everything shuts down except for that *one* jail, and its ps listing looks like:

USER   PID %CPU %MEM    VSZ   RSS TT  STAT STARTED    TIME COMMAND
root  6670  0.0  0.0   9936  1372 ??  DsJ   3:00AM 0:00.01 newsyslog
root  6815  0.0  0.0   9936  1288 ??  DsJ   3:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root  8361  0.0  0.1 220740 11400 ??  DsJ   7:33PM 0:01.25 /usr/local/sbin/httpd -DNOHTTPACCEPT
www   8364  0.0  0.0      0     0 ??  ZJ    7:33PM 0:00.00 <defunct>
www  11866  0.0  0.1 318444 16792 ??  TJ    7:36PM 0:00.03 /usr/local/sbin/httpd -DNOHTTPACCEPT
www  11872  0.0  0.1 297964 14008 ??  TJ    7:36PM 0:00.01 /usr/local/sbin/httpd -DNOHTTPACCEPT
www  11873  0.0  0.1 306156 15028 ??  DEJ   7:36PM 0:00.02 /usr/local/sbin/httpd -DNOHTTPACCEPT
root 17190  0.0  0.0   9936  1240 ??  DsJ   8:00PM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 24864  0.0  0.0   9936  1392 ??  DsJ   4:00AM 0:00.01 newsyslog
root 24910  0.0  0.0   9936  1336 ??  DsJ   4:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 29972  0.0  0.0   9936  1240 ??  DsJ   9:00PM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 34221  0.0  0.0  51480  4332 ??  DsJ   4:47AM 0:00.02 sshd: root@pts/1 (sshd)
root 42452  0.0  0.0   9936  1296 ??  DsJ  10:00PM 0:00.01 newsyslog
root 42522  0.0  0.0   9936  1240 ??  DsJ  10:00PM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 55179  0.0  0.0   9936  1296 ??  DsJ  11:00PM 0:00.01 newsyslog
root 55244  0.0  0.0   9936  1240 ??  DsJ  11:00PM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 67592  0.0  0.0   9936  1336 ??  DsJ  12:00AM 0:00.01 newsyslog
root 67762  0.0  0.0   9936  1288 ??  DsJ  12:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 81603  0.0  0.0   9936  1340 ??  DsJ   1:00AM 0:00.01 newsyslog
root 81640  0.0  0.0   9936  1284 ??  DsJ   1:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 93792  0.0  0.0   9936  1344 ??  DsJ   2:00AM 0:00.01 newsyslog
root 93815  0.0  0.0   9936  1288 ??  DsJ   2:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 34228  0.0  0.0  67960  4464  1  Ds+J  4:47AM 0:00.00 sshd: root@pts/1 (sshd)
root 38473  0.0  0.0  17556  3272  3  SJ    4:53AM 0:00.02 /bin/tcsh
root 38475  0.0  0.0  14212  1512  3  R+J   4:53AM 0:00.00 ps aux
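The STAT column is the interesting part of that listing. Per FreeBSD's ps(1), a leading D is an uninterruptible wait (plausibly NFS page-ins here), T is stopped, Z is a zombie, E means the process is trying to exit, and J marks a jailed process. A process stuck in D generally does not act on signals, SIGKILL included, until the wait returns, which would fit processes that refuse to shut down. As a minimal sketch only, here is a hypothetical Python helper (the names `group_by_state` and `SAMPLE` are made up for illustration) that groups `ps aux` rows by their leading state letter, assuming FreeBSD's default column layout:

```python
"""Hypothetical helper: group `ps aux` rows by their primary STAT letter.

Assumes FreeBSD's default `ps aux` column layout:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
"""

# Meanings of the leading STAT letter, paraphrased from FreeBSD ps(1).
STATE_NAMES = {
    "D": "uninterruptible wait",
    "I": "idle (asleep > ~20 s)",
    "R": "runnable",
    "S": "sleeping (< ~20 s)",
    "T": "stopped",
    "W": "idle interrupt thread",
    "Z": "zombie",
}


def group_by_state(ps_text: str, jailed_only: bool = True) -> dict[str, list[int]]:
    """Return {state letter: [pid, ...]} for each row of `ps aux` output."""
    groups: dict[str, list[int]] = {}
    for line in ps_text.splitlines()[1:]:  # skip the header row
        # Split into at most 11 fields; COMMAND may itself contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, stat = int(fields[1]), fields[7]
        # A "J" after the leading state letter marks a jailed process.
        if jailed_only and "J" not in stat[1:]:
            continue
        groups.setdefault(stat[0], []).append(pid)
    return groups


# A few rows copied from the listing above.
SAMPLE = """\
USER   PID %CPU %MEM    VSZ   RSS TT  STAT STARTED    TIME COMMAND
root  6670  0.0  0.0   9936  1372 ??  DsJ   3:00AM 0:00.01 newsyslog
www   8364  0.0  0.0      0     0 ??  ZJ    7:33PM 0:00.00 <defunct>
www  11866  0.0  0.1 318444 16792 ??  TJ    7:36PM 0:00.03 /usr/local/sbin/httpd -DNOHTTPACCEPT
www  11873  0.0  0.1 306156 15028 ??  DEJ   7:36PM 0:00.02 /usr/local/sbin/httpd -DNOHTTPACCEPT
root 38473  0.0  0.0  17556  3272  3  SJ    4:53AM 0:00.02 /bin/tcsh
"""

if __name__ == "__main__":
    for letter, pids in sorted(group_by_state(SAMPLE).items()):
        print(f"{letter} ({STATE_NAMES.get(letter, '?')}): {pids}")
```

Splitting on at most 10 separators keeps commands such as `/usr/sbin/newsyslog -f ...` intact in the last field; only the PID and STAT columns are actually used.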

I can do a 'jexec <JID> /bin/tcsh' to get into the jail and run ps and other commands. I just can't get those processes to shut down.

Everything within the jail is up to date (both userland and ports have been updated). I've checked over the NetApp and everything appears fine, and the problem only ever seems to affect that one jail on that same physical server.
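One detail worth decoding is the "error 13" in the nfs_getpages message. It is presumably an errno value. On FreeBSD, as on most Unix systems, 13 is EACCES ("Permission denied"), which might point at permissions on the export rather than at a fault on the filer. A minimal sketch of a hypothetical Python helper (`decode_kernel_error` and the message pattern are assumptions based only on the one line quoted above) that translates such lines:

```python
import errno
import os
import re

# Assumed message shape, based only on the "nfs_getpages: error 13" line.
ERROR_RE = re.compile(r"^(?P<func>\w+): error (?P<code>\d+)$")


def decode_kernel_error(message: str) -> str:
    """Rewrite "<func>: error <n>" as "<func>: <ERRNO NAME> (<strerror text>)".

    Lines that do not match the assumed shape are returned unchanged.
    """
    match = ERROR_RE.match(message.strip())
    if match is None:
        return message
    code = int(match.group("code"))
    name = errno.errorcode.get(code, f"errno {code}")
    return f"{match.group('func')}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    print(decode_kernel_error("nfs_getpages: error 13"))
    print(decode_kernel_error("vm_fault: pager read error, pid 11355 (https)"))
```

The names and messages come from the errno table of whatever machine runs the script, so running it on the FreeBSD host itself is the safest way to interpret the numbers.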

I have no idea what to check or how to debug this. Thoughts? Help?

thx




More information about the freebsd-stable mailing list