9-STABLE -> NFS -> NetAPP:
Hub- Marketing
marketing at hub.org
Wed Dec 19 05:16:03 UTC 2012
I'm running a few servers sitting on top of a NetApp file server … everything runs great, but periodically I'm getting:
nfs_getpages: error 13
vm_fault: pager read error, pid 11355 (https)
errors on my screen … not always the same pid … the annoying part is that it always seems to affect the same jail. If I shut down all jails on that physical server, everything shuts down except for that *one* jail, with a ps listing looking like:
USER   PID %CPU %MEM    VSZ   RSS TT STAT STARTED    TIME COMMAND
root  6670  0.0  0.0   9936  1372 ?? DsJ   3:00AM 0:00.01 newsyslog
root  6815  0.0  0.0   9936  1288 ?? DsJ   3:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root  8361  0.0  0.1 220740 11400 ?? DsJ   7:33PM 0:01.25 /usr/local/sbin/httpd -DNOHTTPACCEPT
www   8364  0.0  0.0      0     0 ?? ZJ    7:33PM 0:00.00 <defunct>
www  11866  0.0  0.1 318444 16792 ?? TJ    7:36PM 0:00.03 /usr/local/sbin/httpd -DNOHTTPACCEPT
www  11872  0.0  0.1 297964 14008 ?? TJ    7:36PM 0:00.01 /usr/local/sbin/httpd -DNOHTTPACCEPT
www  11873  0.0  0.1 306156 15028 ?? DEJ   7:36PM 0:00.02 /usr/local/sbin/httpd -DNOHTTPACCEPT
root 17190  0.0  0.0   9936  1240 ?? DsJ   8:00PM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 24864  0.0  0.0   9936  1392 ?? DsJ   4:00AM 0:00.01 newsyslog
root 24910  0.0  0.0   9936  1336 ?? DsJ   4:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 29972  0.0  0.0   9936  1240 ?? DsJ   9:00PM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 34221  0.0  0.0  51480  4332 ?? DsJ   4:47AM 0:00.02 sshd: root@pts/1 (sshd)
root 42452  0.0  0.0   9936  1296 ?? DsJ  10:00PM 0:00.01 newsyslog
root 42522  0.0  0.0   9936  1240 ?? DsJ  10:00PM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 55179  0.0  0.0   9936  1296 ?? DsJ  11:00PM 0:00.01 newsyslog
root 55244  0.0  0.0   9936  1240 ?? DsJ  11:00PM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 67592  0.0  0.0   9936  1336 ?? DsJ  12:00AM 0:00.01 newsyslog
root 67762  0.0  0.0   9936  1288 ?? DsJ  12:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 81603  0.0  0.0   9936  1340 ?? DsJ   1:00AM 0:00.01 newsyslog
root 81640  0.0  0.0   9936  1284 ?? DsJ   1:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 93792  0.0  0.0   9936  1344 ?? DsJ   2:00AM 0:00.01 newsyslog
root 93815  0.0  0.0   9936  1288 ?? DsJ   2:00AM 0:00.01 /usr/sbin/newsyslog -f /usr/local/etc/rotate_logs.cfg
root 34228  0.0  0.0  67960  4464 1  Ds+J  4:47AM 0:00.00 sshd: root@pts/1 (sshd)
root 38473  0.0  0.0  17556  3272 3  SJ    4:53AM 0:00.02 /bin/tcsh
root 38475  0.0  0.0  14212  1512 3  R+J   4:53AM 0:00.00 ps aux
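
In that listing, a STAT value beginning with D means the process is in an uninterruptible wait (per ps(1) on FreeBSD), which would be consistent with processes blocked on NFS I/O and would explain why they ignore signals. To pick those out of a long listing, here is a minimal Python sketch. The function name stuck_processes and the SAMPLE text are illustrative only, and it assumes the 11-column 'ps aux' layout shown above with no spaces inside the STARTED column:

```python
def stuck_processes(ps_output):
    """Return (pid, stat, command) for each process whose STAT starts with 'D'."""
    stuck = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # Split into at most 11 fields so COMMAND keeps its embedded spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # not a full ps row, ignore it
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat.startswith("D"):
            stuck.append((pid, stat, command))
    return stuck


# A few rows copied from the listing above, used only as sample input.
SAMPLE = """USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 6670 0.0 0.0 9936 1372 ?? DsJ 3:00AM 0:00.01 newsyslog
www 8364 0.0 0.0 0 0 ?? ZJ 7:33PM 0:00.00 <defunct>
www 11873 0.0 0.1 306156 15028 ?? DEJ 7:36PM 0:00.02 /usr/local/sbin/httpd -DNOHTTPACCEPT
root 38473 0.0 0.0 17556 3272 3 SJ 4:53AM 0:00.02 /bin/tcsh
"""

for pid, stat, command in stuck_processes(SAMPLE):
    print(pid, stat, command)
```

Feeding it the full output of 'ps aux' from inside the jail should list only the processes that are stuck in that wait state.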
I can do a 'jexec <JID> /bin/tcsh' to get into the jail, I can perform ps commands, etc … I just can't get those processes to shut down …
everything within the jail is 'up to date' … I've updated the userland and ports … I've checked over the NetApp, but everything appears fine, and it only seems to repeatedly affect that one jail, on that same physical server ...
I have no ideas on what / how to debug this … thoughts? help?
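
One possible starting point, as a minimal sketch: if the number in 'nfs_getpages: error 13' is a standard errno value (an assumption, the kernel message does not say so explicitly), it can be decoded with Python's errno module. The helper name describe_errno is illustrative only:

```python
import errno
import os


def describe_errno(code):
    """Map a numeric errno to its symbolic name and the system's message."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 13 is the value from the "nfs_getpages: error 13" line.
print(describe_errno(13))
```

On FreeBSD, errno 13 is EACCES (permission denied), which would point more toward access rights on the files or the export than toward a connectivity problem, though that reading is only as good as the assumption above.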
thx