FreeBSD 10.x + LiquidSoap + NFS == Server Hang

Fri Jul 4 04:53:43 UTC 2014

OK, I just found http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-online-ddb.html, set up KDB/DDB, and tested that the ‘sysctl’ method gets me to the KDB prompt … hopefully this will let me provide more useful information next time it hangs, if someone can tell me what exactly I should collect? :)

thx

On Jul 3, 2014, at 9:26 PM, Marc Fournier <scrappy at hub.org> wrote:

> 
> Oh, on the remote console, the last two lines I see are:
> 
> ==
> nfs_getpages: error 4
> vm_fault: pager read error, pid 2957 (liquidsoap)
> ==
> 
> if that helps any ... 
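The `error 4` in the `nfs_getpages` line is a raw errno value. A minimal Python sketch to decode it, assuming the standard errno numbering (on both FreeBSD and Linux, 4 is `EINTR`), so the NFS page-in reported by `vm_fault` was interrupted rather than returning, say, an I/O error. The `decode_errno` helper is hypothetical, written only for illustration:

```python
import errno
import os


def decode_errno(code):
    """Return 'NAME: description' for a raw errno value taken from a kernel log line."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(code))


# "nfs_getpages: error 4" as seen on the remote console
print(decode_errno(4))  # prints: EINTR: Interrupted system call
```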
> 
> On Jul 3, 2014, at 9:23 PM, Marc Fournier <scrappy at hub.org> wrote:
> 
>> 
>> Hi all …
>> 
>> 	I have a jail host running FreeBSD 10-STABLE (svn update as of July 2nd @ ~05:30 UTC):
>> 
>> ==
>> Working Copy Root Path: /usr/src
>> URL: https://svn0.us-east.freebsd.org/base/stable/10
>> Relative URL: ^/stable/10
>> Repository Root: https://svn0.us-east.freebsd.org/base
>> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
>> Revision: 268135
>> Node Kind: directory
>> Schedule: normal
>> Last Changed Author: pfg
>> Last Changed Rev: 268132
>> Last Changed Date: 2014-07-02 01:28:38 +0000 (Wed, 02 Jul 2014)
>> ==
>> 
>> 	Currently it has 3 jail’d environments running on it, with their files NFS-mounted from a NetApp filer … and right now, the NFS mount these jails run from is “locked”: a ‘df’ hangs, trying to ‘jexec # /bin/tcsh’ into one of the jails hangs, etc.
>> 
>> 	The same NFS file system is mounted and in use on half a dozen other servers, and they are all operating just fine, so the NetApp itself appears to be working properly.
>> 
>> 	If I move the jail running liquidsoap to a different server, the hang follows it to the new server, and the old server becomes rock solid again …
>> 
>> 	I’m not 100% certain it is liquidsoap, but the hang always appears to coincide with reloading a new playlist … and although it happens frequently (more often since recent upgrades), it doesn’t happen *every* night …
>> 
>> 	This is a remote server … so working at the physical console isn’t possible, and although I have a remote console on it, I’ve never figured out how to break into the debugger through it; I’m going to work on that to see if I can get it working …
>> 
>> 	Barring breaking into the debugger (is there a way, from the command line, to force it into the debugger?), is there anything else I can use to provide useful information?
>> 
>> ps aux for the process shows:
>> 
>> # ps aux | grep liq
>> 1002     2957   0.0  0.7 226888 112792  -  TLJ   4:45AM   370:27.23 /usr/local/bin/liquidsoap -q -d /usr/local/etc/liquidsoap/liquidsoap.liq
>> 
>> and:
>> 
>> # ps auxxwl | grep 2957
>> 1002     2957   0.0  0.7 226888 112792  -  TLJ   4:45AM   370:27.23 /usr/local/bin/l  1002     1   0  20  0 -
>> 1002    96280   0.0  0.0  12316      0  -  IWJ  -           0:00.00 pwait 2957        1002 96274   0  52  0 kqread
>> root    96508   0.0  0.0  18788   1828  4  S+    4:19AM     0:00.00 grep 2957            0 96505   0  20  0 piperd
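The STAT column above can be read with the flag list from FreeBSD’s ps(1) manual page. A minimal Python sketch that expands the three states shown. The `decode_stat` helper and the table transcription are my own, so they may not match every release’s ps(1) exactly. Read this way, liquidsoap (TLJ) is stopped, has pages locked in core, and is jailed, while pwait (IWJ) is idle, swapped out, and jailed:

```python
# Flag meanings transcribed from FreeBSD's ps(1) STATE description (may be incomplete).
# "L" and "W" mean different things as the first character vs. later, hence two tables.
FIRST = {
    "D": "disk or other short-term uninterruptible wait",
    "I": "idle (sleeping longer than about 20 seconds)",
    "L": "waiting to acquire a lock",
    "R": "runnable",
    "S": "sleeping for less than about 20 seconds",
    "T": "stopped",
    "W": "idle interrupt thread",
    "Z": "zombie",
}
MODIFIERS = {
    "+": "in the foreground process group",
    "<": "raised CPU scheduling priority",
    "E": "trying to exit",
    "J": "in a jail",
    "L": "pages locked in core",
    "N": "reduced CPU scheduling priority",
    "s": "session leader",
    "V": "suspended during vfork",
    "W": "swapped out",
    "X": "being traced or debugged",
}


def decode_stat(stat):
    """Expand a ps STAT string such as 'TLJ' into a list of flag descriptions."""
    if not stat:
        return []
    first = FIRST.get(stat[0], "unknown state %r" % stat[0])
    rest = [MODIFIERS.get(c, "unknown flag %r" % c) for c in stat[1:]]
    return [first] + rest


# The three processes from the ps output above.
for pid, stat in [("2957", "TLJ"), ("96280", "IWJ"), ("96508", "S+")]:
    print(pid, stat, "->", "; ".join(decode_stat(stat)))
```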
>> 
>> 	Are there other commands I can / should run next time it happens … ?  Preferably ones that won’t take long ...
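A minimal Python sketch of a quick capture script, assuming Python is installed on the host (it is not part of the FreeBSD base system) and that per-thread kernel stacks from `procstat -kk` plus the process state and wait channel from `ps` are the kind of output that would help. The command list and the `build_commands`/`capture` helpers are hypothetical guesses, not a confirmed recommendation. The timeout is only a best-effort guard: a command stuck in an unkillable kernel wait could still block the script.

```python
import subprocess
import sys


def build_commands(pid):
    """Quick, read-only FreeBSD commands for snapshotting a hung process (a guess, not exhaustive)."""
    p = str(pid)
    return [
        # Scheduler state and wait channel of the stuck process.
        ["ps", "-o", "pid,state,wchan,command", "-p", p],
        # Kernel stack (with function offsets) of every thread in that process.
        ["procstat", "-kk", p],
        # The same for every process, to see what else may be blocked.
        ["procstat", "-kk", "-a"],
    ]


def capture(pid, outfile, timeout=10):
    """Run each command with a timeout and write its output, with a header, to outfile."""
    with open(outfile, "w") as out:
        for argv in build_commands(pid):
            out.write("### " + " ".join(argv) + "\n")
            try:
                result = subprocess.run(argv, stdout=subprocess.PIPE,
                                        stderr=subprocess.STDOUT, timeout=timeout)
                out.write(result.stdout.decode(errors="replace"))
            except (OSError, subprocess.TimeoutExpired) as exc:
                # A missing or hanging command should not stop the remaining ones.
                out.write("failed: %s\n" % exc)


if __name__ == "__main__" and len(sys.argv) == 3:
    capture(int(sys.argv[1]), sys.argv[2])
```

For example, `capture(2957, "/var/tmp/hang-2957.txt")` (a hypothetical path); the point is to write to a local filesystem rather than the hung NFS mount.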
>> 
>> Thanks …
>> 
>> 
> 
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"