one virtualbox vm disrupts all vms and entire network

Steve Tuts yiz5hwi at gmail.com
Wed Jun 6 22:46:09 UTC 2012


On Wed, Jun 6, 2012 at 3:50 AM, Bernhard Froehlich <decke at freebsd.org> wrote:

> On 05.06.2012 20:16, Bernhard Froehlich wrote:
>
>> On 05.06.2012 19:05, Steve Tuts wrote:
>>
>>> On Mon, Jun 4, 2012 at 4:11 PM, Rusty Nejdl <rnejdl at ringofsaturn.com>
>>> wrote:
>>>
>>>  On 2012-06-02 12:16, Steve Tuts wrote:
>>>>
>>>>> Hi, we have a Dell PowerEdge server with a dozen interfaces.  It hosts a
>>>>> few guests running web app and email servers with VirtualBox-4.0.14.  The
>>>>> host and all guests are FreeBSD 9.0 64-bit.  Each guest is bridged to a
>>>>> distinct interface.  The host and all guests are on the 10.0.0.0 network,
>>>>> NAT'ed behind a Cisco router.
>>>>>
>>>>> This ran well for a couple of months, until we added a new guest recently.
>>>>> Now, every few hours, none of the guests can be reached.  We can only
>>>>> connect to the host from outside the router.  We can also get to the
>>>>> console of the guests (except the new guest), but from there we can't ping
>>>>> the gateway 10.0.0.1 any more.  The new guest itself is simply frozen.
>>>>>
>>>>> Furthermore, on the host we can see a vboxheadless process for each guest,
>>>>> including the new guest.  But we cannot kill it, not even with "kill -9".
>>>>> We looked around the web and someone suggested we should use
>>>>> "kill -SIGCONT" first, since the "ps" output shows the "T" flag for the
>>>>> new guest's vboxheadless process, but that doesn't help.  We also tried
>>>>> all the VBoxManage commands to poweroff/reset etc. the new guest, but
>>>>> they all failed, complaining that the VM is in the Aborted state.  We also
>>>>> tried the VBoxManage command to disconnect the network cable for the new
>>>>> guest; it didn't complain, but it had no effect.
>>>>>
>>>>> A couple of times, on the host we disabled the interface bridged to the
>>>>> new guest; the vboxheadless process for that guest then disappeared (we
>>>>> had attempted to kill it before that), and immediately all other VMs
>>>>> regained their connections.
>>>>>
>>>>> But one time even that didn't help: the vboxheadless process for the new
>>>>> guest stubbornly remained, and we had to reboot the host.
>>>>>
>>>>> This is already a production server, so we can't upgrade VirtualBox to the
>>>>> latest version until we obtain a test server.
>>>>>
>>>>> Would you advise:
>>>>>
>>>>> 1. Is there any other way to kill that new guest besides rebooting?
>>>>> 2. What might cause the problem?
>>>>> 3. What settings and tests can I use to analyze this problem?
>>>>>
>>>> I haven't seen any comments on this and don't want you to think you are
>>>> being ignored.  I haven't seen this problem myself, but the 4.0 branch was
>>>> buggier for me than the 4.1 releases, so upgrading is probably what you
>>>> are looking at.
>>>>
>>>> Rusty Nejdl
>>>>
>>> Sorry, I just realized my reply yesterday didn't go to the list, so I am
>>> re-sending it with some updates.
>>>
>>> Yes, we upgraded all ports and fortunately everything went back to normal;
>>> in particular, all VMs have run peacefully for two days now.  So upgrading
>>> to the latest VirtualBox 4.1.16 solved that problem.
>>>
>>> But now we have a new problem with this new version of VirtualBox: whenever
>>> we try to VNC to any VM, that VM goes to the Aborted state immediately.
>>> Actually, merely telnetting from within the host to the VNC port of that VM
>>> will immediately abort it.  This prevents us from adding new VMs.
>>> Also, when starting a VM with a VNC port, we get this message:
>>>
>>> rfbListenOnTCP6Port: error in bind IPv6 socket: Address already in use
>>>
>>> for which someone else has provided a patch at
>>> http://permalink.gmane.org/gmane.os.freebsd.devel.emulation/10237
>>>
>>> So it looks like running multiple VMs on an IPv6 system (we have 64-bit
>>> FreeBSD 9.0) triggers this problem.
>>>
>>
>> Glad to hear that 4.1.16 helps with the networking problem. The VNC problem
>> is also a known one, but the mentioned patch does not work for at least a
>> few people. It seems the bug is somewhere in libvncserver, so downgrading
>> net/libvncserver to an earlier version (and rebuilding virtualbox) should
>> help until we come up with a proper fix.
>>
>
> You are right about the "Address already in use" problem and the patch for
> it so I will commit the fix in a few moments.
>
> I have also tried to reproduce the VNC crash but I couldn't, probably because
> my system is IPv6 enabled. flo@ has seen the same crash and has no IPv6 in
> his kernel, which led him to find this commit in libvncserver:
>
>
> commit 66282f58000c8863e104666c30cb67b1d5cbdee3
> Author: Kyle J. McKay <mackyle at gmail.com>
> Date:   Fri May 18 00:30:11 2012 -0700
>     libvncserver/sockets.c: do not segfault when listenSock/listen6Sock == -1
>
> http://libvncserver.git.sourceforge.net/git/gitweb.cgi?p=libvncserver/libvncserver;a=commit;h=66282f5
>
>
> It looks promising, so please test this patch if you can reproduce the crash.
>
>
> --
> Bernhard Froehlich
> http://www.bluelife.at/
>

Sorry, I wanted to try this patch, but couldn't figure out how to apply it.
I use ports to compile everything, and can see that the file is at
/usr/ports/net/libvncserver/work/LibVNCServer-0.9.9/libvncserver/sockets.c.
However, if I edit this file and run "make clean", the change is wiped out
before I can run "make" on it.  How do I apply this patch within the ports
tree?
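
For reference, the usual way around this is to stop the port build right
after the patch stage, edit the extracted file, and then build without
running "make clean" in between.  The sketch below is only an illustration
under that assumption: step, prepare_port, build_port, PORT_DIR and RUN are
made-up names, and the script prints the commands unless RUN=1 is set.  The
make targets themselves (clean, patch, deinstall, reinstall) are the standard
ports ones.

```shell
#!/bin/sh
# Sketch: rebuild a port after hand-editing a file in its work directory.
# step, prepare_port, build_port, PORT_DIR and RUN are illustrative names.
# Dry run by default (commands are only printed); RUN=1 executes them.
PORT_DIR="${PORT_DIR:-/usr/ports/net/libvncserver}"

step() {
    # Show the command, and run it inside the port directory only if RUN=1.
    echo "+ (cd $PORT_DIR && $*)"
    if [ "${RUN:-0}" = "1" ]; then
        (cd "$PORT_DIR" && "$@") || return 1
    fi
}

prepare_port() {
    # Wipe any old work directory, then fetch, extract and apply the port's
    # own patches, stopping before the build starts.
    step make clean && step make patch
}

build_port() {
    # Build from the already-extracted (and now hand-edited) sources.
    # No "make clean" here: that would delete the edit again.
    step make && step make deinstall reinstall
}

case "${1:-show}" in
    prepare) prepare_port ;;
    build)   build_port ;;
    *)       RUN=0; prepare_port; build_port ;;  # default: just list the steps
esac
```

Saved under any name, it would be run once with "prepare" (and RUN=1), then
work/LibVNCServer-0.9.9/libvncserver/sockets.c is edited by hand, then it is
run again with "build".  The other usual route is to drop a diff into the
port's files/ directory as a patch-* file, since the ports framework applies
those during the patch stage and so the change survives "make clean".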

