[vm-bhyve] Windows 2012 and 2016 server guests would not stop
Jason Barbier
kusuriya at serversave.us
Tue Apr 23 13:57:38 UTC 2019
> On Apr 22, 2019, at 21:13, Victor Sudakov <vas at mpeks.tomsk.su> wrote:
>
> Paul Vixie wrote:
>>
>> Victor Sudakov wrote on 2019-04-22 19:43:
>> ...
>>>> And the implementation is pretty brutal:
>>>> # 'vm stopall'
>>>> # stop all bhyve instances
>>>> # note this will also stop instances not started by vm-bhyve
>>>> #
>>>> core::stopall(){
>>>> local _pids=$(pgrep -f 'bhyve:')
>>>>
>>>> echo "Shutting down all bhyve virtual machines"
>>>> killall bhyve
>>>> sleep 1
>>>> killall bhyve
>>>> wait_for_pids ${_pids}
>>>> }
>>
>> yow.
Eew, no, that is painful to read!
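Here is a minimal sketch of a gentler replacement for the `sleep 1; killall bhyve` sequence. It assumes a POSIX `sh` that supports `local`, as FreeBSD `sh` does and as the quoted vm-bhyve code already relies on. It sends one SIGTERM, the same signal the first `killall` uses to trigger a soft shutdown in the guest. It then polls with `kill -0` until every PID has exited or a timeout expires. `stop_pids_gracefully` is a hypothetical helper, not part of vm-bhyve. It deliberately sends no second signal and leaves any guest still running at the deadline for the caller to deal with.

```shell
#!/bin/sh
# Hypothetical helper (not part of vm-bhyve).
# Usage: stop_pids_gracefully TIMEOUT_SECONDS PID...
# Sends a single SIGTERM to each PID, then waits roughly TIMEOUT_SECONDS
# for all of them to exit. Returns 0 if they all exited, 1 otherwise.
stop_pids_gracefully() {
    local _timeout="$1" _waited=0 _alive _pid
    shift
    [ $# -gt 0 ] || return 0

    # One soft shutdown request per process; no second signal follows.
    kill -TERM "$@" 2>/dev/null || true

    while :; do
        # Collect the PIDs that still exist (kill -0 sends no signal).
        _alive=""
        for _pid in "$@"; do
            if kill -0 "$_pid" 2>/dev/null; then
                _alive="$_alive $_pid"
            fi
        done

        [ -n "$_alive" ] || return 0

        if [ "$_waited" -ge "$_timeout" ]; then
            echo "still running after ${_timeout}s:${_alive}" >&2
            return 1
        fi

        sleep 1
        _waited=$((_waited + 1))
    done
}
```

For example, `stop_pids_gracefully 65 $(pgrep -f 'bhyve:')` would combine the PID lookup from the quoted `core::stopall` with the 65-second wait suggested in the thread.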
>
> To be sure, I was unable to find the above code (as is) in
> /usr/local/lib/vm-bhyve/vm-* (the vm-bhyve port 1.3.0). It may be that
> something more intelligent is happening in a more recent version, like a
> sequential shutdown. However, "kill $pid; sleep 1; kill $pid" seems to
> be still present.
>
>>
>>>>
>>>> I wonder what the effect of the second kill is,
>>>> that seems odd.
>>>
>>> Indeed.
>>
>> the first killall will cause each client OS to see a soft shutdown
>> signal. the sleep 1 gives them some time to flush their buffers. the
>> second killall says, time's up, just stop.
>>
>> i think this is worse than brutal, it's wrong. consider freebsd's own
>> work flow when trying to comply with the first soft shutdown it got:
>>
>> https://github.com/freebsd/freebsd/blob/master/sbin/reboot/reboot.c#L220
>>
>> this has bitten me more than once, because using "pageins" as a proxy
>> for "my server processes are busy trying to synchronize their user mode
>> state" is inaccurate. i think _any_ continuing I/O should be reason to
>> wait the full 60 seconds.
>
> Would it be beneficial to just hack /usr/local/lib/vm-bhyve/vm-* ?
>>
>> and so i think the "sleep 1" above should be a "sleep 65".
I would echo this and add that it should probably be done in a way that allows a sliding window rather than a fixed delay, since some servers and services are not very fault tolerant on their own. The example that springs to mind for me is the busy AD domain controller I manage. It takes 15 minutes to flush its disk buffers. If I kill it before the buffers are flushed, I will have a bad day: at best my domain loses a few transactions, and at worst it is corrupted.
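One possible reading of the sliding-window idea is a shutdown timeout that can differ per guest. The sketch below is a hypothetical helper that takes `PID:SECONDS` pairs. That argument format is invented for illustration and is not a vm-bhyve setting. Each guest gets one SIGTERM and its own wait budget. Guests that exit are dropped from the list. A guest that outlives its budget is reported and no longer waited on, but it is not killed.

```shell
#!/bin/sh
# Hypothetical helper (not part of vm-bhyve).
# Usage: stop_with_timeouts PID:SECONDS ...
# Sends one SIGTERM to every PID, then waits for each guest up to its own
# SECONDS budget. Returns 0 if all exited, 1 if any outlived its budget.
stop_with_timeouts() {
    local _spec _pid _limit _pending _still _late="" _elapsed=0

    # One soft shutdown request per guest.
    for _spec in "$@"; do
        kill -TERM "${_spec%%:*}" 2>/dev/null || true
    done

    _pending="$*"
    while :; do
        _still=""
        for _spec in $_pending; do
            _pid=${_spec%%:*}
            _limit=${_spec#*:}
            if ! kill -0 "$_pid" 2>/dev/null; then
                :                            # guest has exited
            elif [ "$_elapsed" -ge "$_limit" ]; then
                _late="$_late $_pid"         # past its own budget: stop waiting, do not kill
            else
                _still="$_still $_spec"      # still within budget: keep waiting
            fi
        done

        _pending=$_still
        [ -n "$_pending" ] || break

        sleep 1
        _elapsed=$((_elapsed + 1))
    done

    if [ -n "$_late" ]; then
        echo "exceeded their timeouts:$_late" >&2
        return 1
    fi
    return 0
}
```

For instance, `stop_with_timeouts 1234:900 5678:65` (PIDs hypothetical) would give a domain controller 15 minutes and another guest 65 seconds. This sketch leaves open where the per-guest values would be stored.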
More information about the freebsd-virtualization mailing list