Re: Recently moved to XS 8.2 on new hardware - seen a couple of FreeBSD DomU lock ups?

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Tue, 18 Oct 2022 13:32:13 UTC
On Tue, Oct 18, 2022 at 12:15:28PM +0100, Karl Pielorz wrote:
> 
> 
> --On 14 October 2022 16:40 +0200 Roger Pau Monné <roger.pau@citrix.com>
> wrote:
> 
> > Hello,
> > 
> > Sorry, been very busy this week and forgot to reply earlier.
> > 
> > Could you try to set up a watchdog in FreeBSD and see if that
> > triggers?  So that we can get an idea of where the guest locks up.
> 
> Hi - no problem / thanks for the reply...
> 
> I'll give the above a go - part of the problem is not knowing which VM is
> going to die (there are quite a few) - the second part, is the waiting
> game...
> 
> > Is the 100% load also on all CPUs, or just one?
> 
> From memory of the graphs - I think it was probably just one (I think the
> last VM that locked was a two core VM).
> 
> > If the watchdog doesn't work we can try other methods.
> 
> Well, I'll get back to you when it happens again (personally - I hope it
> doesn't happen again) - but at least I know it's not quite as much of a dead
> end as I feared debug wise.
> 
> If this does happen again - and I'm able, is there any point in doing a
> snapshot + memory of the VM? (which is about the only thing I could think of
> - not knowing about the watchdog stuff).

You could try to get a snapshot, albeit I'm not sure if that will work
correctly if the VM is wedged.

Since it might not be feasible to set up the watchdog on all VMs, my
recommendation would be to try sending an NMI to the stuck processor
and see if we can get a trace that way; see:

https://lists.freebsd.org/archives/freebsd-xen/2022-October/000125.html

That seems to get me a trace when used on a non-locked up VM, so it's
worth a try.
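
As a rough illustration of the dom0 side, here is a minimal sketch that
assumes the `xl` tool is usable in dom0 and that `xl trigger <domid> nmi
[vcpu]` is an acceptable way to inject the NMI (the linked message may
describe a different method). The function names are hypothetical, and
the domain ID and vCPU number in the usage note are placeholders. It
only prints the command unless dry_run is turned off.

```python
import subprocess


def nmi_command(domid, vcpu=None):
    """Build the argv for `xl trigger <domid> nmi [vcpu]` (assumed CLI)."""
    if domid < 0:
        raise ValueError("domid must be non-negative")
    argv = ["xl", "trigger", str(domid), "nmi"]
    if vcpu is not None:
        if vcpu < 0:
            raise ValueError("vcpu must be non-negative")
        # Target one vCPU, e.g. the one stuck at 100% load.
        argv.append(str(vcpu))
    return argv


def send_nmi(domid, vcpu=None, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    argv = nmi_command(domid, vcpu)
    if dry_run:
        print(" ".join(argv))
        return argv
    # Needs to run as root in dom0; raises CalledProcessError on failure.
    subprocess.run(argv, check=True)
    return argv
```

For example, send_nmi(12, vcpu=1) prints "xl trigger 12 nmi 1" without
running anything. `xl vcpu-list <domid>` may help pick the vCPU that is
spinning before sending the NMI for real.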

Thanks, Roger.