Re: Recently moved to XS 8.2 on new hardware - seen a couple of FreeBSD DomU lock ups?

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Mon, 17 Oct 2022 08:48:21 UTC
On Fri, Oct 14, 2022 at 04:40:17PM +0200, Roger Pau Monné wrote:
> On Thu, Oct 13, 2022 at 11:00:17AM +0100, Karl Pielorz wrote:
> > 
> > Hi all,
> > 
> > We've been running FreeBSD as a DomU under Xen Server for years now, and
> > only really had a few now 'known' issues (e.g. with networking).
> > 
> > We've recently set up XS 8.2 on some new Dell servers, and whilst everything
> > appears fine - twice now in a couple of weeks we've had FreeBSD DomU's just
> > "lock up".
> > 
> > There are no errors logged, no kernel panic, nothing - they just "stop". Xen
> > reckons the CPU is pegged at 100% (with brief periods of zero) - the console
> > is still 'available' (but locked up) - and the kernel is dead (i.e. you
> > cannot ping it).
> > 
> > Aside from the "Has anyone else seen similar" - with nothing in the logs, no
> > panic, nothing - I'm kind of at a loss as to how best to troubleshoot this
> > further?
> > 
> > The VM's are lightly loaded - haven't run out of resources (RAM /CPU etc.) -
> > and it's only happened a couple of times now (but in a couple of weeks) -
> > whereas our setup before never experienced this in its lifetime.
> > 
> > One was a legacy 11.4 system (amd64) - the other was a 12.3 system (amd64).
> > 
> > A forced reboot brings them back (with some file system damage - as you'd
> > expect from a crash).
> > 
> > Just at a loss as to where to look - given the lack of any panic/errors etc.
> > Any suggestions?
> 
> Hello,
> 
> Sorry, been very busy this week and forgot to reply earlier.
> 
> Could you try to set up a watchdog in FreeBSD and see if that
> triggers, so that we can get an idea of where the guest locks up?
> 
> Also, is the 100% load on all CPUs, or just one?
> 
> If the watchdog doesn't work we can try other methods.

Forgot to mention, you can also try to send an NMI to the guest when
it's frozen and see if that generates a trace.

From XenServer host (dom0) command line:

# xl list # get domain id
# xl trigger <domid> nmi <vcpu>

If you don't know which vCPU is stuck you can always use vCPU 0 and
hopefully FreeBSD will print a trace for all the vCPUs before shutting
down as a result of the unexpected NMI.
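
As a rough sketch of the steps above, the NMI command can be wrapped in a
small helper for dom0 that checks its arguments and defaults to vCPU 0.
The names send_nmi and DRY_RUN are made up for illustration and are not
part of xl. In this sketch the helper only prints the command unless
DRY_RUN is set to 0, so nothing is sent to a guest by accident.

```shell
#!/bin/sh
# Sketch only: send_nmi and DRY_RUN are illustrative names, not xl features.

send_nmi() {
    domid="$1"
    vcpu="${2:-0}"   # fall back to vCPU 0 when the stuck vCPU is unknown

    # Accept only plain non-negative integers for both arguments.
    case "$domid" in ''|*[!0-9]*) echo "bad domid: $domid" >&2; return 1 ;; esac
    case "$vcpu"  in ''|*[!0-9]*) echo "bad vcpu: $vcpu"  >&2; return 1 ;; esac

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default here): print the command, send nothing.
        echo "xl trigger $domid nmi $vcpu"
    else
        # Actually deliver the NMI to the guest.
        xl trigger "$domid" nmi "$vcpu"
    fi
}
```

For example, "send_nmi 5" prints the command for domain 5 and vCPU 0, and
"DRY_RUN=0 send_nmi 5 2" would send the NMI to vCPU 2. Running
"xl vcpu-list <domid>" beforehand may help to spot which vCPU is busy.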

Regards, Roger.