From: Karl Pielorz
Date: Tue, 18 Oct 2022 12:15:28 +0100
To: Roger Pau Monné
Cc: freebsd-xen@freebsd.org
Subject: Re: Recently moved to XS 8.2 on new hardware - seen a couple of FreeBSD DomU lock ups?

--On 14 October 2022 16:40 +0200 Roger Pau Monné wrote:

> Hello,
>
> Sorry, been very busy this week and forgot to reply earlier.
>
> Could you try to setup a watchdog in FreeBSD and see if that
> triggers? So that we can get an idea of where the guest locks up.

Hi - no problem / thanks for the reply...

I'll give the above a go - part of the problem is not knowing which VM
is going to die (there are quite a few) - the second part is the
waiting game...

> Is also the 100% load on all CPUs, or just one?

From memory of the graphs - I think it was probably just one (I think
the last VM that locked up was a two-core VM).

> If the watchdog doesn't work we can try other methods.

Well, I'll get back to you when it happens again (personally - I hope
it doesn't happen again) - but at least I know it's not quite as much
of a dead end as I feared, debugging-wise.

If this does happen again - and I'm able - is there any point in doing
a snapshot + memory of the VM? (That's about the only thing I could
think of - not knowing about the watchdog stuff.)

-Kp