Date: Mon, 30 May 2022 10:15:43 -0400
From: Mark Johnston
To: Paul Floyd
Cc: FreeBSD Hackers
Subject: Re: Hang ast / pipelk / piperd
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Mon, May 30, 2022 at 12:19:15AM +0200, Paul Floyd wrote:
>
> On 5/27/22 22:13, Paul Floyd wrote:
> >
> > Hi
> >
> > I'm debugging two issues with Valgrind on FreeBSD 13.1 and 14, one on
> > amd64 and one on i386.
> >
> ...
> > |Both hangs seem quite sensitive to timing - in both cases adding or
> > changing nanosleep times seem to make them no longer hang. |
> > |Adding debug statements to Valgrind can also change the behaviour
> > (and is also unsafe when not holding the scheduler lock). Does this
> > look like a kernel bug? |
>
> [...]
>
> Under gdb I see (and this is quite variable)
>
> (gdb) info thread
>   Id   Target Id                   Frame
> * 1    LWP 100073 of process 861 vgModuleLocal_do_syscall_for_client_WRK
> () at m_syswrap/syscall-amd64-freebsd.S:135
>   2    LWP 100215 of process 861
> vgModuleLocal_do_syscall_for_client_WRK () at
> m_syswrap/syscall-amd64-freebsd.S:135
>   3    LWP 100216 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   4    LWP 100217 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   5    LWP 100218 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   6    LWP 100219 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   7    LWP 100220 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   8    LWP 100221 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   9    LWP 100222 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   10   LWP 100223 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   11   LWP 100224 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   12   LWP 100225 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   13   LWP 100226 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   14   LWP 100227 of process 861 0x00000000380bffac in do_syscall_WRK ()
>   15   LWP 100228 of process 861 0x00000000380bffac in do_syscall_WRK ()
>
> do_syscall_WRK is the syscall interface for the Valgrind host, and that
> will be the threads waiting for the lock.
>
> Thread 1 and 2 are in do_syscall_for_client, the interface for guest
> syscalls. Thread 1 is doing a _umtx_op syscall, for the pthread_join.
> Thread 2 is doing a nanosleep. These are blocking syscalls so they
> release the lock before making the syscall to allow other threads to
> execute.
>
> I think that in the snapshot above, the lock is released and one
> of threads 3 to 15 should be obtaining the lock and running.
>
> That's where the kernel context switch / AST seems to be going wrong.
>
> I don't see anything particularly wrong on the Valgrind side.
>
> Any ideas what I can do to see why the context switch is hanging?

"procstat -kk " might help to reveal what's going on, since it sounds
like the hang/livelock is happening somewhere in the kernel.
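
As a rough sketch of how that could be run against the hung process: the wrapper below is only illustrative. The function name dump_kstacks is made up for this example, and it assumes it is run on the FreeBSD host (as root or as the process owner) while the hang is in progress. The procstat -kk invocation itself is the stock tool; -k prints each thread's kernel stack, and repeating the flag adds function offsets.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD or Valgrind): print the kernel
# stack of every thread in the given process using procstat(1).
dump_kstacks() {
    pid=${1:-}
    if [ -z "$pid" ]; then
        echo "usage: dump_kstacks <pid>" >&2
        return 64
    fi
    if command -v procstat >/dev/null 2>&1; then
        # -kk: kernel stacks for all threads, with function offsets
        procstat -kk "$pid"
    else
        echo "procstat not found; run this on the FreeBSD host" >&2
        return 127
    fi
}
```

With the process from the gdb output above this would be invoked as "dump_kstacks 861". Taking two or three samples a few seconds apart should help tell a thread that is genuinely asleep from one that is spinning.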