Date: Mon, 7 Mar 2022 14:04:09 -0500
From: Mark Johnston
To: Mark Millard
Cc: FreeBSD-STABLE Mailing List, Andrew Turner, Ronald Klop, bob prohaska, Free BSD, freebsd-current
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Mon, Mar 07, 2022 at 10:03:51AM -0800, Mark Millard wrote:
>
>
> On 2022-Mar-7, at 08:45, Mark Johnston wrote:
>
> > On Mon, Mar 07, 2022 at 04:25:22PM +0000, Andrew Turner wrote:
> >>
> >>> On 7 Mar 2022, at 15:13, Mark Johnston wrote:
> >>> ...
> >>> A (the?) problem is that the compiler is treating "pc" as an alias
> >>> for x18, but the rmlock code assumes that the pcpu pointer is loaded
> >>> once, as it dereferences "pc" outside of the critical section. On
> >>> arm64, if a context switch occurs between the store at _rm_rlock+144 and
> >>> the load at +152, and the thread is migrated to another CPU, then we'll
> >>> end up using the wrong CPU ID in the rm->rm_writecpus test.
> >>>
> >>> I suspect the problem is unique to arm64 as its get_pcpu()
> >>> implementation is different from the others in that it doesn't use
> >>> volatile-qualified inline assembly. This has been the case since
> >>> https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762
> >>> .
> >>>
> >>> I haven't been able to reproduce any crashes running poudriere in an
> >>> arm64 AWS instance, though. Could you please try the patch below and
> >>> confirm whether it fixes your panics? I verified that the apparent
> >>> problem described above is gone with the patch.
> >>
> >> Alternatively (or additionally) we could do something like the following. There are only a few MI users of get_pcpu with the main place being in rm locks.
> >>
> >> diff --git a/sys/arm64/include/pcpu.h b/sys/arm64/include/pcpu.h
> >> index 09f6361c651c..59b890e5c2ea 100644
> >> --- a/sys/arm64/include/pcpu.h
> >> +++ b/sys/arm64/include/pcpu.h
> >> @@ -58,7 +58,14 @@ struct pcpu;
> >>
> >>  register struct pcpu *pcpup __asm ("x18");
> >>
> >> -#define get_pcpu() pcpup
> >> +static inline struct pcpu *
> >> +get_pcpu(void)
> >> +{
> >> +	struct pcpu *pcpu;
> >> +
> >> +	__asm __volatile("mov %0, x18" : "=&r"(pcpu));
> >> +	return (pcpu);
> >> +}
> >>
> >>  static inline struct thread *
> >>  get_curthread(void)
> >
> > Indeed, I think this is probably the best solution.

Thinking a bit more, even with that patch, code like this may not behave
the same on arm64 as on other platforms:

	critical_enter();
	ptr = &PCPU_GET(foo);
	critical_exit();
	bar = *ptr;

since as far as I can see the compiler may translate it to

	critical_enter();
	critical_exit();
	bar = PCPU_GET(foo);

> Is this just partially reverting:
>
> https://cgit.freebsd.org/src/commit/?id=63c858a04d56
>
> If so, there might need to be comments about why the updated
> code is as it will be.
>
> Looks like stable/13 picked up sensitivity to the get_pcpu
> details in rmlock in:
>
> https://cgit.freebsd.org/src/commit/?h=stable/13&id=543157870da5
>
> (a 2022-03-04 commit) and stable/13 also has the get_pcpu
> misdefinition in:
>
> https://cgit.freebsd.org/src/commit/sys/arm64/include/pcpu.h?h=stable/13&id=63c858a04d56
>
> .
> So an MFC would be appropriate in order for aarch64
> to be reliable for any variations in get_pcpu in stable/13
> (and for 13.1 to be so as well).

I reverted the rmlock commit in stable/13 already. Either get_pcpu()
will be fixed shortly or 13.1 will ship without the rmlock commit.
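
For anyone following along, here is a minimal userland sketch of the aliasing hazard discussed in this thread. It is not kernel code and every name in it is a hypothetical stand-in: cur_pcpu plays the role of x18, migrate() plays the role of a context switch that moves the thread to another CPU, get_pcpu_alias() mimics the "#define get_pcpu() pcpup" form, and get_pcpu_snapshot() mimics the inline-assembly form that copies the register once.

```c
#include <assert.h>

#define NCPU 2

/* Simplified per-CPU structure; the real struct pcpu has many more fields. */
struct pcpu {
	int pc_cpuid;
};

static struct pcpu cpus[NCPU] = { { 0 }, { 1 } };

/*
 * Hypothetical stand-in for register x18: it always points at the pcpu of
 * whichever CPU the thread is running on right now.
 */
static struct pcpu *volatile cur_pcpu = &cpus[0];

/* Hypothetical stand-in for a context switch that migrates the thread. */
static void
migrate(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	cur_pcpu = &cpus[cpu];
}

/* Mimics the macro form: every use re-reads the "register". */
#define get_pcpu_alias() (cur_pcpu)

/* Mimics the inline-asm form: one read, result held in a local copy. */
static struct pcpu *
get_pcpu_snapshot(void)
{
	struct pcpu *pc = cur_pcpu;

	return (pc);
}

/* Returns 1 if two uses of the alias observed different CPUs. */
int
alias_sees_two_cpus(void)
{
	int first, second;

	migrate(0);
	first = get_pcpu_alias()->pc_cpuid;	/* use before the migration */
	migrate(1);				/* thread moves to CPU 1 */
	second = get_pcpu_alias()->pc_cpuid;	/* use after the migration */
	return (first != second);
}

/* Returns 1 if two uses of a snapshotted pointer observed different CPUs. */
int
snapshot_sees_two_cpus(void)
{
	struct pcpu *pc;
	int first, second;

	migrate(0);
	pc = get_pcpu_snapshot();		/* loaded exactly once */
	first = pc->pc_cpuid;
	migrate(1);				/* thread moves to CPU 1 */
	second = pc->pc_cpuid;			/* still the CPU 0 structure */
	return (first != second);
}
```

Under these assumptions alias_sees_two_cpus() reports a mismatch and snapshot_sees_two_cpus() does not, which is the "loaded once" behaviour the rmlock code expects. The sketch only models the re-read of x18; it does not model the compiler moving a dereference out of the critical section.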