From: Mark Millard
Date: Mon, 7 Mar 2022 13:17:56 -0800
To: Ronald Klop
Cc: Mark Johnston, bob prohaska, Free BSD, freebsd-current
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 2022-Mar-7, at 12:54, Ronald Klop wrote:

> From: Mark Johnston
> Date: Monday, 7 March 2022 16:13
> To: Ronald Klop
> CC: bob prohaska, Mark Millard, freebsd-arm@freebsd.org, freebsd-current
> Subject: Re: panic: data abort in critical section or under mutex
>   (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on
>   14-CURRENT/aarch64 Feb 28))
>
> On Mon, Mar 07, 2022 at 02:46:09PM +0100, Ronald Klop wrote:
> > Dear Mark Johnston,
> >
> > I did some binary search in the kernels and came to the conclusion that
> > https://cgit.freebsd.org/src/commit/?id=1517b8d5a7f58897200497811de1b18809c07d3e
> > still works and
> > https://cgit.freebsd.org/src/commit/?id=407c34e735b5d17e2be574808a09e6d729b0a45a
> > panics.
> >
> > I suspect your commit in
> > https://cgit.freebsd.org/src/commit/?id=c84bb8cd771ce4bed58152e47a32dda470bef23a.
> >
> > Last panic:
> >
> > panic: vm_fault failed: ffff00000046e708 error 1
> > cpuid = 1
> > time = 1646660058
> > KDB: stack backtrace:
> > db_trace_self() at db_trace_self
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> > vpanic() at vpanic+0x174
> > panic() at panic+0x44
> > data_abort() at data_abort+0x2e8
> > handle_el1h_sync() at handle_el1h_sync+0x10
> > --- exception, esr 0x96000004
> > _rm_rlock_debug() at _rm_rlock_debug+0x8c
> > osd_get() at osd_get+0x5c
> > zio_execute() at zio_execute+0xf8
> > taskqueue_run_locked() at taskqueue_run_locked+0x178
> > taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
> > fork_exit() at fork_exit+0x74
> > fork_trampoline() at fork_trampoline+0x14
> > KDB: enter: panic
> > [ thread pid 0 tid 100129 ]
> > Stopped at kdb_enter+0x44: undefined f902011f
> > db>
> >
> > A more recent kernel (912df91) still panics. See below.
> >
> > Do you have time to look into this? What can I provide in information to help?
>
> Hmm. So after my rmlock commits, we have the following disassembly for
> _rm_rlock() (with a few annotations added by me). Note that the pcpu
> pointer is stored in register x18 by convention.
>
>    0xffff00000046e304 <+0>:     stp   x29, x30, [sp, #-16]!
>    0xffff00000046e308 <+4>:     mov   x29, sp
>    0xffff00000046e30c <+8>:     ldr   x8, [x18]
>    0xffff00000046e310 <+12>:    ldr   x9, [x18]
>    0xffff00000046e314 <+16>:    ldr   x10, [x18]
>    0xffff00000046e318 <+20>:    cmp   x9, x10
>    0xffff00000046e31c <+24>:    b.ne  0xffff00000046e3cc <_rm_rlock+200>  // b.any
>    0xffff00000046e320 <+28>:    ldr   x9, [x18]
>    0xffff00000046e324 <+32>:    ldrh  w9, [x9, #314]
>    0xffff00000046e328 <+36>:    cbnz  w9, 0xffff00000046e3c0 <_rm_rlock+188>
>    0xffff00000046e32c <+40>:    str   wzr, [x1, #32]
>    0xffff00000046e330 <+44>:    stp   x0, x8, [x1, #16]
>    0xffff00000046e334 <+48>:    ldrb  w9, [x0, #10]
>    0xffff00000046e338 <+52>:    tbz   w9, #4, 0xffff00000046e358 <_rm_rlock+84>
>    0xffff00000046e33c <+56>:    ldr   x9, [x18]
>    0xffff00000046e340 <+60>:    ldr   w10, [x9, #888]
>    0xffff00000046e344 <+64>:    add   w10, w10, #0x1
>    0xffff00000046e348 <+68>:    str   w10, [x9, #888]
>    0xffff00000046e34c <+72>:    ldr   x9, [x18]
>    0xffff00000046e350 <+76>:    ldr   w9, [x9, #888]
>    0xffff00000046e354 <+80>:    cbz   w9, 0xffff00000046e3f4 <_rm_rlock+240>
>    0xffff00000046e358 <+84>:    ldr   w9, [x8, #1212]
>    0xffff00000046e35c <+88>:    add   x10, x18, #0x90
>    0xffff00000046e360 <+92>:    add   w9, w9, #0x1
>    0xffff00000046e364 <+96>:    str   w9, [x8, #1212]    <------- critical_enter
>    0xffff00000046e368 <+100>:   str   x10, [x1, #8]
>    0xffff00000046e36c <+104>:   ldr   x9, [x18, #144]
>    0xffff00000046e370 <+108>:   str   x9, [x1]
>    0xffff00000046e374 <+112>:   str   x1, [x9, #8]
>    0xffff00000046e378 <+116>:   str   x1, [x18, #144]
>    0xffff00000046e37c <+120>:   ldr   x9, [x18]
>    0xffff00000046e380 <+124>:   ldr   w10, [x9, #356]
>    0xffff00000046e384 <+128>:   add   w10, w10, #0x1
>    0xffff00000046e388 <+132>:   str   w10, [x9, #356]
>    0xffff00000046e38c <+136>:   ldr   w9, [x8, #1212]
>    0xffff00000046e390 <+140>:   sub   w9, w9, #0x1
>    0xffff00000046e394 <+144>:   str   w9, [x8, #1212]    <------- critical_exit
>    0xffff00000046e398 <+148>:   ldrb  w8, [x8, #304]
>    0xffff00000046e39c <+152>:   ldr   w9, [x18, #60]     <------- loading &pc->pc_cpuid
>    ...
>
> A (the?)
> problem is that the compiler is treating "pc" as an alias
> for x18, but the rmlock code assumes that the pcpu pointer is loaded
> once, as it dereferences "pc" outside of the critical section. On
> arm64, if a context switch occurs between the store at _rm_rlock+144 and
> the load at +152, and the thread is migrated to another CPU, then we'll
> end up using the wrong CPU ID in the rm->rm_writecpus test.
>
> I suspect the problem is unique to arm64 as its get_pcpu()
> implementation is different from the others in that it doesn't use
> volatile-qualified inline assembly. This has been the case since
> https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762
> .
>
> I haven't been able to reproduce any crashes running poudriere in an
> arm64 AWS instance, though. Could you please try the patch below and
> confirm whether it fixes your panics? I verified that the apparent
> problem described above is gone with the patch.
>
> diff --git a/sys/kern/kern_rmlock.c b/sys/kern/kern_rmlock.c
> index 0cdcfb8fec62..e51c25136ae0 100644
> --- a/sys/kern/kern_rmlock.c
> +++ b/sys/kern/kern_rmlock.c
> @@ -437,6 +437,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
>  {
>  	struct thread *td = curthread;
>  	struct pcpu *pc;
> +	int cpuid;
>  
>  	if (SCHEDULER_STOPPED())
>  		return (1);
> @@ -452,6 +453,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
>  	atomic_interrupt_fence();
>  
>  	pc = get_pcpu();
> +	cpuid = pc->pc_cpuid;
>  	rm_tracker_add(pc, tracker);
>  	sched_pin();
>  
> @@ -463,7 +465,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
>  	 * conditional jump.
>  	 */
>  	if (__predict_true(0 == (td->td_owepreempt |
> -	    CPU_ISSET(pc->pc_cpuid, &rm->rm_writecpus))))
> +	    CPU_ISSET(cpuid, &rm->rm_writecpus))))
>  		return (1);
>  
>  	/* We do not have a read token and need to acquire one. */
>
> Hi,
>
> This patch panicked again:
>   x0: ffffa00005a31500
>   x1: ffffa00005a0e000
>   x2:                2
>   x3: ffffa00076c4e9a0
>   x4:                0
>   x5:    e672743c8f9e5
>   x6:    dc89f70500ab1
>   x7:               14
>   x8: ffffa00005a31518
>   x9:                1
>  x10: ffffa00005a0e000
>  x11:                0
>  x12:                0
>  x13:                a
>  x14: 1013e6b85a8ecbe4
>  x15:     1dce740d11a5
>  x16: ffff3ea86e2434bf
>  x17: fffffffffffffff2
>  x18: ffff0000fe661800 (g_ctx + fcf9fa54)
>  x19: ffffa00076c4e9a0
>  x20: ffff0000fec39000 (g_ctx + fd577254)
>  x21:                2
>  x22: ffff0000419b6090 (g_ctx + 402f42e4)
>  x23: ffff000000c0b137 (lockstat_enabled + 0)
>  x24:              100
>  x25: ffff000000c0b000 (version + a0)
>  x26: ffff000000c0b000 (version + a0)
>  x27: ffff000000c0b000 (version + a0)
>  x28:                0
>  x29: ffff0000fe661800 (g_ctx + fcf9fa54)
>   sp: ffff0000fe661800
>   lr: ffff00000154ea50 (zio_dva_throttle + 154)
>  elr: ffff00000154ea80 (zio_dva_throttle + 184)
> spsr:         60000045
>  far:     2b753286b0b8
> panic: Unknown kernel exception 0 esr_el1 2000000
> cpuid = 1
> time = 1646685857
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x174
> panic() at panic+0x44
> do_el1h_sync() at do_el1h_sync+0x184
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x2000000
> zio_dva_throttle() at zio_dva_throttle+0x184
> zio_execute() at zio_execute+0x58
> KDB: enter: panic
> [ thread pid 0 tid 100129 ]
> Stopped at kdb_enter+0x44: undefined f901c11f
> db>

Hmm. My somewhat older source code shows zio_dva_throttle as having:

	mutex_enter(&spa->spa_allocs[allocator].spaa_lock);
	avl_add(&spa->spa_allocs[allocator].spaa_tree, zio);
	nio = zio_io_to_allocate(spa, allocator);
	mutex_exit(&spa->spa_allocs[allocator].spaa_lock);
	return (nio);

That might have implications if the issue is actually analogous
to _rm_rlock_debug crashes in some way.

===
Mark Millard
marklmi at yahoo.com
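
For readers following the thread, the window Mark Johnston describes in
the quoted text (reading pc->pc_cpuid through an alias for x18 after the
critical section has ended) can be modelled in user-space C. This is only
a minimal sketch under stated assumptions: every name below (model_pcpu,
model_x18, model_migrate, and so on) is hypothetical and none is a FreeBSD
kernel API, and the "migration" is a plain function call standing in for a
context switch. It illustrates the described hazard only; it is not a
confirmed root cause of the panics reported in this thread.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names are FreeBSD APIs. */
#define MODEL_NCPU 4

struct model_pcpu {
	int pc_cpuid;
};

static struct model_pcpu model_cpus[MODEL_NCPU] = {
	{ 0 }, { 1 }, { 2 }, { 3 }
};

/* Stands in for register x18: always names the CPU we run on right now. */
static struct model_pcpu *model_x18 = &model_cpus[0];

/* Stands in for the scheduler moving the thread to another CPU. */
static void
model_migrate(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NCPU);
	model_x18 = &model_cpus[cpu];
}

/*
 * Unpatched shape: "pc" behaves as an alias for x18, so the CPU ID is
 * fetched only after the (modelled) migration and names the new CPU.
 */
static int
model_cpuid_late(int migrate_to)
{
	/* critical section would begin and end here */
	model_migrate(migrate_to);
	return (model_x18->pc_cpuid);	/* re-read through the alias */
}

/*
 * Patched shape: copy the CPU ID into a local first, as the quoted diff
 * does with "cpuid = pc->pc_cpuid", so a later migration cannot change it.
 */
static int
model_cpuid_snapshot(int migrate_to)
{
	int cpuid = model_x18->pc_cpuid;	/* snapshot before the window */

	model_migrate(migrate_to);
	return (cpuid);
}
```

Starting on modelled CPU 0, model_cpuid_late(2) returns 2 (the CPU the
thread ended up on), while model_cpuid_snapshot(2) returns 0 (the CPU
whose tracker list was used), which is the property the local cpuid
variable in the quoted patch is meant to provide.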