From: Mark Millard
Date: Mon, 26 Jun 2023 08:59:14 -0700
To: John F Carr
Cc: Current FreeBSD, freebsd-arm
Subject: Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Message-Id: <79849041-5E0E-4244-9BA7-F7F1C673F31F@yahoo.com>
In-Reply-To: <2E9684B7-9359-4A3D-A0C2-C1D2B221F2C4@mit.edu>

On Jun 26, 2023, at 07:29, John F Carr wrote:

>
>
>> On Jun 26, 2023, at 04:32, Mark Millard wrote:
>>
>> On Jun 24, 2023, at 17:25, Mark Millard wrote:
>>
>>> On Jun 24, 2023, at 14:26, John F Carr wrote:
>>>
>>>>
>>>>> On Jun 24, 2023, at 13:00, Mark Millard wrote:
>>>>>
>>>>> The running system build is a non-debug build (but
>>>>> with symbols not stripped).
>>>>>
>>>>> The HoneyComb's console log shows:
>>>>>
>>>>> . . .
>>>>> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
>>>>> GEOM_NOP: Device md0.nop created.
>>>>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
>>>>> GEOM_NOP: Device md0.nop removed.
>>>>> GEOM_NOP: Device md0.nop created.
>>>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>>>> GEOM_NOP: Device md0.nop removed.
>>>>> GEOM_NOP: Device md0.nop created.
>>>>> GEOM_NOP: Device md0.nop removed.
>>>>> Fatal data abort:
>>>>> x0: ffffa02506e64400
>>>>> x1: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>>>> x2: 4b
>>>>> x3: a343932b0b22fb30
>>>>> x4: 0
>>>>> x5: 3310b0d062d0e1d
>>>>> x6: 1d0e2d060d0b3103
>>>>> x7: 0
>>>>> x8: ea325df8
>>>>> x9: ffff0001eec946d0 ($d.6 + 0)
>>>>> x10: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>>>> x11: 0
>>>>> x12: 0
>>>>> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0)
>>>>> x14: 0
>>>>> x15: ffffa02506e64405
>>>>> x16: ffff0001eec94860 (_DYNAMIC + 160)
>>>>> x17: ffff00000063a450 (ifc_attach_cloner + 0)
>>>>> x18: ffff0001eb290400 (g_raid3_post_sync + 48a3178)
>>>>> x19: ffff0001eec94600 (vnet_epair_init_vnet_init + 0)
>>>>> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18)
>>>>> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>>> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>>> x23: ffffa0000042e500
>>>>> x24: ffffa0000042e500
>>>>> x25: ffff000000ce0788 (linker_lookup_set_desc + 0)
>>>>> x26: ffffa0203cdef780
>>>>> x27: ffff0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
>>>>> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>>> x29: ffff0001eb290430 (g_raid3_post_sync + 48a31a8)
>>>>> sp: ffff0001eb290400
>>>>> lr: ffff0001eec82a4c ($x.1 + 3c)
>>>>> elr: ffff0001eec82a60 ($x.1 + 50)
>>>>> spsr: 60000045
>>>>> far: ffff0002d8fba4c8
>>>>> esr: 96000046
>>>>> panic: vm_fault failed: ffff0001eec82a60 error 1
>>>>> cpuid = 14
>>>>> time = 1687625470
>>>>> KDB: stack backtrace:
>>>>> db_trace_self() at db_trace_self
>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>>>> vpanic() at vpanic+0x13c
>>>>> panic() at panic+0x44
>>>>> data_abort() at data_abort+0x2fc
>>>>> handle_el1h_sync() at handle_el1h_sync+0x14
>>>>> --- exception, esr 0x96000046
>>>>> $x.1() at $x.1+0x50
>>>>> vnet_register_sysinit() at vnet_register_sysinit+0x114
>>>>> linker_load_module() at linker_load_module+0xae4
>>>>> kern_kldload() at kern_kldload+0xfc
>>>>> sys_kldload() at sys_kldload+0x60
>>>>> do_el0_sync() at do_el0_sync+0x608
>>>>> handle_el0_sync() at handle_el0_sync+0x44
>>>>> --- exception, esr 0x56000000
>>>>> KDB: enter: panic
>>>>> [ thread pid 70419 tid 101003 ]
>>>>> Stopped at kdb_enter+0x44: str xzr, [x19, #3200]
>>>>> db>
>>>>
>>>> The failure appears to be initializing module if_epair.
>>>
>>> Yep: trying:
>>>
>>> # kldload if_epair.ko
>>>
>>> was enough to cause the crash. (Just a HoneyComb context at
>>> that point.)
>>>
>>> I tried media dd'd from the recent main snapshot, booting the
>>> same system. No crash. I moved my build boot media to some
>>> other systems and tested them: crashes. I tried my boot media
>>> built optimized for Cortex-A53 or Cortex-X1C/Cortex-A78C
>>> instead of Cortex-A72: no crashes. (But only one system can
>>> use the X1C/A78C code in that build.)
>>>
>>> So variation testing only gets the crashes for my builds
>>> that are code-optimized for Cortex-A72. The same source
>>> tree vintage built for Cortex-A53 or Cortex-X1C/Cortex-A78C
>>> optimization does not get the crashes. But I also
>>> demonstrated a Cortex-A72-optimized build from 2023-Mar
>>> that gets the crash.
>>>
>>> The last time I ran into one of these "crashes tied to
>>> cortex-a72 code optimization" examples it turned out to be
>>> some missing memory-model management code in FreeBSD's USB
>>> code. But being lucky enough to help identify a FreeBSD
>>> source code problem again seems not that likely. It could
>>> easily be a code generation error by clang for all I know.
>>>
>>> So, unless at some point I produce fairly solid evidence
>>> that the code actually running is messed up by FreeBSD
>>> source code, this should likely be treated as "blame the
>>> operator" and should likely be largely ignored as things
>>> are. (Just My Problem, as I want the Cortex-A72 optimized
>>> builds.)
>>
>> Turns out that the source code in question is the
>> assignment to V_epair_cloner below:
>>
>> static void
>> vnet_epair_init(const void *unused __unused)
>> {
>>         struct if_clone_addreq req = {
>>                 .match_f = epair_clone_match,
>>                 .create_f = epair_clone_create,
>>                 .destroy_f = epair_clone_destroy,
>>         };
>>         V_epair_cloner = ifc_attach_cloner(epairname, &req);
>> }
>> VNET_SYSINIT(vnet_epair_init, SI_SUB_PSEUDO, SI_ORDER_ANY,
>>     vnet_epair_init, NULL);
>>
>> Example code when not optimizing for the Cortex-A72:
>>
>> 11a4c: d0000089 adrp x9, 0x23000
>> 11a50: f9400248 ldr x8, [x18]
>> 11a54: f942c508 ldr x8, [x8, #1416]
>> 11a58: f943d929 ldr x9, [x9, #1968]
>> 11a5c: a9437bfd ldp x29, x30, [sp, #48]
>> 11a60: f9401508 ldr x8, [x8, #40]
>> 11a64: f8296900 str x0, [x8, x9]
>>
>> The code when optimizing for the Cortex-A72:
>>
>> 11a4c: f9400248 ldr x8, [x18]
>> 11a50: f942c508 ldr x8, [x8, #1416]
>> 11a54: d503201f nop
>> 11a58: 1008e3c9 adr x9, #72824
>> 11a5c: f9401508 ldr x8, [x8, #40]
>> 11a60: f8296900 str x0, [x8, x9]
>> 11a64: a9437bfd ldp x29, x30, [sp, #48]
>>
>> It is the "str x0, [x8, x9]" that vm_fault's for
>> the optimized code.
>>
>> So:
>>
>> 11a4c: d0000089 adrp x9, 0x23000
>> 11a58: f943d929 ldr x9, [x9, #1968]
>>
>> was optimized via replacement by:
>>
>> 11a58: 1008e3c9 adr x9, #72824
>>
>> I.e., the optimization is based on the offset from
>> the instruction being fixed in order to produce the
>> value in x9, even if the instruction is relocated.
>>
>> This resulted in the specific x9 value shown in
>> the x8/x9 pair:
>>
>> x8: ea325df8
>> x9: ffff0001eec946d0
>>
>> which totals to the fault address (the value
>> in far):
>>
>> far: ffff0002d8fba4c8
>>
>>
> Is this the same as bug 264094?
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264094

Well, the non-Cortex-A72-optimized .o stage code vs.
the Cortex-A72-optimized .o stage code looks like:

(not Cortex-A72 optimized)
3c: 90000009 adrp x9, 0x0
40: f9400248 ldr x8, [x18]
44: f942c508 ldr x8, [x8, #1416]
48: f9400129 ldr x9, [x9]
4c: a9437bfd ldp x29, x30, [sp, #48]
50: f9401508 ldr x8, [x8, #40]
54: f8296900 str x0, [x8, x9]

vs.

(Cortex-A72 optimized)
3c: f9400248 ldr x8, [x18]
40: f942c508 ldr x8, [x8, #1416]
44: 90000009 adrp x9, 0x0
48: f9400129 ldr x9, [x9]
4c: f9401508 ldr x8, [x8, #40]
50: f8296900 str x0, [x8, x9]
54: a9437bfd ldp x29, x30, [sp, #48]

(The x29 lines have a different purpose, but I kept the
sequencing as shown by objdump to show that it is
basically an ordering difference at the .o stage.)

As for if_epair.kld production, the .meta files show:

CMD ld -m aarch64elf -warn-common --build-id=sha1 -r -o if_epair.kld if_epair.o
CMD ctfmerge -L VERSION -g -o if_epair.kld if_epair.o
CMD :> export_syms
CMD awk -f /usr/main-src/sys/conf/kmod_syms.awk if_epair.kld export_syms | xargs -J% objcopy % if_epair.kld
CWD /usr/obj/BUILDs/main-CA72-nodbg-clang-alt/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72/modules/usr/main-src/sys/modules/if_epair

vs.

CMD ld -m aarch64elf -warn-common --build-id=sha1 -r -o if_epair.kld if_epair.o
CMD ctfmerge -L VERSION -g -o if_epair.kld if_epair.o
CMD :> export_syms
CMD awk -f /usr/main-src/sys/conf/kmod_syms.awk if_epair.kld export_syms | xargs -J% objcopy % if_epair.kld
CWD /usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72/modules/usr/main-src/sys/modules/if_epair

It looks to me like the code ordering differences in
the .o's may be all that leads to the differing .kld
results for setting x9. If so, it is not good for
whether things are operational to depend that much on
minor .o stage code generation differences.

===
Mark Millard
marklmi at yahoo.com
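
For reference, the address arithmetic quoted in this message can be cross-checked with a short Python sketch. It decodes the adr instruction word from the optimized listing, compares its target with the slot that the adrp/ldr pair reads, and confirms that x8 + x9 equals far. The helper name decode_adr and the variable names are illustrative only. The module load base is an inference (elr minus the listing address of the faulting str), not something stated in the message.

```python
MASK64 = (1 << 64) - 1


def decode_adr(insn):
    """Decode an A64 ADR instruction word into (destination register, signed byte offset)."""
    # ADR has bit 31 clear and bits 28..24 equal to 0b10000.
    if (insn >> 24) & 0x9F != 0x10:
        raise ValueError("not an ADR instruction")
    immlo = (insn >> 29) & 0x3       # low 2 bits of the offset
    immhi = (insn >> 5) & 0x7FFFF    # high 19 bits of the offset
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):              # sign-extend the 21-bit value
        imm -= 1 << 21
    return insn & 0x1F, imm


# Values copied from the register dump and the .kld disassembly in the message.
X8 = 0xEA325DF8
X9 = 0xFFFF0001EEC946D0
FAR = 0xFFFF0002D8FBA4C8
ELR = 0xFFFF0001EEC82A60
ADR_PC = 0x11A58   # listing address of "adr x9, #72824"
STR_PC = 0x11A60   # listing address of the faulting str (optimized build)

rd, imm = decode_adr(0x1008E3C9)
adr_target = ADR_PC + imm

# The non-optimized build reads x9 from this slot: adrp 0x23000, then ldr [x9, #1968].
ldr_slot = 0x23000 + 1968

# Assumption: elr points at the faulting str, so the difference is the load base.
load_base = ELR - STR_PC

print(f"adr writes x{rd} = pc + {imm} = {adr_target:#x}")
print(f"adrp/ldr pair reads the slot at {ldr_slot:#x}")
print(f"inferred load base = {load_base:#x}")
print(f"load base + adr target == x9: {load_base + adr_target == X9}")
print(f"x8 + x9 == far: {(X8 + X9) & MASK64 == FAR}")
```

If the load-base inference holds, x9 in the crash dump is an absolute address inside the loaded module, not a value loaded from the slot, which fits the observation that the adrp/ldr pair was replaced by a single adr.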