List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Subject: Re: regression: memory issues on main/arm64 over sched/runq changes
From: Zhenlei Huang
Date: Sat, 28 Jun 2025 00:39:33 +0800
To: "Bjoern A. Zeeb"
Cc: FreeBSD Current, Olivier Certner
Message-Id: <6A003013-415A-4594-AB04-AF5A9B2D660D@FreeBSD.org>

> On Jun 27, 2025, at 11:02 PM, Bjoern A. Zeeb wrote:
>
> On Wed, 25 Jun 2025, Zhenlei Huang wrote:
>
> Hi,
>
> I applied olce's change from the review but it didn't make a difference
> on my arm64 and now on a tree with local changes (wifi bits, user space
> bits, etc).
>
> Now I netbooted that tree on X86 hardware (an old Lenovo Laptop) and ran
> into something else (the same tree boots in a bhyve instance on a
> different machine from a local disk image).
>
> At the end of if_addgroup() I had added the following for local
> debugging (really crude sorry):
>
> ...
>
> +       atomic_thread_fence_seq_cst();
>         IF_ADDR_WLOCK(ifp);
>         CK_STAILQ_INSERT_TAIL(&ifg->ifg_members, ifgm, ifgm_next);
>         CK_STAILQ_INSERT_TAIL(&ifp->if_groups, ifgl, ifgl_next);
>         IF_ADDR_WUNLOCK(ifp);
>
>         IFNET_WUNLOCK();        // excl unlock
>
>         if (new)
>                 EVENTHANDLER_INVOKE(group_attach_event, ifg);
>         EVENTHANDLER_INVOKE(group_change_event, groupname);
>
> +       IFNET_RLOCK();          // shared, panic
> +       CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next) {
> +               if (bz_debug_groups) if_printf(ifp, "XXXXXXXXXXXXXXXXXXXXXXXXXXX-BZ %s:%d: ifgl %p, ifgl_group %p, ifg_group %p\n", __func__, __LINE__, ifgl, (ifgl != NULL) ? ifgl->ifgl_group : NULL, (ifgl != NULL && ifgl->ifgl_group != NULL) ? ifgl->ifgl_group->ifg_group : NULL);
> +       }
> +       IFNET_RUNLOCK();
> +
>         return (0);
> }
>
>
>
> You see the annotation //shared ?
>
> I got a panic: excl->share with that.

Well, I applied a patch identical to yours and I can reproduce that panic,
but my screen freezes and the topmost part of the stack is

```
_sx_slock_int() at _sx_slock_int+0x64/frame 0xff....
if_addgroup() at .....
....
device_attach() at ...
...
root_bus_configure() at ...
configure() at ...
mi_startup() at ..
```

I have no idea what's wrong. From the disassembly, it appears the panic
happens just after witness_checkorder.

>
> The excl. is the
>         IFNET_WLOCK();  // excl
> at the top of the function after the groupname check.
> But that gets unlocked before the event handler above
> so how can this happen?

I checked the event handlers and I don't think they are relevant.

>
> Sadly I cannot even dump or anything as the keyboard is as dead
> as the rest of the laptop. Have to power cycle it hard.
>
> Apart from the debugging I added I have no local changes in sys/net
> in that tree. sys/kern seems to have no relevant changes either
> (added a bus func, toggle link_elf_leak_locals default, and a printf
> got an extra argument to print %d error when modules fail to load).
>
>
> I'll try a plain main (hopefully tonight) on that machine too but I am
> really at a loss here now that it's also happening on X86 and only for me
> and always around the same code there...
>
> I'll also try to boot this tree from a USB pen drive or something; not
> that my problem comes in from netbooting...
>

For the purpose of debugging ifgroup, I think you can omit the IFNET_RLOCK:
at the moment a group is being added to the interface, no other thread has
the opportunity to write to the interface concurrently.

> I'll keep you posted...
> /bz
>
> --
> Bjoern A. Zeeb                                                     r15:7

Best regards,
Zhenlei