Date: Mon, 27 Dec 2021 09:31:15 -0800
From: Gleb Smirnoff
To: Larry Rosenman
Cc: current@freebsd.org
Subject: Re: My -CURRENT crashes....

On Fri, Dec 17, 2021 at 01:27:11PM -0600, Larry Rosenman wrote:
L> Can someone look at the messages I posted to -CURRENT, most recent
L> today, with random
L> Callout(?) crashes after long (>6 hour) poudriere runs?
L>
L> I have core's available.

I asked Larry to obtain a core with INVARIANTS and now we have one.
Sharing what I've found to brainstorm.

The trap happens in LIST_REMOVE() at kern_timeout.c:488 because the entry
doesn't have a prev pointer, i.e. it doesn't belong to any list.
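
To illustrate why an unlinked entry traps there, here is a minimal userland
sketch. It assumes the usual sys/queue.h behaviour, where LIST_REMOVE()
stores through le_prev unconditionally. The names struct entry, struct head,
list_insert_head() and list_remove_checked() are hypothetical stand-ins, not
the kernel code; the checked variant only makes visible the case that faults
in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct callout: only the list linkage matters. */
struct entry {
	struct entry *le_next;   /* next element, or NULL */
	struct entry **le_prev;  /* address of the previous le_next (or of lh_first) */
	int id;
};

struct head {
	struct entry *lh_first;
};

/* Same linkage steps as LIST_INSERT_HEAD() in sys/queue.h. */
static void
list_insert_head(struct head *h, struct entry *e)
{
	assert(h != NULL && e != NULL);
	e->le_next = h->lh_first;
	if (e->le_next != NULL)
		e->le_next->le_prev = &e->le_next;
	h->lh_first = e;
	e->le_prev = &h->lh_first;
}

/*
 * Same linkage steps as LIST_REMOVE(), plus a guard that is NOT in the real
 * macro: the real macro executes "*e->le_prev = e->le_next" unconditionally,
 * so a zeroed entry (le_prev == NULL) becomes a store through a NULL pointer.
 * Returns 0 on success, -1 if the entry is not on any list.
 */
static int
list_remove_checked(struct entry *e)
{
	if (e->le_prev == NULL)
		return (-1);
	if (e->le_next != NULL)
		e->le_next->le_prev = e->le_prev;
	*e->le_prev = e->le_next;
	e->le_next = NULL;
	e->le_prev = NULL;
	return (0);
}
```

Calling list_remove_checked() on an entry whose linkage has been zeroed
returns -1; the unguarded macro would fault at the store instead.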

#6  0xffffffff807be075 in trap_pfault (frame=0xfffffe02d3393d50, usermode=false, signo=, ucode=)
    at /usr/src/sys/amd64/amd64/trap.c:765
#7
#8  0xffffffff804e5609 in callout_process (now=now@entry=100465191785818)
    at /usr/src/sys/kern/kern_timeout.c:488
#9  0xffffffff80460fc5 in handleevents (now=now@entry=100465191785818, fake=fake@entry=0)
    at /usr/src/sys/kern/kern_clocksource.c:213
#10 0xffffffff80461a66 in timercb (et=0xffffffff80d47980 , arg=)
    at /usr/src/sys/kern/kern_clocksource.c:357
#11 0xffffffff807e6beb in lapic_handle_timer (frame=0xfffffe02d3393f40)
    at /usr/src/sys/x86/x86/local_apic.c:1364

(kgdb) p *tmp
$13 = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0},
    tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_precision = 0,
  c_arg = 0x0, c_func = 0x0, c_lock = 0xfffff8030521e670, c_flags = 0,
  c_iflags = 0, c_cpu = 0}

Useful here is c_lock, which points into a "process lock" lock object.
This allows us to deduce that the callout belongs to the proc subsystem,
and we can retrieve the proc it points to:

c_lock - 0x128 = 0xfffff8030521e548

It is a ccache process in the PRS_NORMAL state, and the "tmp" in our stack
frame is its p_itcallout.

So is there something that would zero out most of p_itcallout while it
is scheduled?

-- 
Gleb Smirnoff