Subject: Re: My -CURRENT crashes....
To: Gleb Smirnoff, Larry Rosenman
Cc: current@freebsd.org
References: <286c830efc0e12e3e7a7e9b2ede31c28@lerctr.org>
From: Alexander Motin
Message-ID: <45ee5689-b24c-51b5-d7b7-33fd73ee7dce@FreeBSD.org>
Date: Mon, 27 Dec 2021 13:43:01 -0500
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 27.12.2021 12:31, Gleb Smirnoff wrote:
> On Fri, Dec 17, 2021 at 01:27:11PM -0600, Larry Rosenman wrote:
> L> Can someone look at the messages I posted to -CURRENT, most recent
> L> today, with random
> L> Callout(?) crashes after long (>6 hour) poudriere runs?
> L>
> L> I have core's available.
>
> I asked Larry to obtain a core with INVARIANTS and now we have one.
>
> Sharing what I've found to brainstorm. Trap happens in LIST_REMOVE()
> kern_timeout.c:488 because the entry doesn't have a prev pointer, e.g.
> doesn't belong to any list.
>
> #6 0xffffffff807be075 in trap_pfault (frame=0xfffffe02d3393d50, usermode=false, signo=, ucode=)
>     at /usr/src/sys/amd64/amd64/trap.c:765
> #7
> #8 0xffffffff804e5609 in callout_process (now=now@entry=100465191785818) at /usr/src/sys/kern/kern_timeout.c:488
> #9 0xffffffff80460fc5 in handleevents (now=now@entry=100465191785818, fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:213
> #10 0xffffffff80461a66 in timercb (et=0xffffffff80d47980 , arg=) at /usr/src/sys/kern/kern_clocksource.c:357
> #11 0xffffffff807e6beb in lapic_handle_timer (frame=0xfffffe02d3393f40) at /usr/src/sys/x86/x86/local_apic.c:1364
>
> (kgdb) p *tmp
> $13 = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0,
>   c_precision = 0, c_arg = 0x0, c_func = 0x0, c_lock = 0xfffff8030521e670, c_flags = 0, c_iflags = 0, c_cpu = 0}
>
> Useful here is the c_lock, which points into "process lock" lockobject.
>
> This allows us to deduct that the callout belongs to proc subsystem and
> we can retrieve the proc it points to: c_lock - 0x128 = 0xfffff8030521e548
> It is ccache in PRS_NORMAL state. And the "tmp" in our stack frame is its
> p_itcallout.
>
> So there is something that would zero out most of the p_itcallout while
> it is scheduled? So carefully zero it, but keep the lock pointer...

The only way that comes to mind is callout_init_mtx() in do_fork(), if
we assume the process has exited and the struct proc was reused. I guess
that could happen if we somehow leaked a scheduled callout in exit1().
Maybe we could add some more assertions there to try to catch a callout
that is still active.

--
Alexander Motin
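
For illustration, the suspected sequence can be modelled in userland. This is only a sketch under stated assumptions: the toy_* structures, TOY_PENDING and all helpers below are hypothetical stand-ins, not the kernel definitions, and toy_callout_init() merely assumes that callout_init_mtx() zeroes the callout and then stores the lock pointer, which is what the kgdb dump above suggests. It shows a callout that is still linked into a bucket being re-initialized, which leaves the bucket pointing at an entry with a NULL le_prev, and it recovers the owning structure from c_lock the same way "c_lock - 0x128" does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

#define TOY_PENDING 0x1			/* hypothetical "scheduled" flag */

/* Hypothetical, heavily simplified stand-in for struct callout. */
struct toy_callout {
	LIST_ENTRY(toy_callout) c_links;
	void	*c_lock;
	int	 c_iflags;
};

/* Hypothetical stand-in for struct proc; the layout is illustrative only. */
struct toy_proc {
	long			p_pad[4];	/* fields preceding the lock */
	int			p_mtx;		/* plays the process lock */
	struct toy_callout	p_itcallout;
};

LIST_HEAD(toy_bucket, toy_callout);

/* Assumed behaviour of callout_init_mtx(): zero everything, keep the lock. */
void
toy_callout_init(struct toy_callout *c, void *lock)
{
	memset(c, 0, sizeof(*c));
	c->c_lock = lock;
}

/* Same arithmetic as "c_lock - 0x128": lock pointer back to its owner. */
struct toy_proc *
proc_from_lock(void *lock)
{
	assert(lock != NULL);
	return ((struct toy_proc *)((char *)lock -
	    offsetof(struct toy_proc, p_mtx)));
}

/* An entry with a NULL le_prev is on no list; LIST_REMOVE() would fault. */
int
callout_is_unlinked(const struct toy_callout *c)
{
	return (c->c_links.le_prev == NULL);
}

/* The kind of check an exit-time assertion could make. */
int
toy_exit_is_clean(const struct toy_proc *p)
{
	return ((p->p_itcallout.c_iflags & TOY_PENDING) == 0);
}

/* Returns 1 if reuse-while-scheduled reproduces the state seen in the dump. */
int
demo_reuse_while_scheduled(void)
{
	struct toy_bucket bucket = LIST_HEAD_INITIALIZER(bucket);
	struct toy_proc p;
	struct toy_callout *tmp;

	memset(&p, 0, sizeof(p));
	toy_callout_init(&p.p_itcallout, &p.p_mtx);
	LIST_INSERT_HEAD(&bucket, &p.p_itcallout, c_links);
	p.p_itcallout.c_iflags |= TOY_PENDING;

	/* "exit" leaks the scheduled callout, then "fork" reuses the struct. */
	toy_callout_init(&p.p_itcallout, &p.p_mtx);

	/* The bucket still references the entry, which is now zeroed. */
	tmp = LIST_FIRST(&bucket);
	return (tmp == &p.p_itcallout && callout_is_unlinked(tmp) &&
	    proc_from_lock(tmp->c_lock) == &p);
}
```

In the kernel, the equivalent of toy_exit_is_clean() would presumably be a KASSERT on the callout's pending/active state before the struct proc is released; where exactly such a check belongs in exit1() is left open here.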