Date: Thu, 22 Sep 2022 15:00:53 -0400
From: Mark Johnston
To: Steve Kargl
Cc: freebsd-current@freebsd.org
Subject: Re: A panic a day
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
> All,
>
> I updated my kernel/world/all ports on Sept 19 2022.
> Since then, I have had daily panics and hard lock-up
> (no panic, keyboard, mouse, network, ...).  The one
> panic I did witness sent text scolling off the screen.
> There is no dump, or at least, I haven't figured out
> a way to get a dump.
>
> Using ports/graphics/tesseract and then hand editing
> the OCR result, the last visible portions is
>
>
> panic() at panic+0x43/frame 0xfffffe00daf65550
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xc6/frame 0xfffffe00daf655e0
> sched_add() at sched_add+0x98/frame 0xfffffe00daf656a0
> setrunnable() at setrunnable+0x73/frame 0xfffffe00daf656d0
> wakeup_any() at wakeup_any+0x1f/frame 0xfffffe00daf656f0
> taskqueue_enqueue_locked() at taskqueue_enqueue_locked+0x13e/frame 0xfffffe00daf65720
> taskqueue_enqueue_timeout_sbt() at taskqueue_enqueue_timeout_sbt+0xe5/frame 0xfffffe00daf65770
> resettodr() at resettodr+0x7a/frame 0xfffffe00daf657b0
> kern_reboot() at kern_reboot+0x2ae/frame 0xfffffe00daf657f0
> vpanic() at vpanic+0x1be/frame 0xfffffe00daf65840
> panic() at panic+0x43/frame 0xfffffe00daf658a0
> __mtx_lock_spin_flags() at __mix_lock_spin_flags+0xc6/frame 0xfffffe00daf65ab0
> sched_add() at sched_add+0x98/frame 0xfffffe00daf65990
> setrunnable() at setrunnable+0x73/frame 0xfffffe008daf659c0
> wakeup_any() at wakeup_any+0x1f/frame 0xfffffe00daf659e0
> taskqueue_enqueue_locked() at taskqueue_enqueue_locked+0x13e/frame 0xfffffe00daf65a11
> drm_crtc_helper_set_config() at drm_crtc_helper_set_config+0x971/frame 0xfffffe00daf65abl
> radeon_crtc_set_config() at radeon_crtc_set_config+0x22/frame 0xfffffe00daf65ad0
> __drm_mode_set_config_internal() at __drm_mode_set_config_internal+0xdd/frame 0xfffffe00daf65b10
> drm_client_modeset_commit_locked() at drm_client_modeset_commit_locked+0x160/frame 0xfffffe00daf65b50
> drm_client_modeset_commit() at drm_client_modeset_commit+0x21/frame 0xfffffe00daf65b70
> drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x81/frame
> vt_kms_postswitch() at vt_kms_postswitch+0x166/frame 0xfffffe00daf65bd0
> vt_window_switch() at vt_window_switch+0x119/frame 0xfffffe00daf65c1d
> vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe00daf65c30
> cngrab() at cngrab+0x26/frame 0xfffffe00daf65ca0
> vpanic() at vpanic+0xf0/frame 0xfffffe00daf65ca0
> panic() at panic+0x43/frame 0xfffffe00daf65d00
> __mtx_assert() at __mtx_assert+0x9d/frame 0xfffffe00daf65d10
> ast_sched_locked() at ast_sched_locked+0x29/frame 0xfffffe00daf65d30
> sched_add() at sched_add+0x4c5/frame 0xfffffe00daf65df0
> sched_switch() at sched_switch+0x9f/frame 0xfffffe00daf65e20
> mi_switch() at mi_switch+0x14b/frame 0xfffffe00daf65e40
> sched_bind() at sched_bind+0x73/frame 0xfffffe00daf65e60
> pcpu_cache_drain_safe() at pcpu_cache_drain_safe+0x25a/frame 0xfffffe00daf65e90
> uma_reclaim_domain() at uma_reclain_domain+0x279/frame Buf ffffe00dafohech
> uma_reclaim_worker() at uma_reclaim_worker+0x35/frame 0xfffffe00daf65ef0
> fork_exit() at fork_exit+0x80/frame 0xfffffe00daf65f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00daf65f30
> --- trap 0, rip = 0, rop = 0, rbp = 0 ---

It looks like you use the 4BSD scheduler?  I think there's a bug in
kick_other_cpu() in that it doesn't make sure that the remote CPU's
curthread lock is held when modifying thread state.  Because 4BSD has a
global scheduler lock, this is often true in practice, but doesn't have
to be.

I think this untested patch will address the panics.  The bug was there
for a long time but some recent restructuring added an assertion which
caught it.

diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
index 9d48aa746f6d..484864b66c1c 100644
--- a/sys/kern/sched_4bsd.c
+++ b/sys/kern/sched_4bsd.c
@@ -1282,9 +1282,10 @@ kick_other_cpu(int pri, int cpuid)
 	}
 #endif /* defined(IPI_PREEMPTION) && defined(PREEMPTION) */
 
-	ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
-	ipi_cpu(cpuid, IPI_AST);
-	return;
+	if (pcpu->pc_curthread->td_lock == &sched_lock) {
+		ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
+		ipi_cpu(cpuid, IPI_AST);
+	}
 }
 #endif /* SMP */
 
@@ -1397,7 +1398,7 @@ sched_add(struct thread *td, int flags)
 
 		cpuid = PCPU_GET(cpuid);
 		if (single_cpu && cpu != cpuid) {
-			kick_other_cpu(td->td_priority, cpu);
+			kick_other_cpu(td->td_priority, cpu);
 		} else {
 			if (!single_cpu) {
 				tidlemsk = idle_cpus_mask;