From: Craig Leres
To: Warner Losh, Ian Lepore
Cc: FreeBSD Hackers
Subject: Re: Patched gpsd and /dev/pps0 results in "sleeping thread" kernel panic
Date: Tue, 31 Aug 2021 23:04:07 -0700
References: <5476ea21-9e8a-32f5-08ff-add46c02d910@freebsd.org>
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On 8/31/21 9:35 PM, Warner Losh wrote:
> Either I'm missing something (likely am), or this might fix it up,
> or at least get away from the warning:
>
> https://reviews.freebsd.org/D31763
>
> Note: I can't recall why ppbus has to be locked for this call.
> This code dates from the very earliest days of locking and
> so may do things simply because it seemed like a good idea
> without a specific notion as to what that lock is protecting. If
> so, the real fix may be to not take the lock in pps_ioctl at
> all and maybe instead use a reference count (the most
> often reason for 'a good idea' was to keep the device
> from going away, though this is a parent lock, not a
> child one so I'm less sure about that being the reason).

The crash looks the same or at least very similar to the unpatched kernel.

If you'd like to experiment with switching from the lock to a reference
count I am able to test that too (as well as testing that it doesn't
break with the ntpd's normal use of /dev/pps0).

(Do you prefer comments/traces/feedback in this thread or in the review?)

Thanks!

		Craig

toc2 1 # kgdb /boot/kernel/kernel /var/crash/vmcore.2
GNU gdb (GDB) 10.2 [GDB v10.2 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.2".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /boot/kernel.LBLNET/kernel.debug...

Unread portion of the kernel message buffer:
Sleeping thread (tid 101007, pid 1805) owns a non-sleepable lock
KDB: stack backtrace of thread 101007:
sched_switch() at sched_switch+0x630/frame 0xfffffe0070e3b760
mi_switch() at mi_switch+0xd4/frame 0xfffffe0070e3b790
sleepq_catch_signals() at sleepq_catch_signals+0x403/frame 0xfffffe0070e3b7e0
sleepq_timedwait_sig() at sleepq_timedwait_sig+0x14/frame 0xfffffe0070e3b820
_sleep() at _sleep+0x1b3/frame 0xfffffe0070e3b8a0
pps_ioctl() at pps_ioctl+0x298/frame 0xfffffe0070e3b8f0
ppsioctl() at ppsioctl+0x48/frame 0xfffffe0070e3b920
devfs_ioctl() at devfs_ioctl+0xb0/frame 0xfffffe0070e3b970
VOP_IOCTL_APV() at VOP_IOCTL_APV+0x7b/frame 0xfffffe0070e3b9a0
vn_ioctl() at vn_ioctl+0x16a/frame 0xfffffe0070e3bab0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe0070e3bad0
kern_ioctl() at kern_ioctl+0x2b7/frame 0xfffffe0070e3bb30
sys_ioctl() at sys_ioctl+0xfa/frame 0xfffffe0070e3bc00
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe0070e3bd30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0070e3bd30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004c899a, rsp = 0x7fffdfdfc6a8, rbp = 0x7fffdfdfc730 ---
panic: sleeping thread
cpuid = 8
time = 1630475518
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe005ab73ab0
vpanic() at vpanic+0x17b/frame 0xfffffe005ab73b00
panic() at panic+0x43/frame 0xfffffe005ab73b60
propagate_priority() at propagate_priority+0x282/frame 0xfffffe005ab73b90
turnstile_wait() at turnstile_wait+0x30c/frame 0xfffffe005ab73be0
__mtx_lock_sleep() at __mtx_lock_sleep+0x199/frame 0xfffffe005ab73c70
ppcintr() at ppcintr+0x2a0/frame 0xfffffe005ab73c90
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe005ab73cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe005ab73d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe005ab73d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 2m39s
Dumping 593 out of 12240 MB:..3%..11%..22%..33%..41%..52%..63%..71%..81%..92%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55	/usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at ../../../kern/kern_shutdown.c:371
#2  0xffffffff80b83b2a in kern_reboot (howto=260)
    at ../../../kern/kern_shutdown.c:451
#3  0xffffffff80b83f83 in vpanic (fmt=, ap=)
    at ../../../kern/kern_shutdown.c:880
#4  0xffffffff80b83da3 in panic (fmt=)
    at ../../../kern/kern_shutdown.c:807
#5  0xffffffff80be71a2 in propagate_priority (td=0xfffff801c4418000)
    at ../../../kern/subr_turnstile.c:228
#6  0xffffffff80be7d6c in turnstile_wait (ts=0xfffff800039bae40, owner=,
    queue=0) at ../../../kern/subr_turnstile.c:785
#7  0xffffffff80b62cf9 in __mtx_lock_sleep (c=0xfffff80003932ad0, v=)
    at ../../../kern/kern_mutex.c:654
#8  0xffffffff8086fd10 in ppcintr (arg=0xfffff80003932a00)
    at ../../../dev/ppc/ppc.c:1546
#9  0xffffffff80b463cc in intr_event_execute_handlers (p=,
    ie=0xfffff800030d9d00) at ../../../kern/kern_intr.c:1143
#10 ithread_execute_handlers (p=, ie=0xfffff800030d9d00)
    at ../../../kern/kern_intr.c:1156
#11 ithread_loop (arg=0xfffff800039aea00) at ../../../kern/kern_intr.c:1236
#12 0xffffffff80b42e6e in fork_exit (
    callout=0xffffffff80b46190 , arg=0xfffff800039aea00,
    frame=0xfffffe005ab73d40) at ../../../kern/kern_fork.c:1080
#13
(kgdb)
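
For reference, the traces above show thread 101007 sleeping in _sleep()
under pps_ioctl() while it owns a non-sleepable lock, and ppcintr()
then blocking in __mtx_lock_sleep() until propagate_priority() panics.
The reference-count alternative mentioned in the quoted text might look
roughly like the userspace sketch below. This is only an illustration
under stated assumptions: it is not the ppbus/pps driver code and not
the change in D31763. A pthread mutex stands in for the kernel mutex,
nanosleep() stands in for the kernel sleep, and struct pps_dev,
pps_dev_init, pps_intr, pps_wait and pps_try_detach are hypothetical
names made up for the example.

```c
/*
 * Userspace sketch only. All names here are hypothetical; nothing in
 * this block is taken from the FreeBSD ppbus or pps sources.
 */
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct pps_dev {
	pthread_mutex_t lock;	/* protects the fields below */
	int refs;		/* callers currently inside the wait path */
	bool detaching;		/* set once a detach has been granted */
	unsigned long events;	/* bumped by the interrupt-side path */
};

static void
pps_dev_init(struct pps_dev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->refs = 0;
	dev->detaching = false;
	dev->events = 0;
}

/* Interrupt-side path: holds the lock only for a short update. */
static void
pps_intr(struct pps_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->events++;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * Ioctl-side path: pin the device with a reference, drop the lock,
 * and only then sleep. Returns 0 and stores the event count, or -1
 * if the device is already detaching.
 */
static int
pps_wait(struct pps_dev *dev, long timeout_ms, unsigned long *events)
{
	struct timespec ts;

	ts.tv_sec = timeout_ms / 1000;
	ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

	pthread_mutex_lock(&dev->lock);
	if (dev->detaching) {
		pthread_mutex_unlock(&dev->lock);
		return (-1);
	}
	dev->refs++;
	pthread_mutex_unlock(&dev->lock);

	nanosleep(&ts, NULL);	/* sleep with no lock held */

	pthread_mutex_lock(&dev->lock);
	*events = dev->events;
	dev->refs--;
	pthread_mutex_unlock(&dev->lock);
	return (0);
}

/* Detach is refused while any caller still holds a reference. */
static bool
pps_try_detach(struct pps_dev *dev)
{
	bool ok;

	pthread_mutex_lock(&dev->lock);
	ok = (dev->refs == 0);
	if (ok)
		dev->detaching = true;
	pthread_mutex_unlock(&dev->lock);
	return (ok);
}
```

The point of the sketch is that the interrupt-side path never has to
wait behind a sleeping lock owner, because the mutex is released
before the sleep and the reference count alone keeps the device from
being detached. Whether the real ppbus lock in pps_ioctl() protects
anything beyond that is exactly the open question in the quoted text.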