From nobody Tue Jun 08 09:04:33 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 3A9907E9FAC for ; Tue, 8 Jun 2021 09:04:37 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-lf1-x135.google.com (mail-lf1-x135.google.com [IPv6:2a00:1450:4864:20::135]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Fzknx0lLWz3j95; Tue, 8 Jun 2021 09:04:36 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by mail-lf1-x135.google.com with SMTP id i10so30943581lfj.2; Tue, 08 Jun 2021 02:04:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=wBPpUenQK0qX1NlSHOpVvU7Otoi14yIN6YTKYPpms/U=; b=DdbQOQ98AdgQqVfW/Hbta/uZxlkCMAM5FHqJr9b7KkhEo80qDWGSOoURfM3n+BfISj g1cCHIryRzM7eZqpV8emVkGDbvGlgO7HJBPPkeeluqphW3Sv5StigzIjIS4WzxcsPkEe 0XpI7yrEV334NtdQBJb78BgnzQLSbvRgbfkJ2QKBTyo4bFM8ID0xYGF6PSNpSuPkcfuP g5JuQ8d0+5aTEfBQoWnJ995e4WdzsF1NNRvEVzrZ80vjqyMFDsfKYKUbZcqw7oH5TxAr M5CwFmEUll4IJL38RU+t/zECqcC+Vv9V5Y7nYDHuW9IYcJSMLfczu610IyoKxMO/Sryo dpiw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=wBPpUenQK0qX1NlSHOpVvU7Otoi14yIN6YTKYPpms/U=; b=Y4P0kXexvEsdadPU+XyRN5RDz6jFJmJ9fOxlRc8Ibf2F/sScRZXFWt8rAxpAhmHykt 8JzwLokdHO47im2AzHdG9DYVnOlFmxLT/1cJ3hvdPT0VXYmsVk2d7/omspsU4OuLqiEA 5GQ4XqS3Edcfs4xLRgsUe/jPt571KfSdaB5G1pVjvjg+ugOYezTOHegsxcsIwYoS9/Pn 0AjDgh4TG8AvWs9ELv77wxNhJbgQXlwNetcOtjQJyX3O0pVuEjiyA+fHfzoXbBQWBnyo otXK6nouRdoN0G/vP+Hezc1b5PRK1P7GPCIWyjocrd988/2o4KnSfygNRC7ezsL2bx9t 60PA== X-Gm-Message-State: AOAM5327JhCjWNYrqRKc9kF8elstweH48iiMliSzFvdiVozlX2hgFYxq +oiwm2fTHFw/O1E+wtPyduwM9OrIJuURRp0kc58= X-Google-Smtp-Source: ABdhPJyB+3PpOQLFy30zcG7GlEnVmMGGTPVzUWpvq0Mk4WkZoclyzgH07nxKhlalfpAJUDggyu+sw8Mwqtsyu2Q3quA= X-Received: by 2002:a05:6512:b0f:: with SMTP id w15mr14653925lfu.118.1623143074427; Tue, 08 Jun 2021 02:04:34 -0700 (PDT) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 Received: by 2002:a05:651c:2006:0:0:0:0 with HTTP; Tue, 8 Jun 2021 02:04:33 -0700 (PDT) In-Reply-To: <20210608084646.6a7e1bc7@ernst.home> References: <20210607090109.08ecb130@ernst.home> <20210608084646.6a7e1bc7@ernst.home> From: Mateusz Guzik Date: Tue, 8 Jun 2021 11:04:33 +0200 Message-ID: Subject: Re: kernel panic while copying files To: gljennjohn@gmail.com Cc: Mark Johnston , freebsd-current@freebsd.org Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 4Fzknx0lLWz3j95 X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N Given how easy it is to reproduce perhaps you can spend a little bit of time narrowing it down to a specific commit. You can do it with git-bisect. On 6/8/21, Gary Jennejohn wrote: > On Mon, 7 Jun 2021 16:54:11 -0400 > Mark Johnston wrote: > >> On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote: >> > I've seen this panic three times in the last two days: >> > >> > [first panic] >> > Unread portion of the kernel message buffer: >> > >> > >> > Fatal trap 12: page fault while in kernel mode >> > cpuid = 3; apic id = 03 >> > fault virtual address = 0x801118000 >> > fault code = supervisor write data, page not present >> > instruction pointer = 0x20:0xffffffff808d2212 >> > stack pointer = 0x28:0xfffffe00dbc8c760 >> > frame pointer = 0x28:0xfffffe00dbc8c7a0 >> > code segment = base 0x0, limit 0xfffff, type 0x1b >> > = DPL 0, pres 1, long 1, def32 0, gran 1 >> > processor eflags = interrupt enabled, resume, IOPL = 0 >> > current process = 28 (dom0) >> > trap number = 12 >> > panic: page fault >> > cpuid = 3 >> > time = 1622963058 >> > KDB: stack backtrace: >> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >> > 0xfffffe00dbc8c410 >> > vpanic() at vpanic+0x181/frame 0xfffffe00dbc8c460 >> > panic() at panic+0x43/frame 0xfffffe00dbc8c4c0 >> > trap_fatal() at trap_fatal+0x387/frame 0xfffffe00dbc8c520 >> > trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00dbc8c580 >> > trap() at trap+0x253/frame 0xfffffe00dbc8c690 >> > calltrap() at calltrap+0x8/frame 0xfffffe00dbc8c690 >> > --- trap 0xc, rip = 0xffffffff808d2212, rsp = 0xfffffe00dbc8c760, rbp = >> > 0xfffffe00dbc8c7a0 --- >> > zone_release() at zone_release+0x1f2/frame 0xfffffe00dbc8c7a0 >> > bucket_drain() at bucket_drain+0xda/frame 0xfffffe00dbc8c7d0 >> > bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame >> > 0xfffffe00dbc8c830 >> > zone_reclaim() at zone_reclaim+0x162/frame 0xfffffe00dbc8c880 >> > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame >> > 0xfffffe00dbc8c8b0 >> > vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfffffe00dbc8cb70 >> > vm_pageout() at vm_pageout+0x21e/frame 0xfffffe00dbc8cbb0 >> > fork_exit() at fork_exit+0x7e/frame 0xfffffe00dbc8cbf0 >> > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00dbc8cbf0 >> > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- >> > KDB: enter: panic >> > >> > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 >> > 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct >> > pcpu, >> > pc_curthread))); >> > >> > One difference was that in the second and third panics the fault >> > virtual >> > address was 0x0. But the backtrace was the same. >> > >> > Relevant info from the info.x files: >> > Architecture: amd64 >> > Architecture Version: 2 >> > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat >> > Jun >> > 5 09:58:55 CEST 2021 >> > >> > CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz >> > K8-class CPU) >> > Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 >> > Stepping=1 >> > AMD Features=0x2e500800 >> > AMD >> > Features2=0x35c233ff >> > AMD Extended Feature Extensions ID >> > EBX=0x1007 >> > >> > I have 16GiB of memory in the box. >> > >> > The panic occurred while copying files from an internal SATA SSD to a >> > SATA 8TB disk in an external USB3 docking station. The panic seems to >> > occur quite quickly, after only a few files have been copied. >> > >> > swap is on a different internal disk. >> > >> > I can poke around in the crash dumps with kgdb if anyone wants more >> > information. >> >> Are you running with invariants configured in the kernel? If not, >> please try to reproduce this in a kernel with >> >> options INVARIANT_SUPPORT >> options INVARIANTS >> >> configured. >> >> A stack trace with line numbers would also be helpful. > > Thanks for the hint. After enabling INVARIANTS the kernel panics as > soon I turn on the external USB3 disk. No user disk access required. > > Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun > 8 09:34:32 CEST 2021 > > Here the kgdb backtrace: > > Unread portion of the kernel message buffer: > panic: Duplicate free of 0xfffff800356b9000 from zone > 0xfffffe00dcbdd800(da_ccb) slab 0xfffff800356b9fd8(0) > cpuid = 8 > time = 1623140519 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > 0xfffffe00c5f398c0 > vpanic() at vpanic+0x181/frame 0xfffffe00c5f39910 > panic() at panic+0x43/frame 0xfffffe00c5f39970 > uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfffffe00c5f399b0 > uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfffffe00c5f39a00 > camperiphdone() at camperiphdone+0x1b7/frame 0xfffffe00c5f39b20 > xpt_done_process() at xpt_done_process+0x3dd/frame 0xfffffe00c5f39b60 > xpt_done_td() at xpt_done_td+0xf5/frame 0xfffffe00c5f39bb0 > fork_exit() at fork_exit+0x80/frame 0xfffffe00c5f39bf0 > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c5f39bf0 > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- > KDB: enter: panic > > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 > 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct > pcpu, > (kgdb) bt > #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 > #1 doadump (textdump=textdump@entry=0) > at /usr/src/sys/kern/kern_shutdown.c:399 > #2 0xffffffff8040c39a in db_dump (dummy=, > dummy2=, dummy3=, dummy4=) > at /usr/src/sys/ddb/db_command.c:575 > #3 0xffffffff8040c192 in db_command (last_cmdp=, > cmd_table=, dopager=dopager@entry=1) > at /usr/src/sys/ddb/db_command.c:482 > #4 0xffffffff8040beed in db_command_loop () > at /usr/src/sys/ddb/db_command.c:535 > #5 0xffffffff8040f616 in db_trap (type=, code= out>) > at /usr/src/sys/ddb/db_main.c:270 > #6 0xffffffff8066b1c4 in kdb_trap (type=type@entry=3, code=code@entry=0, > tf=, tf@entry=0xfffffe00c5f397f0) > at /usr/src/sys/kern/subr_kdb.c:727 > #7 0xffffffff809a4e96 in trap (frame=0xfffffe00c5f397f0) > at /usr/src/sys/amd64/amd64/trap.c:604 > #8 > #9 kdb_enter (why=0xffffffff80a61a23 "panic", msg=) > at /usr/src/sys/kern/subr_kdb.c:506 > #10 0xffffffff806207a2 in vpanic (fmt=, ap=, > ap@entry=0xfffffe00c5f39950) at /usr/src/sys/kern/kern_shutdown.c:907 > #11 0xffffffff80620533 in panic ( > fmt=0xffffffff80d635c8 ".\024\244\200\377\377\377\377") > at /usr/src/sys/kern/kern_shutdown.c:843 > #12 0xffffffff808e12b1 in uma_dbg_free (zone=0xfffffe00dcbdd800, > slab=0xfffff800356b9fd8, item=0xfffff800356b9000) > at /usr/src/sys/vm/uma_core.c:5664 > #13 0xffffffff808d9de7 in item_dtor (zone=0xfffffe00dcbdd800, > item=0xfffff800356b9000, size=544, udata=0x0, skip=SKIP_NONE) > at /usr/src/sys/vm/uma_core.c:3418 > #14 uma_zfree_arg (zone=0xfffffe00dcbdd800, item=0xfffff800356b9000, > udata=udata@entry=0x0) at /usr/src/sys/vm/uma_core.c:4374 > #15 0xffffffff802da503 in uma_zfree (zone=0xffffffff80d635c8 , > item=0x200) at /usr/src/sys/vm/uma.h:404 > #16 0xffffffff802d9117 in camperiphdone (periph=0xfffff800061e2c00, > done_ccb=0xfffff800355d6cc0) at /usr/src/sys/cam/cam_periph.c:1427 > #17 0xffffffff802dfebd in xpt_done_process (ccb_h=0xfffff800355d6cc0) > at /usr/src/sys/cam/cam_xpt.c:5491 > #18 0xffffffff802e1ec5 in xpt_done_td ( > arg=arg@entry=0xffffffff80d33d80 ) > at /usr/src/sys/cam/cam_xpt.c:5546 > #19 0xffffffff805dad80 in fork_exit (callout=0xffffffff802e1dd0 > , > arg=0xffffffff80d33d80 , frame=0xfffffe00c5f39c00) > at /usr/src/sys/kern/kern_fork.c:1083 > #20 > > Apparently caused by recent changes to CAM. > > Let me know if you want more information. > > -- > Gary Jennejohn > > -- Mateusz Guzik