From nobody Thu Jul 01 03:58:00 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 1F8C611CF5C5 for ; Thu, 1 Jul 2021 03:58:06 +0000 (UTC) (envelope-from gljennjohn@gmail.com) Received: from mail-ed1-x52a.google.com (mail-ed1-x52a.google.com [IPv6:2a00:1450:4864:20::52a]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4GFkvd6vVNz3Gbc for ; Thu, 1 Jul 2021 03:58:05 +0000 (UTC) (envelope-from gljennjohn@gmail.com) Received: by mail-ed1-x52a.google.com with SMTP id m1so6262453edq.8 for ; Wed, 30 Jun 2021 20:58:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=+Seu5IEvnFI9MsUcsForuNyBeIVr+eC7P83gdZuBdyI=; b=VR+5GiP/Dp7llMKnVP7fNUJ2vBy5NDodD1NlxZpOPKZYp1wF2N6oxzC8Rklu5Zi2om LcQTQa5R/cmkOupNRVTifSpDdiCVHsUJlVjqxyeYjEWIqcbxIhxjSRgNED+hTkePRdYk vWewqu8eHB4+2DZCf20yCXzihyZUtxcGs6dYCmkLd1pV9Vfvl98o/G5VsIqEq2PqIDRh kQLU4TKWOs56EzzVc3zybf9RCq15CXHIZESmV+HYmEKBIgNIUQIvpusNpb2nUN5pBfvB UCLo24Wl/Tgtttblb9UPK8Oxne/edo5+F+2IPQ8O4/fGSL1JaTIyd2abQgzEmM5/46aZ zoQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=+Seu5IEvnFI9MsUcsForuNyBeIVr+eC7P83gdZuBdyI=; b=XWioVFbOdwzS1urUGlpB+dT5vQ/tpY+aeAcaPhtoVAcJe4xiVJCbna4u5uGXc+yRk8 TGAOeo8y0X5QQRnmmmz/8l99MiAVOeIAkAz/jJGrNyQQNvlsr0lsysJ9g0s0IWQPBjOx uYgdswTDncJsLb0Aj3tAirAxT7zGL+qUZKFBCcQ4vhWcEoXHIJpMBcVXa7f5V4unlibE /mS/hsy6VnL3ESqIYLt/lx7qFvjZmlMb2r3Hv4THAN8PKuKAtHJRYmrpGr2urCB9eRwE z/i8/MoOu5DGkXFFbjB1UDxv3CuTs6DGeQNyNVDdUPoaTY98f4zu7sH2SseUAeUGvkwe OdTA== X-Gm-Message-State: AOAM533J74zGkcGFfyUV2FGhptdDHL6GAOvdApBAYlxKBIdnuZqGTYvK l0luwO7LemgLiK89Emgr3/Cp1NxsjEg= X-Google-Smtp-Source: ABdhPJz9jTZIGNSakjMiFVGRR3k99SNdKjGcOwiNMF3qL+YQTE7juOuw5VagJJj/NwhLRrnkUmGIxw== X-Received: by 2002:aa7:cf91:: with SMTP id z17mr7775382edx.112.1625111884539; Wed, 30 Jun 2021 20:58:04 -0700 (PDT) Received: from ernst.home (pd9e2360f.dip0.t-ipconnect.de. [217.226.54.15]) by smtp.gmail.com with ESMTPSA id i6sm10174979ejr.68.2021.06.30.20.58.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 30 Jun 2021 20:58:03 -0700 (PDT) Date: Thu, 1 Jul 2021 05:58:00 +0200 From: Gary Jennejohn To: Warner Losh Cc: FreeBSD Current Subject: Re: panic: Unaligned free (was: kernel panic while copying files) Message-ID: <20210701035800.410d2376@ernst.home> In-Reply-To: References: <20210607090109.08ecb130@ernst.home> <20210608084646.6a7e1bc7@ernst.home> <20210608155405.5cf0e200@ernst.home> <20210610095041.38d7597c@ernst.home> <20210629094201.77ef5f22@ernst.home> <20210630125703.2b5544e7@ernst.home> Reply-To: gljennjohn@gmail.com X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; amd64-portbld-freebsd14.0) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 4GFkvd6vVNz3Gbc X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Wed, 30 Jun 2021 10:35:14 -0600 Warner Losh wrote: > On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn wrote: > > > On Wed, 30 Jun 2021 06:02:59 +0100 > > Graham Perrin wrote: > > > > > On 29/06/2021 10:42, Gary Jennejohn wrote: > > > > ___ panic is now the result of an unaligned free. > > > > > > > > panic: Unaligned free of 0xfffff800259e2800 from zone > > > > 0xfffffe00dc9d2000(da_ccb) slab 0xfffff800259e2fd8(3) > > > > > > > > I have the crash dump and a debug kernel in case anyone wants more > > info. > > > Can you post the backtrace etc. here? Thanks > > > > > > > Sure. As can be seen from the uma zone being da_ccb, the panic > > resulted from setting kern.cam.da.enable_uma_ccbs=1. > > > > Unread portion of the kernel message buffer: > > panic: Unaligned free of 0xfffff800259e2800 from zone > > 0xfffffe00dc9d2000(da_ccb) slab 0xfffff800259e2fd8(3) > > cpuid = 2 > > time = 1624958650 > > KDB: stack backtrace: > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame > > 0xfffffe00c62687a0 > > kdb_backtrace() at kdb_backtrace+0x46/frame 0xfffffe00c6268850 > > vpanic() at vpanic+0x227/frame 0xfffffe00c62688f0 > > panic() at panic+0x4e/frame 0xfffffe00c6268950 > > uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfffffe00c62689a0 > > item_dtor() at item_dtor+0x7c/frame 0xfffffe00c62689e0 > > uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfffffe00c6268a50 > > uma_zfree() at uma_zfree+0x23/frame 0xfffffe00c6268a70 > > xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfffffe00c6268a90 > > camperiphdone() at camperiphdone+0x211/frame 0xfffffe00c6268ae0 > > xpt_done_process() at xpt_done_process+0x550/frame 0xfffffe00c6268b40 > > xpt_done_td() at xpt_done_td+0x1c0/frame 0xfffffe00c6268b80 > > fork_exit() at fork_exit+0x117/frame 0xfffffe00c6268bf0 > > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c6268bf0 > > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- > > KDB: enter: panic > > > > doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399 > > 399 dumptid = curthread->td_tid; > > (kgdb) bt > > #0 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399 > > #1 0xffffffff804d5dd7 in db_dump (dummy=-2138843371, dummy2=false, > > dummy3=-1, > > dummy4=0xfffffe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575 > > #2 0xffffffff804d5bf4 in db_command ( > > last_cmdp=0xffffffff8114ce80 , cmd_table=0x0, > > dopager=1) > > at /usr/src/sys/ddb/db_command.c:482 > > #3 0xffffffff804d583c in db_command_loop () > > at /usr/src/sys/ddb/db_command.c:535 > > #4 0xffffffff804da27c in db_trap (type=3, code=0) > > at /usr/src/sys/ddb/db_main.c:270 > > #5 0xffffffff8083df9d in kdb_trap (type=3, code=0, tf=0xfffffe00c6268770) > > at /usr/src/sys/kern/subr_kdb.c:727 > > #6 0xffffffff80d31494 in trap (frame=0xfffffe00c6268770) > > at /usr/src/sys/amd64/amd64/trap.c:604 > > #7 0xffffffff80d32628 in trap_check (frame=0xfffffe00c6268770) > > at /usr/src/sys/amd64/amd64/trap.c:664 > > #8 > > #9 breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66 > > #10 0xffffffff8083d3d0 in kdb_enter (why=0xffffffff80e0355b "panic", > > msg=0xffffffff80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505 > > #11 0xffffffff807d1725 in vpanic ( > > fmt=0xffffffff80dbca46 "Unaligned free of %p from zone %p(%s) slab > > %p(%d)", ap=0xfffffe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906 > > #12 0xffffffff807d120e in panic ( > > fmt=0xffffffff80dbca46 "Unaligned free of %p from zone %p(%s) slab > > %p(%d)") > > at /usr/src/sys/kern/kern_shutdown.c:843 > > #13 0xffffffff80c16a8c in uma_dbg_free (zone=0xfffffe00dc9d2000, > > slab=0xfffff800259e2fd8, item=0xfffff800259e2800) > > at /usr/src/sys/vm/uma_core.c:5659 > > #14 0xffffffff80c0c5dc in item_dtor (zone=0xfffffe00dc9d2000, > > item=0xfffff800259e2800, size=544, udata=0x0, skip=SKIP_NONE) > > at /usr/src/sys/vm/uma_core.c:3418 > > #15 0xffffffff80c0ba60 in uma_zfree_arg (zone=0xfffffe00dc9d2000, > > item=0xfffff800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374 > > #16 0xffffffff802e45d3 in uma_zfree (zone=0xfffffe00dc9d2000, > > item=0xfffff800259e2800) at /usr/src/sys/vm/uma.h:404 > > #17 0xffffffff802dc3c3 in xpt_free_ccb (free_ccb=0xfffff800259e2800) > > at /usr/src/sys/cam/cam_xpt.c:4676 > > #18 0xffffffff802dacf1 in camperiphdone (periph=0xfffff80025329b00, > > done_ccb=0xfffff80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427 > > #19 0xffffffff802e4520 in xpt_done_process (ccb_h=0xfffff80025a24cc0) > > at /usr/src/sys/cam/cam_xpt.c:5493 > > #20 0xffffffff802e68e0 in xpt_done_td (arg=0xffffffff81143700 ) > > at /usr/src/sys/cam/cam_xpt.c:5548 > > #21 0xffffffff807673c7 in fork_exit (callout=0xffffffff802e6720 > > , > > arg=0xffffffff81143700 , frame=0xfffffe00c6268c00) > > at /usr/src/sys/kern/kern_fork.c:1083 > > #22 > > > > [kgdb stuff removed] > > > > (kgdb) down > > #15 0xffffffff80c0ba60 in uma_zfree_arg (zone=0xfffffe00dc9d2000, > > item=0xfffff800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374 > > 4374 item_dtor(zone, item, cache_uz_size(cache), udata, > > SKIP_NONE); > > (kgdb) down > > #14 0xffffffff80c0c5dc in item_dtor (zone=0xfffffe00dc9d2000, > > item=0xfffff800259e2800, size=544, udata=0x0, skip=SKIP_NONE) > > at /usr/src/sys/vm/uma_core.c:3418 > > 3418 uma_dbg_free(zone, NULL, item); > > (kgdb) p/x skipdbg > > $26 = 0x0 > > (kgdb) p/x zone->uz_flags > > $27 = 0x41000000 (UMA_ZFLAG_TRASH|UMA_ZFLAG_CTORDTOR) > > (kgdb) down > > #13 0xffffffff80c16a8c in uma_dbg_free (zone=0xfffffe00dc9d2000, > > slab=0xfffff800259e2fd8, item=0xfffff800259e2800) > > at /usr/src/sys/vm/uma_core.c:5659 > > 5659 panic("Unaligned free of %p from zone %p(%s) slab > > %p(%d)", > > > > Note that item is the same as saved_ccb. > > > > (kgdb) p/x *zone->uz_keg > > $28 = {uk_hash = {uh_slab_hash = 0x0, uh_hashsize = 0x0, uh_hashmask = > > 0x0}, > > uk_zones = {lh_first = 0xfffffe00dc9d2000}, uk_dr = { > > dr_policy = 0xffffffff810010e0, dr_iter = 0x0}, uk_align = 0x7, > > uk_reserve = 0x0, uk_size = 0x220, uk_rsize = 0x220, > > uk_init = 0xffffffff80c17d50, uk_fini = 0xffffffff80c17d80, > > uk_allocf = 0xffffffff80d342f0, uk_freef = 0xffffffff80d346a0, > > uk_offset = 0x0, uk_kva = 0x0, uk_pgoff = 0xfd8, uk_ppera = 0x1, > > uk_ipers = 0x7, uk_flags = 0x0, uk_name = 0xffffffff80debbac, uk_link = { > > le_next = 0xfffff80005968000, le_prev = 0xfffff80005968380}, > > uk_domain = 0xfffff80005968240} > > (kgdb) p/x *slab > > $29 = {us_link = {le_next = 0xdeadc0dedeadc0de, le_prev = > > 0xdeadc0dedeadc0de}, > > us_freecount = 0xc0de, us_flags = 0xad, us_domain = 0xde, us_free = { > > __bits = 0xfffff800259e2ff0}} > > (kgdb) p/x *0xfffff800259e2ff0 > > $30 = 0xdeadc0de > > Don't know whether this matters, but slab seems to be unitialized. > > (kgdb) p/x freei > > $31 = 0x3 > > > > In any case, saved_ccb has an address which lies outside the range > > covered by slab, i.e. freei is bigger than the number of entries in > > slab. > > > > I suspect that the only way to really figure out what's going on is to > > run the kernel in kgbd and set lots of breakpoints, > > > > What's happening is this, I think. > > (1) We send a request. > (2) It fails, so we send a start unit > BUT we do weird things to copy the CCBs around, and the request from (1) > and (2) have different allocations. Error recovery overwrites the original > request with a new request after saving it off.... > (3) start unit succeeds, we go to free one of the CCBs and it's marked > incorrectly, triggering either this panic or the prior one we saw. > > These actions were fine when there was one allocator, but now that there > are two more care must be taken, and that more care hasn't been taken yet, > so kern.cam.da.enable_uma_ccbs=1 is unsafe for now and should not be used. > kern.cam.ada.enable_uma_ccbs=1 is fine as its error recovery doesn't do > these things. > I noticed that the value of vm.debug.divisor affects what value is returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few different values. The returned value is used to set skipdbg in uma_core.c:item_dtor(). The default is vm.debug.divisor=1. vm.debug.divisor is only present when INVARIANTS is defined. kskipdbg eventually affects the value of freei. With these values: vm.debug.divisor: 0 kern.cam.da.enable_uma_ccbs: 1 I can turn on the disk and it comes up without a panic! However, I didn't try to do any large data transfers to the disk. So, it appears that at least vm.debug.divisor is a big factor in whether or not a panic happens with INVARIANTS. -- Gary Jennejohn