From nobody Mon Oct 04 04:06:44 2021 X-Original-To: fs@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 9849217D8FDB for ; Mon, 4 Oct 2021 04:06:58 +0000 (UTC) (envelope-from lstewart@freebsd.org) Received: from lauren.room52.net (lauren.room52.net [207.148.86.158]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4HN6c1665Jz3HyM for ; Mon, 4 Oct 2021 04:06:57 +0000 (UTC) (envelope-from lstewart@freebsd.org) Received: by lauren.room52.net (Postfix) with ESMTPSA id 8183045D9F for ; Mon, 4 Oct 2021 15:06:47 +1100 (AEDT) From: Lawrence Stewart Subject: zfs: repeatable assertion failure in zap_deref_leaf on 14-CURRENT To: fs@FreeBSD.org Message-ID: <91c2e8a5-3e43-e295-b388-a21e58105bf2@freebsd.org> Date: Mon, 4 Oct 2021 15:06:44 +1100 User-Agent: Not your concern List-Id: Filesystems List-Archive: https://lists.freebsd.org/archives/freebsd-fs List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-AU Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 4HN6c1665Jz3HyM X-Spamd-Bar: / Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [0.00 / 15.00]; local_wl_from(0.00)[freebsd.org]; ASN(0.00)[asn:20473, ipnet:207.148.80.0/21, country:US] X-ThisMailContainsUnwantedMimeParts: N Greetings all, I'm able to reliably trigger a panic with ZFS implicated in the backtrace. This is on a laptop with a very simple zpool built on top of a single GELI gpart partition on a single nvme disk. A scrub has not found anything of concern. I encounter the panic when I start Chromium and it begins restoring lots of previous windows/tabs. I can also trigger by simply running an ls or find on ~/.cache/chromium/Cache so presumably something about the contents of that directory have been corrupted in an interesting way which is upsetting ZFS. Moving the directory out of the way allows me to start Chromium without issue. I'm currently running world + kernel built a few days ago from Git main 335c4f8edb3a kernconf=GENERIC amd64 with a fully up to date package set and drm-kmod-current built from ports against my kernel. I captured a core earlier today and have it around for any spelunking that would be helpful to anyone who may be interested. Ignoring the drm-related parts of the backtrace which happens after the ZFS panic, frame 36 looks like it's the cause of the crash - namely this assertion in zap.c: ASSERT(err || ZAP_HASH_IDX(h, zap_leaf_phys(*lp)->l_hdr.lh_prefix_len) == zap_leaf_phys(*lp)->l_hdr.lh_prefix); Below I've included some output from a kgdb session against the core I captured which I hope may be a useful starting point. As noted earlier I can easily move the Cache dir out of the way so this is not a blocking issue, but I've kept the Cache dir in case anyone is interested in debugging. Cheers, Lawrence Full backtrace: (kgdb) bt #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=175153152) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0xffffffff804cfece in db_fncall_generic (addr=, rv=, nargs=0, args=) at /usr/src/sys/ddb/db_command.c:610 #3 db_fncall (dummy1=, dummy2=, dummy3=, dummy4=) at /usr/src/sys/ddb/db_command.c:658 #4 0xffffffff804cfad2 in db_command (last_cmdp=, cmd_table=, dopager=dopager@entry=1) at /usr/src/sys/ddb/db_command.c:482 #5 0xffffffff804cf74d in db_command_loop () at /usr/src/sys/ddb/db_command.c:535 #6 0xffffffff804d2ec6 in db_trap (type=, code=) at /usr/src/sys/ddb/db_main.c:270 #7 0xffffffff80c75487 in kdb_trap (type=type@entry=3, code=code@entry=0, tf=tf@entry=0xfffffe01531a6f80) at /usr/src/sys/kern/subr_kdb.c:733 #8 0xffffffff810eedd3 in trap (frame=0xfffffe01531a6f80) at /usr/src/sys/amd64/amd64/trap.c:609 #9 #10 kdb_enter (why=0xffffffff812e9008 "panic", msg=) at /usr/src/sys/kern/subr_kdb.c:506 #11 0xffffffff80c277e8 in vpanic (fmt=0xffffffff811f49e6 "running but not TDS_RUNNING", ap=, ap@entry=0xfffffe01531a70f0) at /usr/src/sys/kern/kern_shutdown.c:908 #12 0xffffffff80c27573 in panic (fmt=0xffffffff81e9e1e0 "\270\\*\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:844 #13 0xffffffff80c856f7 in sleepq_switch (wchan=wchan@entry=0xfffff800063b0888, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:609 #14 0xffffffff80c85b2b in sleepq_timedwait (wchan=wchan@entry=0xfffff800063b0888, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:690 #15 0xffffffff80e9ae5f in linux_add_to_sleepqueue (wchan=0xfffff800063b0888, task=task@entry=0xfffff800063b0888, wmesg=, timeout=timeout@entry=11, state=state@entry=2) at /usr/src/sys/compat/linuxkpi/common/src/linux_schedule.c:70 #16 0xffffffff80e9acb6 in linux_wait_event_common (wqh=, wq=, timeout=11, state=2, lock=0x0) at /usr/src/sys/compat/linuxkpi/common/src/linux_schedule.c:281 #17 0xffffffff839b4960 in intel_dp_aux_xfer () from /boot/modules/i915kms.ko #18 0xffffffff839b4269 in intel_dp_aux_transfer () from /boot/modules/i915kms.ko #19 0xffffffff83a74b79 in drm_dp_dpcd_access () from /boot/modules/drm.ko #20 0xffffffff83a74c27 in drm_dp_dpcd_write () from /boot/modules/drm.ko #21 0xffffffff839acb62 in intel_dp_sink_dpms () from /boot/modules/i915kms.ko #22 0xffffffff839a8d76 in intel_ddi_post_disable () from /boot/modules/i915kms.ko #23 0xffffffff8393fe7b in haswell_crtc_disable () from /boot/modules/i915kms.ko #24 0xffffffff83955cdc in intel_old_crtc_state_disables () from /boot/modules/i915kms.ko #25 0xffffffff839543c5 in intel_atomic_commit_tail () from /boot/modules/i915kms.ko #26 0xffffffff839507da in intel_atomic_commit () from /boot/modules/i915kms.ko #27 0xffffffff83a6d378 in drm_client_modeset_commit_atomic () from /boot/modules/drm.ko #28 0xffffffff83a6d0d6 in drm_client_modeset_commit_force () from /boot/modules/drm.ko #29 0xffffffff83aae50a in drm_fb_helper_restore_fbdev_mode_unlocked () from /boot/modules/drm.ko #30 0xffffffff83aa8397 in vt_kms_postswitch () from /boot/modules/drm.ko #31 0xffffffff80a79f8f in vt_window_switch (vw=0xfffffe014eee7928, vw@entry=0xffffffff81aea0f8 ) at /usr/src/sys/dev/vt/vt_core.c:620 #32 0xffffffff80a773af in vtterm_cngrab (tm=) at /usr/src/sys/dev/vt/vt_core.c:1832 #33 0xffffffff80bbba56 in cngrab () at /usr/src/sys/kern/kern_cons.c:383 #34 0xffffffff80c27720 in vpanic (fmt=0xffffffff82835173 "%s", ap=ap@entry=0xfffffe01531a7930) at /usr/src/sys/kern/kern_shutdown.c:892 #35 0xffffffff8258361a in spl_panic (file=, func=, line=-2127879961, fmt=0xffffffff81e9e1e0 "\270\\*\201\377\377\377\377") at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_misc.c:107 #36 0xffffffff82749312 in zap_deref_leaf (zap=zap@entry=0xfffff8035a34d600, h=, tx=tx@entry=0x0, lt=lt@entry=RW_READER, lp=lp@entry=0xfffffe01531a7ab0) at /usr/src/sys/contrib/openzfs/module/zfs/zap.c:623 #37 0xffffffff8274b9a7 in fzap_cursor_retrieve (zap=0xffffffff81e9e1e0 , zc=zc@entry=0xfffffe01531a7aa0, za=za@entry=0xfffffe01531a7b78) at /usr/src/sys/contrib/openzfs/module/zfs/zap.c:1249 #38 0xffffffff82753fde in zap_cursor_retrieve (zc=zc@entry=0xfffffe01531a7aa0, za=za@entry=0xfffffe01531a7b78) at /usr/src/sys/contrib/openzfs/module/zfs/zap_micro.c:1583 #39 0xffffffff825a8a43 in zfs_readdir (vp=, uio=0xfffffe01531a7b60, cr=, eofp=0xfffffe01531a7d04, ncookies=, cookies=0x0) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:1814 #40 zfs_freebsd_readdir (ap=) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:4729 #41 0xffffffff811b8004 in VOP_READDIR_APV (vop=0xffffffff828f3360 , a=a@entry=0xfffffe01531a7d08) at vnode_if.c:1939 #42 0xffffffff80d2a919 in VOP_READDIR (vp=0xfffff8035a1d5540, uio=0xfffffe01531a7d48, cred=, eofflag=0xfffffe01531a7d04, ncookies=0x0, cookies=0x0) at ./vnode_if.h:985 #43 kern_getdirentries (td=, fd=, buf=0x814a05000 , count=4096, basep=basep@entry=0xfffffe01531a7df0, residp=residp@entry=0x0, bufseg=UIO_USERSPACE) at /usr/src/sys/kern/vfs_syscalls.c:4193 #44 0xffffffff80d2ab19 in sys_getdirentries (td=0xffffffff81e9e1e0 , uap=0xfffffe0153346b30) --Type for more, q to quit, c to continue without paging-- at /usr/src/sys/kern/vfs_syscalls.c:4140 #45 0xffffffff810efbae in syscallenter (td=) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189 #46 amd64_syscall (td=0xfffffe0153346740, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1191 #47 #48 0x000000080f1d79fa in ?? () Backtrace stopped: Cannot access memory at address 0x7fffdfdfc418 (kgdb) frame 36 #36 0xffffffff82749312 in zap_deref_leaf (zap=zap@entry=0xfffff8035a34d600, h=, tx=tx@entry=0x0, lt=lt@entry=RW_READER, lp=lp@entry=0xfffffe01531a7ab0) at /usr/src/sys/contrib/openzfs/module/zfs/zap.c:623 623 ASSERT(err || (kgdb) p err $10 = (kgdb) p h $11 = (kgdb) p **lp $13 = {l_dbu = {dbu_tqent = {tqent_task = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0 '\000', ta_flags = 0 '\000', ta_func = 0x0, ta_context = 0x0}, tqent_timeout_task = {q = 0x0, t = {ta_link = { stqe_next = 0x0}, ta_pending = 0, ta_priority = 0 '\000', ta_flags = 0 '\000', ta_func = 0x0, ta_context = 0x0}, c = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = { tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_precision = 0, c_arg = 0x0, c_func = 0x0, c_lock = 0x0, c_flags = 0, c_iflags = 0, c_cpu = 0}, f = 0}, tqent_func = 0x0, tqent_arg = 0x0, tqent_id = 0, tqent_hash = { cle_next = 0x0, cle_prev = 0x0}, tqent_type = 0 '\000', tqent_registered = 0 '\000', tqent_cancelled = 0 '\000', tqent_rc = 0}, dbu_evict_func_sync = 0xffffffff8274c5d0 , dbu_evict_func_async = 0x0, dbu_clear_on_evict_dbufp = 0xfffff8035d29ca08}, l_rwlock = {lock_object = { lo_name = 0xffffffff82865b9f <.L.str.25+1> "l->l_rwlock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 33}, l_blkid = 182, l_bs = 14, l_dbuf = 0xfffff8035d291b28} (kgdb) info register rax 0x12 18 rbx 0x0 0 rcx 0x1 1 rdx 0xffffffff812b20e7 -2127879961 rsi 0x80 128 rdi 0xffffffff81e9e1e0 -2115378720 rbp 0xfffffe01531a79a0 0xfffffe01531a79a0 rsp 0xfffffe01531a7960 0xfffffe01531a7960 r8 0x0 0 r9 0x1c200001 471859201 r10 0xffffffff81aea0f8 -2119261960 r11 0x20 32 r12 0x2 2 r13 0x0 0 r14 0xfffff8035a34d600 -8781694708224 r15 0x42 66 rip 0xffffffff82749312 0xffffffff82749312 eflags 0x86 [ PF SF ] cs 0x20 32 ss 0x28 40 ds es fs gs fs_base gs_base (kgdb) disassemble zap_deref_leaf Dump of assembler code for function zap_deref_leaf: 0xffffffff82749220 <+0>: push %rbp 0xffffffff82749221 <+1>: mov %rsp,%rbp 0xffffffff82749224 <+4>: push %r15 0xffffffff82749226 <+6>: push %r14 0xffffffff82749228 <+8>: push %r13 0xffffffff8274922a <+10>: push %r12 0xffffffff8274922c <+12>: push %rbx 0xffffffff8274922d <+13>: sub $0x18,%rsp 0xffffffff82749231 <+17>: mov %ecx,%r12d 0xffffffff82749234 <+20>: mov 0xe8(%rdi),%rax 0xffffffff8274923b <+27>: mov 0x18(%rax),%rax 0xffffffff8274923f <+31>: movabs $0x8000000000000001,%rcx 0xffffffff82749249 <+41>: cmp %rcx,(%rax) 0xffffffff8274924c <+44>: jg 0xffffffff82749314 0xffffffff82749252 <+50>: movabs $0x2f52ab2ab,%rcx 0xffffffff8274925c <+60>: cmp %rcx,0x8(%rax) 0xffffffff82749260 <+64>: jne 0xffffffff82749314 0xffffffff82749266 <+70>: mov %rdx,%r13 0xffffffff82749269 <+73>: mov %rsi,%r15 0xffffffff8274926c <+76>: mov %rdi,%r14 0xffffffff8274926f <+79>: mov %r8,-0x30(%rbp) 0xffffffff82749273 <+83>: mov 0x20(%rax),%rax 0xffffffff82749277 <+87>: mov %eax,%ecx 0xffffffff82749279 <+89>: neg %cl 0xffffffff8274927b <+91>: shr %cl,%rsi 0xffffffff8274927e <+94>: test %rax,%rax 0xffffffff82749281 <+97>: cmove %rax,%rsi 0xffffffff82749285 <+101>: lea -0x38(%rbp),%rdx 0xffffffff82749289 <+105>: call 0xffffffff8274aae0 0xffffffff8274928e <+110>: mov %eax,%ebx 0xffffffff82749290 <+112>: test %eax,%eax 0xffffffff82749292 <+114>: jne 0xffffffff8274933a 0xffffffff82749298 <+120>: mov -0x38(%rbp),%rsi 0xffffffff8274929c <+124>: mov %r14,%rdi 0xffffffff8274929f <+127>: mov %r13,%rdx 0xffffffff827492a2 <+130>: mov %r12d,%ecx 0xffffffff827492a5 <+133>: mov -0x30(%rbp),%r8 0xffffffff827492a9 <+137>: call 0xffffffff8274bf60 0xffffffff827492ae <+142>: mov %eax,%ebx 0xffffffff827492b0 <+144>: test %eax,%eax 0xffffffff827492b2 <+146>: jne 0xffffffff8274933a 0xffffffff827492b8 <+152>: mov -0x30(%rbp),%rax 0xffffffff827492bc <+156>: mov (%rax),%rax 0xffffffff827492bf <+159>: mov 0x108(%rax),%rax 0xffffffff827492c6 <+166>: mov 0x18(%rax),%rax 0xffffffff827492ca <+170>: mov 0x20(%rax),%edx 0xffffffff827492cd <+173>: mov %edx,%ecx 0xffffffff827492cf <+175>: neg %cl 0xffffffff827492d1 <+177>: shr %cl,%r15 0xffffffff827492d4 <+180>: xor %ebx,%ebx 0xffffffff827492d6 <+182>: test %dx,%dx 0xffffffff827492d9 <+185>: mov $0x0,%ecx 0xffffffff827492de <+190>: cmovne %r15,%rcx 0xffffffff827492e2 <+194>: cmp 0x10(%rax),%rcx 0xffffffff827492e6 <+198>: je 0xffffffff8274933a 0xffffffff827492e8 <+200>: xor %ebx,%ebx 0xffffffff827492ea <+202>: mov $0xffffffff82869886,%rdi 0xffffffff827492f1 <+209>: mov $0xffffffff8282e438,%rsi 0xffffffff827492f8 <+216>: mov $0x271,%edx 0xffffffff827492fd <+221>: mov $0xffffffff82835173,%rcx 0xffffffff82749304 <+228>: mov $0xffffffff8283164f,%r8 0xffffffff8274930b <+235>: xor %eax,%eax 0xffffffff8274930d <+237>: call 0xffffffff825835e0 => 0xffffffff82749312 <+242>: jmp 0xffffffff8274933a 0xffffffff82749314 <+244>: mov 0x1a5f4d(%rip),%rdi # 0xffffffff828ef268 0xffffffff8274931b <+251>: mov $0x5,%ebx 0xffffffff82749320 <+256>: test %rdi,%rdi 0xffffffff82749323 <+259>: je 0xffffffff8274933a --Type for more, q to quit, c to continue without paging-- 0xffffffff82749325 <+261>: mov $0x5,%esi 0xffffffff8274932a <+266>: xor %edx,%edx 0xffffffff8274932c <+268>: xor %ecx,%ecx 0xffffffff8274932e <+270>: xor %r8d,%r8d 0xffffffff82749331 <+273>: xor %r9d,%r9d 0xffffffff82749334 <+276>: call *-0xc364d2(%rip) # 0xffffffff81b12e68 0xffffffff8274933a <+282>: mov %ebx,%eax 0xffffffff8274933c <+284>: add $0x18,%rsp 0xffffffff82749340 <+288>: pop %rbx 0xffffffff82749341 <+289>: pop %r12 0xffffffff82749343 <+291>: pop %r13 0xffffffff82749345 <+293>: pop %r14 0xffffffff82749347 <+295>: pop %r15 0xffffffff82749349 <+297>: pop %rbp 0xffffffff8274934a <+298>: ret End of assembler dump.