From nobody Tue Nov 14 17:39:31 2023 X-Original-To: current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4SVD7p4gFhz51cX4 for ; Tue, 14 Nov 2023 17:39:34 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-oo1-xc2d.google.com (mail-oo1-xc2d.google.com [IPv6:2607:f8b0:4864:20::c2d]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4SVD7p2DNbz3XFZ; Tue, 14 Nov 2023 17:39:34 +0000 (UTC) (envelope-from mjguzik@gmail.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-oo1-xc2d.google.com with SMTP id 006d021491bc7-586ad15f9aaso3001168eaf.2; Tue, 14 Nov 2023 09:39:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1699983572; x=1700588372; darn=freebsd.org; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=dNZP2p4/JVoYw6TKkLaqHCaFa0CV8rJGX2CphEKzJMw=; b=dzXYbeNjQb7Xyd0C3AssMtkon2JipPCZE1zlO9CHKPiGTpAhimp7e/oUk/uwif7Iji l0klVCyqDZPBNXRzAqK7XuidsRUiHWTFTk4ourdyUy5VCscyt2mNUB3mscTkctKufGaj 8y1+T65c5wa5vkKHeszpzLQZjAKqAm37GrJb8x9IqUdbaUzUpSzyt4cbG0EvIYV7rWlQ /m1IcxRyJUZL4sq4XPt01txMwWioMQCnGh97WRjtgWUfxMlu4i3FJ/AKSU139Ideejbt QyxpdnhZZFhwUZvPRVNYxwbPXDgnCBOdVnpf81wqEz/4925OMPoAPWal7+yOQcoqEexN ZVZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699983572; x=1700588372; h=cc:to:subject:message-id:date:from:references:in-reply-to :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=dNZP2p4/JVoYw6TKkLaqHCaFa0CV8rJGX2CphEKzJMw=; b=CHlvExuUDfaucnQ5aOARE7grhD1Mn6qUnkRBsZDQKNxJTIWf2xOxlpfcIN191sdU8L mitAzEyw0BJJ4uQHa3L6cdUJF5To8/ln95ISDKK2MzejTIYUV1+T6yeqRC0LgPmk1fLC UfGMaOGPvEcQrbD+bTtyKhWDkO0+75DQbpIcqq79kjGbn20/E+tCWCAHDaot/OCUnV1l ZGTo57tWdXHZ24YetGAMzCwxLAmUGE0I9Fg2FDR0enSqm54XAK+uBX58+MwLhPQ4/wHX GaVsC5wbCVsUGVieq5Po3YD/f9mZS4Vhdt99+4fVxWxc2egMcpVOouE3Nt+Mkfdm4bdq IFWg== X-Gm-Message-State: AOJu0Yxka7EUpQZn8a36oI83GlT5Eq5+jlX2cLK7oAh7e42/BjBvTerz XEWzZ3F4wlbENL1pcKBtvKMd1823046dG7Ihle8= X-Google-Smtp-Source: AGHT+IHdZHUTCnL0bhlN6jZEuiX5DzBvq3JtBJLkPp8jgDbRadkmHYL09pFcBUakhjw0BxSrQLw05LHYmfAubuAFS/s= X-Received: by 2002:a4a:bb0f:0:b0:581:f2de:25f8 with SMTP id f15-20020a4abb0f000000b00581f2de25f8mr2003093oop.0.1699983572491; Tue, 14 Nov 2023 09:39:32 -0800 (PST) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 Received: by 2002:ac9:57d4:0:b0:4f0:1250:dd51 with HTTP; Tue, 14 Nov 2023 09:39:31 -0800 (PST) In-Reply-To: <1900239445.5968.1699966796547@localhost> References: <349700057.3452.1699611152405@localhost> <1900239445.5968.1699966796547@localhost> From: Mateusz Guzik Date: Tue, 14 Nov 2023 18:39:31 +0100 Message-ID: Subject: Re: crash zfs_clone_range() To: Ronald Klop Cc: Konstantin Belousov , Alexander Motin , current@freebsd.org Content-Type: text/plain; charset="UTF-8" X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US] X-Rspamd-Queue-Id: 4SVD7p2DNbz3XFZ On 11/14/23, Ronald Klop wrote: > Response below > > Van: Konstantin Belousov > Datum: zondag, 12 november 2023 19:47 > Aan: Alexander Motin > CC: Ronald Klop , current@freebsd.org > Onderwerp: Re: crash zfs_clone_range() >> >> On Sun, Nov 12, 2023 at 11:51:40AM -0500, Alexander Motin wrote: >> > Hi Ronald, >> > >> > As I can see, the clone request to ZFS came through nullfs, and it >> > crashed >> > immediately on enter. I've never been a VFS layer expert, but to me it >> > may >> > be a nullfs problem, not zfs. Is there chance you was (un-)mounting >> > something when this happened? >> It is not nullfs issue, I believe, but the lack of the busy reference on >> the >> upper mount. I think https://reviews.freebsd.org/D42554 should cover it. >> >> > >> > On 10.11.2023 05:12, Ronald Klop wrote: >> > > Hi, >> > > >> > > Had this crash today on RPI4/15-CURRENT. >> > > >> > > FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19 >> > > main-b0203aaa46-dirty: Sat Nov 4 11:48:33 CET 2023 >> > > ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG >> > > arm64 >> > > >> > > $ sysctl -a | grep bclon >> > > vfs.zfs.bclone_enabled: 1 >> > > >> > > I started a jail with poudriere to build a package. The jail uses >> > > null >> > > mounts over ZFS. >> > > >> > > [root]# cu -s 115200 -l /dev/cuaU0 >> > > Connected >> > > >> > > db> bt >> > > Tracing pid 95213 tid 100438 td 0xffff0000e1e97900 >> > > db_trace_self() at db_trace_self >> > > db_stack_trace() at db_stack_trace+0x120 >> > > db_command() at db_command+0x2e4 >> > > db_command_loop() at db_command_loop+0x58 >> > > db_trap() at db_trap+0x100 >> > > kdb_trap() at kdb_trap+0x334 >> > > handle_el1h_sync() at handle_el1h_sync+0x18 >> > > --- exception, esr 0xf2000000 >> > > kdb_enter() at kdb_enter+0x48 >> > > vpanic() at vpanic+0x1dc >> > > panic() at panic+0x48 >> > > data_abort() at data_abort+0x2fc >> > > handle_el1h_sync() at handle_el1h_sync+0x18 >> > > --- exception, esr 0x96000004 >> > > rms_rlock() at rms_rlock+0x1c >> > > zfs_clone_range() at zfs_clone_range+0x68 >> > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c >> > > null_bypass() at null_bypass+0x118 >> > > vn_copy_file_range() at vn_copy_file_range+0x18c >> > > kern_copy_file_range() at kern_copy_file_range+0x36c >> > > sys_copy_file_range() at sys_copy_file_range+0x8c >> > > do_el0_sync() at do_el0_sync+0x634 >> > > handle_el0_sync() at handle_el0_sync+0x48 >> > > --- exception, esr 0x56000000 >> > > >> > > >> > > Oh.. While typing this I rebooted the machine and it happened again. >> > > I >> > > didn't start anything in particular although the machine runs some >> > > jails. >> > > >> > > x0: 0x00000000000000e0 >> > > x1: 0xffffa00090317a48 >> > > x2: 0xffffa000f79d4f00 >> > > x3: 0xffffa000c61a44a8 >> > > x4: 0xffff0000deefe460 ($d.2 + 0xdd776560) >> > > x5: 0xffffa001250e4c00 >> > > x6: 0xffff0000e54025b5 ($d.5 + 0xc) >> > > x7: 0x000000000000030a >> > > x8: 0xffff0000e1559000 ($d.2 + 0xdfdd1100) >> > > x9: 0x0000000000000001 >> > > x10: 0x0000000000000000 >> > > x11: 0x0000000000000001 >> > > x12: 0x0000000000000002 >> > > x13: 0x0000000000000000 >> > > x14: 0x0000000000000001 >> > > x15: 0x0000000000000000 >> > > x16: 0xffff0000016dce88 (__stop_set_modmetadata_set + 0x1310) >> > > x17: 0xffff0000004e0d44 (rms_rlock + 0x0) >> > > x18: 0xffff0000deefe280 ($d.2 + 0xdd776380) >> > > x19: 0x0000000000000000 >> > > x20: 0xffff0000deefe460 ($d.2 + 0xdd776560) >> > > x21: 0x7fffffffffffffff >> > > x22: 0xffffa00090317a48 >> > > x23: 0xffffa000f79d4f00 >> > > x24: 0xffffa001067ef910 >> > > x25: 0x00000000000000e0 >> > > x26: 0xffffa000158a8000 >> > > x27: 0x0000000000000000 >> > > x28: 0xffffa000158a8000 >> > > x29: 0xffff0000deefe280 ($d.2 + 0xdd776380) >> > > sp: 0xffff0000deefe280 >> > > lr: 0xffff000001623564 (zfs_clone_range + 0x6c) >> > > elr: 0xffff0000004e0d60 (rms_rlock + 0x1c) >> > > spsr: 0x00000000a0000045 >> > > far: 0x0000000000000108 >> > > esr: 0x0000000096000004 >> > > panic: data abort in critical section or under mutex >> > > cpuid = 1 >> > > time = 1699610885 >> > > KDB: stack backtrace: >> > > db_trace_self() at db_trace_self >> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x38 >> > > vpanic() at vpanic+0x1a0 >> > > panic() at panic+0x48 >> > > data_abort() at data_abort+0x2fc >> > > handle_el1h_sync() at handle_el1h_sync+0x18 >> > > --- exception, esr 0x96000004 >> > > rms_rlock() at rms_rlock+0x1c >> > > zfs_clone_range() at zfs_clone_range+0x68 >> > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c >> > > null_bypass() at null_bypass+0x118 >> > > vn_copy_file_range() at vn_copy_file_range+0x18c >> > > kern_copy_file_range() at kern_copy_file_range+0x36c >> > > sys_copy_file_range() at sys_copy_file_range+0x8c >> > > do_el0_sync() at do_el0_sync+0x634 >> > > handle_el0_sync() at handle_el0_sync+0x48 >> > > --- exception, esr 0x56000000 >> > > KDB: enter: panic >> > > [ thread pid 3792 tid 100394 ] >> > > Stopped at kdb_enter+0x48: str xzr, [x19, #768] >> > > db> >> > > >> > > I'll keep the debugger open for a while. Can I type something for >> > > additional info? >> > > >> > > Regards, >> > > Ronald. >> > >> > -- >> > Alexander Motin >> >> >> > > > Hi, > > Build a new kernel today. > FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #20 main-051d69d6f8: Tue Nov > 14 12:16:28 CET 2023 > ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG > arm64 > > x0: 0x00000000000008e0 > > x1: 0xffffa0006ce1fb38 > > x2: 0xffffa0006837a400 > > x3: 0xffffa0012c503a48 > > x4: 0xffff0000eb0ef430 (next_index + 0x815d790) > > x5: 0xffffa00152636300 > > x6: 0xffff0000e2e025b5 ($d.5 + 0xc) > > x7: 0x000000000000030a > > x8: 0xffff0000eb3212c0 (next_index + 0x838f620) > > x9: 0x0000000000000001 > > x10: 0x0000000000000000 > > x11: 0x0000000000000001 > > x12: 0x0000000000000002 > > x13: 0x0000000000000000 > > x14: 0x0000000000000001 > > x15: 0x0000000000000000 > x16: 0xffff0000016e5b58 (__stop_set_modmetadata_set + 0x1328) > x17: 0xffff0000004e0c28 (rms_rlock + 0x0) > x18: 0xffff0000eb0ef250 (next_index + 0x815d5b0) > x19: 0x0000000000000800 > x20: 0xffff0000eb0ef430 (next_index + 0x815d790) > x21: 0x7fffffffffffffff > x22: 0xffffa0006ce1fb38 > x23: 0xffffa0006837a400 > x24: 0xffffa001ee486000 > x25: 0x00000000000008e0 > x26: 0xffffa000135ca000 > x27: 0x0000000000000800 > x28: 0xffffa000135ca000 > x29: 0xffff0000eb0ef250 (next_index + 0x815d5b0) > sp: 0xffff0000eb0ef250 > lr: 0xffff00000162bee8 (zfs_clone_range + 0x6c) > elr: 0xffff0000004e0c44 (rms_rlock + 0x1c) > spsr: 0x00000000a0000045 > far: 0x0000000000000908 > esr: 0x0000000096000004 > panic: data abort in critical section or under mutex > cpuid = 2 > time = 1699966486 > KDB: stack backtrace: > db_trace_self() at db_trace_self > db_trace_self_wrapper() at db_trace_self_wrapper+0x38 > vpanic() at vpanic+0x1a0 > panic() at panic+0x48 > data_abort() at data_abort+0x2fc > handle_el1h_sync() at handle_el1h_sync+0x18 > --- exception, esr 0x96000004 > rms_rlock() at rms_rlock+0x1c > zfs_clone_range() at zfs_clone_range+0x68 > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c > null_bypass() at null_bypass+0x118 > vn_copy_file_range() at vn_copy_file_range+0x1c0 > kern_copy_file_range() at kern_copy_file_range+0x36c > sys_copy_file_range() at sys_copy_file_range+0x8c > do_el0_sync() at do_el0_sync+0x634 > handle_el0_sync() at handle_el0_sync+0x48 > --- exception, esr 0x56000000 > KDB: enter: panic > [ thread pid 3620 tid 100911 ] > Stopped at kdb_enter+0x48: str xzr, [x19, #768] > db> > > This happens as soon as I start poudriere in a jenkins-agent jail. > > AFAIK this includes the two recent vn_copy_file_range changes of > Konstantin. > > Next I will install a GENERIC kernel instead of GENERIC-NODEBUG. > One of the vnodes is probably not zfs, I suspect this will do it (untested): diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c index 107cd69c756c..e799a7091b8e 100644 --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c @@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap) goto bad_write_fallback; } } + + if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) { + goto bad_write_fallback; + } + if (invp == outvp) { if (vn_lock(outvp, LK_EXCLUSIVE) != 0) { goto bad_write_fallback; -- Mateusz Guzik