Re: crash zfs_clone_range()

From: Martin Matuska <mm_at_FreeBSD.org>
Date: Fri, 10 Nov 2023 20:32:47 UTC
Hi Ronald,

Hitting the panic with a DEBUG kernel would be great, and it would be 
very nice if I could reproduce the panic myself.
I have the option to rent a cheap arm64 virtual host at Hetzner, so I 
could test it in an environment close to yours.

Please try compiling a GENERIC-DEBUG kernel with:

include GENERIC

ident GENERIC-DEBUG

options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         WITNESS_SKIPSPIN
options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS
options         DIAGNOSTIC
options         DDB
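A minimal sketch of the usual FreeBSD custom-kernel steps for this, assuming the config above is saved as sys/arm64/conf/GENERIC-DEBUG inside the source tree. SRC defaulting to /usr/src is an assumption: the obj path in Ronald's uname output suggests his checkout lives under /home/ronald/dev/freebsd/src. The helper names write_conf and build_kernel are hypothetical.

```shell
#!/bin/sh
# Sketch only. SRC is an assumption: point it at your source checkout,
# e.g. SRC=/home/ronald/dev/freebsd/src
SRC=${SRC:-/usr/src}
KERNCONF=GENERIC-DEBUG

# write_conf (hypothetical helper): write the kernel config to the given path.
write_conf() {
    mkdir -p "$(dirname "$1")" || return 1
    cat > "$1" <<'EOF'
include GENERIC

ident GENERIC-DEBUG

options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         WITNESS_SKIPSPIN
options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS
options         DIAGNOSTIC
options         DDB
EOF
}

# build_kernel (hypothetical helper): the standard buildkernel and
# installkernel targets; installkernel has to run as root.
build_kernel() {
    write_conf "$SRC/sys/arm64/conf/$KERNCONF" || return 1
    cd "$SRC" || return 1
    make -j"$(sysctl -n hw.ncpu)" buildkernel KERNCONF="$KERNCONF" &&
        make installkernel KERNCONF="$KERNCONF"
}
```

After sourcing the script, run build_kernel as root and reboot into the new kernel; the build objects should then land in a GENERIC-DEBUG directory next to the GENERIC-NODEBUG one from the uname line.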

Cheers,
mm

On 10. 11. 2023 11:12, Ronald Klop wrote:
> Hi,
>
> Had this crash today on RPI4/15-CURRENT.
>
> FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19 
> main-b0203aaa46-dirty: Sat Nov  4 11:48:33 CET 2023 
> ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG 
> arm64
>
> $ sysctl -a | grep bclon
> vfs.zfs.bclone_enabled: 1
>
> I started a jail with poudriere to build a package. The jail uses null 
> mounts over ZFS.
>
> [root]# cu -s 115200 -l /dev/cuaU0
> Connected
>
> db> bt
> Tracing pid 95213 tid 100438 td 0xffff0000e1e97900
> db_trace_self() at db_trace_self
> db_stack_trace() at db_stack_trace+0x120
> db_command() at db_command+0x2e4
> db_command_loop() at db_command_loop+0x58
> db_trap() at db_trap+0x100
> kdb_trap() at kdb_trap+0x334
> handle_el1h_sync() at handle_el1h_sync+0x18
> --- exception, esr 0xf2000000
> kdb_enter() at kdb_enter+0x48
> vpanic() at vpanic+0x1dc
> panic() at panic+0x48
> data_abort() at data_abort+0x2fc
> handle_el1h_sync() at handle_el1h_sync+0x18
> --- exception, esr 0x96000004
> rms_rlock() at rms_rlock+0x1c
> zfs_clone_range() at zfs_clone_range+0x68
> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
> null_bypass() at null_bypass+0x118
> vn_copy_file_range() at vn_copy_file_range+0x18c
> kern_copy_file_range() at kern_copy_file_range+0x36c
> sys_copy_file_range() at sys_copy_file_range+0x8c
> do_el0_sync() at do_el0_sync+0x634
> handle_el0_sync() at handle_el0_sync+0x48
> --- exception, esr 0x56000000
>
>
> Oh... While typing this I rebooted the machine and it happened again. I 
> didn't start anything in particular, although the machine runs some jails.
>
>   x0: 0x00000000000000e0
>   x1: 0xffffa00090317a48
>   x2: 0xffffa000f79d4f00
>   x3: 0xffffa000c61a44a8
>   x4: 0xffff0000deefe460 ($d.2 + 0xdd776560)
>   x5: 0xffffa001250e4c00
>   x6: 0xffff0000e54025b5 ($d.5 + 0xc)
>   x7: 0x000000000000030a
>   x8: 0xffff0000e1559000 ($d.2 + 0xdfdd1100)
>   x9: 0x0000000000000001
>  x10: 0x0000000000000000
>  x11: 0x0000000000000001
>  x12: 0x0000000000000002
>  x13: 0x0000000000000000
>  x14: 0x0000000000000001
>  x15: 0x0000000000000000
>  x16: 0xffff0000016dce88 (__stop_set_modmetadata_set + 0x1310)
>  x17: 0xffff0000004e0d44 (rms_rlock + 0x0)
>  x18: 0xffff0000deefe280 ($d.2 + 0xdd776380)
>  x19: 0x0000000000000000
>  x20: 0xffff0000deefe460 ($d.2 + 0xdd776560)
>  x21: 0x7fffffffffffffff
>  x22: 0xffffa00090317a48
>  x23: 0xffffa000f79d4f00
>  x24: 0xffffa001067ef910
>  x25: 0x00000000000000e0
>  x26: 0xffffa000158a8000
>  x27: 0x0000000000000000
>  x28: 0xffffa000158a8000
>  x29: 0xffff0000deefe280 ($d.2 + 0xdd776380)
>   sp: 0xffff0000deefe280
>   lr: 0xffff000001623564 (zfs_clone_range + 0x6c)
>  elr: 0xffff0000004e0d60 (rms_rlock + 0x1c)
> spsr: 0x00000000a0000045
>  far: 0x0000000000000108
>  esr: 0x0000000096000004
> panic: data abort in critical section or under mutex
> cpuid = 1
> time = 1699610885
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1a0
> panic() at panic+0x48
> data_abort() at data_abort+0x2fc
> handle_el1h_sync() at handle_el1h_sync+0x18
> --- exception, esr 0x96000004
> rms_rlock() at rms_rlock+0x1c
> zfs_clone_range() at zfs_clone_range+0x68
> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
> null_bypass() at null_bypass+0x118
> vn_copy_file_range() at vn_copy_file_range+0x18c
> kern_copy_file_range() at kern_copy_file_range+0x36c
> sys_copy_file_range() at sys_copy_file_range+0x8c
> do_el0_sync() at do_el0_sync+0x634
> handle_el0_sync() at handle_el0_sync+0x48
> --- exception, esr 0x56000000
> KDB: enter: panic
> [ thread pid 3792 tid 100394 ]
> Stopped at      kdb_enter+0x48: str     xzr, [x19, #768]
> db>
>
> I'll keep the debugger open for a while. Is there anything I can type 
> to get additional info?
>
> Regards,
> Ronald.
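The three "esr" values in the backtrace are AArch64 ESR_EL1 syndromes. The sketch below splits them into the exception class (bits 31:26), the IL bit (bit 25) and the class-specific ISS (bits 24:0), following the register layout in the Arm Architecture Reference Manual. The function name decode_esr is hypothetical, and only the three exception classes that appear in this trace are named.

```shell
#!/bin/sh
# decode_esr (hypothetical helper): split an AArch64 ESR_EL1 value into
# its EC, IL and ISS fields.
decode_esr() {
    esr=$(( $1 ))                      # accepts hex such as 0x96000004
    ec=$(( (esr >> 26) & 0x3f ))       # bits [31:26]: exception class
    il=$(( (esr >> 25) & 0x1 ))        # bit 25: 32-bit instruction length
    iss=$(( esr & 0x1ffffff ))         # bits [24:0]: class-specific syndrome
    # Only the exception classes seen in this backtrace are named.
    case $ec in
        21) name="SVC from AArch64" ;;     # EC 0x15: system call
        37) name="data abort, same EL" ;;  # EC 0x25: data abort in the kernel
        60) name="BRK from AArch64" ;;     # EC 0x3c: breakpoint instruction
        *)  name="unknown" ;;
    esac
    printf 'esr=0x%08x ec=0x%02x (%s) il=%d iss=0x%07x\n' \
        "$esr" "$ec" "$name" "$il" "$iss"
    if [ "$ec" -eq 37 ]; then
        # For data aborts, DFSC is ISS[5:0] and WnR (1 = write) is ISS[6].
        dfsc=$(( iss & 0x3f ))
        wnr=$(( (iss >> 6) & 0x1 ))
        printf '  dfsc=0x%02x wnr=%d\n' "$dfsc" "$wnr"
    fi
}

for v in 0xf2000000 0x96000004 0x56000000; do
    decode_esr "$v"
done
```

Decoded this way, 0x56000000 is the copy_file_range(2) system call entering the kernel, 0xf2000000 is the breakpoint taken by kdb_enter(), and 0x96000004 is a level-0 translation fault (DFSC 0x04) on a read (WnR 0). Together with far 0x108, that is consistent with rms_rlock() reading through a NULL or near-NULL pointer.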