[Bug 270089] mpr: panic in mpr_complete_command during zpool import

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 10 Apr 2024 08:58:24 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270089

--- Comment #20 from Dan Kotowski <dan.kotowski@a9development.com> ---
> Since this is on an ARM server, there may be something subtle there due to arm's weaker memory model than amd64 that's causing this.

I don't know whether it's related, but PCIe GPUs under Linux DRM can show
artifacting and tearing as a result of some sort of memory issue. The firmware
developer's fix was to reorder the store instructions in glibc's memcpy:

> Subject: [PATCH] Aarch64: Make memcpy more compatible with device memory
> 
> For normal non-cacheable memory ACE supports 4x128 bit r/w WRAP
> transfers or 1x128 bit r/w INCR transfers.  By re-ordering the
> stp's in memcpy / memmove we can accomodate this better without
> impacting the existing code.
> 
> This fixes an issue seen on multiple Cortex-A72 SOCs when writing
> directly to a PCIe memmapped frame-buffer, which resulted in
> corruption.
https://gist.github.com/jnettlet/f6f8b49bb7c731255c46f541f875f436

Test for framebuffer memcpy bugs:
https://gist.github.com/jnettlet/80f8d09d01c0dc0ffc0122f36ed78de6

Unfortunately I don't know how to build an equivalent test utility for
FreeBSD, but I do wonder whether the same coherency issue on the bus is
affecting my case as well.
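
For what it's worth, a userland check of this kind could be sketched roughly
as below. This is not a port of the gist above, only a minimal illustration of
the idea: copy a known pattern with memcpy at several sizes and alignments,
then read it back and count mismatches. The function names and the
COPYCHECK_STANDALONE macro are made up for this sketch. On ordinary malloc'd
memory it should always report zero mismatches; to be meaningful, `region`
would have to point at a mapping of device memory, and how to obtain one on
FreeBSD is not covered here. It also only exercises whichever memcpy the
program links against, which is not necessarily the copy path the mpr driver
or ZFS uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fill buf with a position-dependent pattern so that dropped or
 * misplaced stores show up as mismatches. */
static void fill_pattern(uint8_t *buf, size_t len, uint8_t seed)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(i * 31u + seed);
}

/* Copy len bytes from src to dst with memcpy(), then read dst back
 * through a volatile pointer and compare against src.
 * Returns the number of mismatching bytes (0 means the copy is intact). */
static size_t copy_and_verify(void *dst, const uint8_t *src, size_t len)
{
    memcpy(dst, src, len);
    const volatile uint8_t *check = dst;
    size_t bad = 0;
    for (size_t i = 0; i < len; i++)
        if (check[i] != src[i])
            bad++;
    return bad;
}

/* Repeat the check for lengths 1, 3, 7, 15, ... up to max_len and for
 * destination offsets 0..15, since memcpy implementations tend to pick
 * different store sequences depending on size and alignment.
 * region must be at least max_len + 16 bytes long.
 * Returns the total mismatch count, or (size_t)-1 if malloc fails. */
static size_t sweep(uint8_t *region, size_t max_len)
{
    assert(region != NULL);
    uint8_t *src = malloc(max_len);
    if (src == NULL)
        return (size_t)-1;
    size_t bad = 0;
    for (size_t len = 1; len <= max_len; len = len * 2 + 1)
        for (size_t off = 0; off < 16; off++) {
            fill_pattern(src, len, (uint8_t)(len + off));
            bad += copy_and_verify(region + off, src, len);
        }
    free(src);
    return bad;
}

#ifdef COPYCHECK_STANDALONE   /* hypothetical macro, only for this sketch */
int main(void)
{
    /* Stand-in target: plain heap memory. Replace with a device mapping
     * to test anything beyond the memcpy implementation itself. */
    uint8_t *region = malloc(4096 + 16);
    if (region == NULL)
        return 2;
    size_t bad = sweep(region, 4096);
    printf("%zu mismatching byte(s)\n", bad);
    free(region);
    return bad != 0;
}
#endif
```

Built with something like `cc -DCOPYCHECK_STANDALONE copycheck.c`, this
prints the mismatch count and exits non-zero if any byte differs.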

Another user in the vendor's Discord channel recently reported problems
running two NVMe drives off a SuperMicro bifurcated PCIe-to-NVMe adapter, and
I've since been able to replicate the issues using a Linux mdadm mirror as
well.

I have not seen panics when testing single drives, only when using pools of 2
or more.

Perhaps it only presents when there are multiple commands issued in parallel?
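
One way to probe that hypothesis from userland, without ZFS in the picture,
might look like the sketch below: fork one reader per path so that the reads
overlap in time, have each reader read the same range twice, and compare
checksums between the two passes. The function names and the
PARCHECK_STANDALONE macro are invented for this sketch, and it is untested
against the hardware in this report. On regular files the reads will mostly be
served from cache, so it would need to be pointed (read-only) at raw disk
device nodes, with the byte count a multiple of the sector size.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHUNK 65536

/* Read up to nbytes from path in two separate passes and compare a
 * simple multiplicative checksum of each pass.
 * Returns 0 if both passes agree, 1 on mismatch, 2 on I/O error. */
static int read_twice(const char *path, size_t nbytes)
{
    uint64_t sum[2] = {0, 0};
    uint8_t *buf = malloc(CHUNK);
    if (buf == NULL)
        return 2;
    for (int pass = 0; pass < 2; pass++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { free(buf); return 2; }
        size_t done = 0;
        while (done < nbytes) {
            size_t want = nbytes - done < CHUNK ? nbytes - done : CHUNK;
            ssize_t n = read(fd, buf, want);
            if (n < 0) { close(fd); free(buf); return 2; }
            if (n == 0)
                break;                      /* end of file before nbytes */
            for (ssize_t i = 0; i < n; i++)
                sum[pass] = sum[pass] * 1099511628211ULL + buf[i];
            done += (size_t)n;
        }
        close(fd);
    }
    free(buf);
    return sum[0] == sum[1] ? 0 : 1;
}

/* Fork one reader per path so the reads run concurrently.
 * Returns the number of readers that failed to start, hit an I/O
 * error, or saw different data on the second pass. */
static int parallel_check(const char *const *paths, int npaths, size_t nbytes)
{
    assert(paths != NULL && npaths > 0);
    int failures = 0, started = 0;
    for (int i = 0; i < npaths; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(read_twice(paths[i], nbytes));   /* child */
        if (pid < 0)
            failures++;
        else
            started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}

#ifdef PARCHECK_STANDALONE    /* hypothetical macro, only for this sketch */
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s path...\n", argv[0]);
        return 2;
    }
    /* 64 MiB per reader is an arbitrary choice. */
    int failures = parallel_check((const char *const *)(argv + 1),
                                  argc - 1, 64u * 1024 * 1024);
    printf("%d reader(s) failed\n", failures);
    return failures != 0;
}
#endif
```

Running it first against a single device and then against two devices at once
would be one way to see whether concurrency matters, although a clean run
would not rule the hypothesis out.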

> is this a zpool import from a pool that was created on another system?

I have been able to reproduce it both with zpools from known-working systems
and when trying to create new ones.
