[Bug 270089] mpr: panic in mpr_complete_command during zpool import

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 08 Apr 2024 18:22:18 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270089

--- Comment #19 from Warner Losh <imp@FreeBSD.org> ---
A few problems with your ncq trim theory.

(1) scsi_da doesn't implement ncq trim at all. Until mpi3mr was imported, there
was nothing in the tree that could create the necessary ATA command with the
extra registers apart from ahci (which uses ata_da).
(2) mpr couldn't possibly send ncq trims even if da were generating them, and
(3) import is a read-intensive operation, so it is not doing trims and will do
minimal writes.

As an aside: We likely should just assume the 4k quirk always. That would
eliminate 90% of the quirks we have if we also stop doing READ6/WRITE6 commands
entirely (they are a compat hack for SASI, and READ10 was in SCSI1, though not
universally working; SCSI1 drives are not relevant today, certainly not the
ultra-low capacity ones that were quirky at the time). But that's a different
issue, not the main one here.

I fixed a lot of 'state machine' bugs, which this panic is, and Scott Long
fixed even more before I did. Those changes should have been pushed upstream
several years prior to the uname date in this bug report. Since this is on an
ARM server, there may be something subtle there, due to arm's memory model
being weaker than amd64's, that's causing this. My testing of mpr on aarch64
has been light since we don't use it at $WORK and the aarch64 chassis that I
have don't have slots for hard drives... So I've just done bench testing to see
that I can see the disk and do some I/O, but not much beyond that. And of late
it's not feasible to redo that bench testing due to changes in the amount of
junk I have on my bench.

Out of curiosity: is this a zpool import from a pool that was created on
another system? Or was it working fine and then this started happening after
some upgrade?

mpr and mps both share a common history, including the state tracking code, so
it's not super surprising that this is being hit on both.
