[Bug 277992] mpr and possible trim issues

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 05 Apr 2024 21:28:12 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277992

--- Comment #2 from Warner Losh <imp@FreeBSD.org> ---
"Power on reset" sense code means that the drive dropped off the bus
(typically). Usually, if you can eliminate flaky power (either bad connectors
or inadequate power to cope with maximum power draw), then you are left with
"commands sent to the drive freaked it out". I always bet at least a nickel on
'TRIM'.
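For reference, SPC reports this condition as sense key UNIT ATTENTION (0x6) with additional sense code 0x29 ("POWER ON, RESET, OR BUS DEVICE RESET OCCURRED"). The following is a minimal illustrative sketch, not CAM's code, that picks it out of fixed-format sense data (response codes 0x70/0x71). Descriptor-format sense (0x72/0x73) is deliberately not handled, and the function names are hypothetical.

```python
UNIT_ATTENTION = 0x6
ASC_POWER_ON_RESET = 0x29  # POWER ON, RESET, OR BUS DEVICE RESET OCCURRED family


def decode_fixed_sense(sense: bytes) -> tuple[int, int, int]:
    """Return (sense_key, asc, ascq) from fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = sense[0] & 0x7F  # bit 7 is the VALID bit
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: 0x{response_code:02x}")
    sense_key = sense[2] & 0x0F  # low nibble of byte 2
    asc, ascq = sense[12], sense[13]  # additional sense code and qualifier
    return sense_key, asc, ascq


def is_power_on_reset(sense: bytes) -> bool:
    """True when the sense data reports a power-on/reset unit attention."""
    key, asc, _ascq = decode_fixed_sense(sense)
    return key == UNIT_ATTENTION and asc == ASC_POWER_ON_RESET
```

Any ASCQ under ASC 0x29 (power on, bus reset, device reset, and so on) is treated the same here, since they all mean the drive lost its state.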

If you think this is TRIM-related, you can try disabling TRIM at the da
device level by changing the delete method to 'none':
kern.cam.da.0.delete_method: NONE

It may also be too large a TRIM. You can check what delete_max is and try
reducing it (I usually halve it when looking for problems like this). I've
never had to do this, but it is tunable, and 'too large' is a known issue, or
used to be years ago:
kern.cam.da.0.delete_max: 1048576
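The halving approach can be sketched as a small helper that lists the delete_max values to try, in bytes, starting from the current value. The 4096-byte floor is an assumption (one 4 KiB block), not a documented limit, and the helper only prints `sysctl name=value` commands; it does not run them.

```python
def delete_max_candidates(current: int, floor: int = 4096) -> list[int]:
    """Halve delete_max (bytes) repeatedly, stopping before it drops below floor.

    The default floor of 4096 bytes is an assumed lower bound for illustration.
    """
    if current <= 0 or floor <= 0:
        raise ValueError("sizes must be positive")
    values = []
    size = current // 2
    while size >= floor:
        values.append(size)
        size //= 2
    return values


def sysctl_commands(unit: int, sizes: list[int]) -> list[str]:
    """Format one sysctl command per candidate size for da unit `unit`."""
    return [f"sysctl kern.cam.da.{unit}.delete_max={s}" for s in sizes]


if __name__ == "__main__":
    for cmd in sysctl_commands(0, delete_max_candidates(1048576, floor=131072)):
        print(cmd)
```

Try each value in turn, rerunning the workload, until the resets stop or the floor is reached.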

You can leave the delete method at its normal setting to TRIM everything, and
then turn TRIM off for the drive while you do the zfs send / receive. For
testing purposes, TRIM doesn't matter in terms of drive life (it only matters
when you write to the drive day in, day out and keep some parts of the drive
unused for more than, say, a few hours or a day).

It may also be that odd commands are being sent by something like smartd. If
you can reproduce this, we may be able to adapt a prototype dtrace script I
have into a tcpdump-like script and use it to dump the last N commands when
we get a unit attention.
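The dtrace script itself isn't shown here. The core "keep the last N commands, dump them on a unit attention" idea is a fixed-size ring buffer, sketched below in Python. The `Command` record and its fields are hypothetical stand-ins for whatever a real probe would capture. The opcode table lists standard SCSI opcodes, including ATA PASS-THROUGH(16), which SMART tools commonly use to reach SATA drives behind a SAS HBA.

```python
from collections import deque
from dataclasses import dataclass

# Standard SCSI opcodes worth recognizing in a dump.
OPCODE_NAMES = {
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x35: "SYNCHRONIZE CACHE(10)",
    0x42: "UNMAP",
    0x85: "ATA PASS-THROUGH(16)",
    0x88: "READ(16)",
    0x8A: "WRITE(16)",
}


def opcode_name(opcode: int) -> str:
    return OPCODE_NAMES.get(opcode, f"0x{opcode:02x}")


@dataclass
class Command:
    # Hypothetical record of one command as a tracer might capture it.
    timestamp: float
    opcode: int
    lba: int
    length: int


class CommandHistory:
    """Keep the last n commands; dump and clear them on a unit attention."""

    def __init__(self, n: int) -> None:
        self._ring = deque(maxlen=n)  # oldest entries fall off automatically

    def record(self, cmd: Command) -> None:
        self._ring.append(cmd)

    def on_unit_attention(self) -> list[str]:
        lines = [
            f"{c.timestamp:.6f} {opcode_name(c.opcode)} lba={c.lba} len={c.length}"
            for c in self._ring
        ]
        self._ring.clear()
        return lines
```

Bounding the buffer keeps the cost constant however long the drive runs before it resets.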

Now, having said all this, I think this sense code might be mishandled right
now. Regardless of how we got here, I think it should pause I/O and reapply
the parameters we've set in the drive, restoring its state to what the driver
expects (though I think only the write-through cache and similar settings
might matter).
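The handling proposed above can be sketched as a toy model. On a power-on/reset unit attention, I/O is paused, every setting the driver relies on is pushed to the drive again (a reset may leave mode parameters at their saved or default values), and I/O resumes. `DriveState`, `handle_sense`, and the "WCE" (write cache enable) entry are hypothetical illustrations, not CAM structures or functions.

```python
from dataclasses import dataclass, field

UNIT_ATTENTION = 0x6
ASC_POWER_ON_RESET = 0x29


@dataclass
class DriveState:
    # Hypothetical model: settings the driver expects to be in effect,
    # what the drive currently has, and whether I/O is held.
    expected: dict[str, int]
    actual: dict[str, int] = field(default_factory=dict)
    io_paused: bool = False
    log: list[str] = field(default_factory=list)


def handle_sense(drive: DriveState, sense_key: int, asc: int) -> None:
    """Pause I/O, reapply expected settings, resume, on a power-on reset."""
    if sense_key != UNIT_ATTENTION or asc != ASC_POWER_ON_RESET:
        return  # other conditions are left to the normal error path
    drive.io_paused = True
    drive.log.append("pause I/O")
    for name, value in drive.expected.items():
        drive.actual[name] = value  # stands in for a MODE SELECT or similar
        drive.log.append(f"set {name}={value}")
    drive.io_paused = False
    drive.log.append("resume I/O")
```

Reapplying everything rather than diffing keeps the recovery path simple, at the cost of a few extra commands after a rare event.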
