Concern: ZFS Mirror issues (12.STABLE and firmware 19 vs. 20)
karl at denninger.net
Wed Apr 10 14:38:59 UTC 2019
On 4/10/2019 08:45, Andriy Gapon wrote:
> On 10/04/2019 04:09, Karl Denninger wrote:
>> Specifically, I *explicitly* OFFLINE the disk in question, which is a
>> controlled operation and *should* result in a cache flush out of the ZFS
>> code into the drive before it is OFFLINE'd.
>> This should result in the "last written" TXG that the remaining online
>> members have, and the one in the offline member, being consistent.
>> Then I "camcontrol standby" the involved drive, which forces a writeback
>> cache flush and a spindown; in other words, re-ordered or not, the
>> on-platter data *should* be consistent with what the system thinks
>> happened before I yank the physical device.
> This may not be enough for a specific [RAID] controller and a specific
> configuration. It should be enough for a dumb HBA. But, for example, mrsas(9)
> can simply ignore the synchronize cache command (meaning neither the on-board
> cache is flushed nor the command is propagated to a disk). So, if you use some
> advanced controller it would make sense to use its own management tool to
> offline a disk before pulling it.
> I do not preclude a possibility of an issue in ZFS. But it's not the only
> possibility either.
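The controlled removal described in the quoted exchange (OFFLINE the mirror member, then "camcontrol standby" the drive) could be scripted roughly as below. This is a minimal sketch that assumes a plain HBA; as noted above, a RAID controller may silently ignore the cache flush. The pool, device and CAM unit names, and both helper functions, are hypothetical. By default it only prints the commands.

```python
import shlex
import subprocess


def pull_sequence(pool: str, zfs_dev: str, cam_dev: str) -> list[list[str]]:
    """Commands for a controlled removal of one mirror member.

    All three arguments are hypothetical placeholders: the pool name, the
    device name ZFS uses for the member, and the CAM unit (e.g. "da3").
    """
    return [
        # Take the member OFFLINE so ZFS stops issuing I/O to it.
        ["zpool", "offline", pool, zfs_dev],
        # Per the post: flush the drive's write cache and spin it down.
        ["camcontrol", "standby", cam_dev],
    ]


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True aborts the sequence if a step fails, so the drive
            # is never spun down after an unsuccessful offline.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(pull_sequence("backup", "gpt/backup1", "da3"))
```

Keeping the steps as data and defaulting to a dry run makes it easy to review the exact sequence before anything touches the pool.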
In this specific case the adapter in question is...
mps0: <Avago Technologies (LSI) SAS2116> port 0xc000-0xc0ff mem
0xfbb3c000-0xfbb3ffff,0xfbb40000-0xfbb7ffff irq 30 at device 0.0 on pci3
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
That is indeed a "dumb" HBA (in IT mode), and Zaphod says he connects
his drives directly to the motherboard's on-board SATA ports.
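To confirm which firmware an mps adapter is actually running, the "Firmware:" line the driver prints at attach (as in the dmesg excerpt above) can be parsed. A minimal sketch, assuming that line keeps exactly the format shown; the function names are hypothetical. Versions become integer tuples so that 19.00.00.00 compares below 20.00.07.00.

```python
import re

# Matches e.g. "mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd"
FW_RE = re.compile(
    r"^(?P<unit>mps\d+): Firmware: (?P<fw>[\d.]+), Driver: (?P<drv>\S+)"
)


def parse_mps_firmware(line: str):
    """Return (unit, firmware tuple, driver string), or None if no match."""
    m = FW_RE.match(line.strip())
    if m is None:
        return None
    # "20.00.07.00" -> (20, 0, 7, 0), so versions compare numerically.
    fw = tuple(int(part) for part in m.group("fw").split("."))
    return m.group("unit"), fw, m.group("drv")


def firmware_from_dmesg(text: str) -> dict:
    """Map each mps unit found in the text to (firmware tuple, driver)."""
    found = {}
    for line in text.splitlines():
        parsed = parse_mps_firmware(line)
        if parsed is not None:
            unit, fw, drv = parsed
            found[unit] = (fw, drv)
    return found
```

On FreeBSD the boot-time messages are normally saved in /var/run/dmesg.boot, so reading that file and passing its contents to firmware_from_dmesg is one way to feed it.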
What I don't know (yet) is whether updating the HBA to firmware
20.00.07.00 has fixed it. Through some mechanism, FreeBSD 11.2 and 12.0
changed the timing in the mps driver quite materially. Prior to 11.2 I
ran with a Lenovo SAS expander connected to SATA disks without any
problems at all, even across actual disk failures through the years. On
11.2 and 12.0, however, the same setup produced spurious retries out of
the CAM layer, allegedly caused by timeouts on individual units (they
looked very much like a command sent to the disk had been lost), but
only on mirrored volume sets. The drives themselves reported no errors,
and neither of my RaidZ2 pools (one spinning rust, one SSD) experienced
problems of any sort. Flashing the HBA forward to 20.00.07.00 with the
expander still in place resulted in the *driver* (mps) taking
disconnects and resets instead of the targets, which in turn caused
random drive fault events across all of the pools. For obvious reasons
that got backed out *fast*.
Without the expander, 19.00.00.00 has been stable over the last few
months *except* in one circumstance. When I intentionally OFFLINE a disk
in a mirror and bring it back online after a reasonably long period
(days to a week), the resilver completes successfully, but a subsequent
scrub then finds and corrects a small number of checksum errors on that
drive. The errors are always on the disk that was OFFLINE'd, never on
the one(s) that stayed online. I am now on 20.00.07.00 and so far have
seen no problems, but I have yet to do the backup disk swap on
20.00.07.00 (scheduled for late this week or Monday), so I do not know
whether the roll-forward addresses the scrub issue.
I have no reason to believe the HBA firmware is involved, but given how
"iffy" 11.2 and 12.0 were on 19.0 with the expander, it very well might
be, owing to what appear to be timing changes in the driver architecture.
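The symptom described above is a handful of CKSUM errors that show up only on the previously OFFLINE'd member once a scrub runs. A minimal sketch for picking those devices out of "zpool status" output follows; it assumes the usual NAME/STATE/READ/WRITE/CKSUM table layout, and the function name is hypothetical.

```python
def cksum_errors(status: str) -> dict[str, str]:
    """Return {device: CKSUM} for every vdev with a nonzero CKSUM count.

    Reads the NAME/STATE/READ/WRITE/CKSUM table printed by "zpool status".
    Counts are kept as strings because zpool may abbreviate large ones.
    """
    errors = {}
    in_table = False
    for line in status.splitlines():
        fields = line.split()
        if not in_table:
            # The config table starts at its column-header row.
            in_table = fields[:2] == ["NAME", "STATE"] and "CKSUM" in fields
            continue
        if not fields:
            break  # a blank line ends the config table
        # Section labels such as "logs" or "spares" have fewer columns.
        if len(fields) >= 5 and fields[4] != "0":
            errors[fields[0]] = fields[4]
    return errors
```

For example, after a scrub one could pass it the stdout of subprocess.run(["zpool", "status", "backup"], capture_output=True, text=True), where "backup" is a placeholder pool name; a result naming only the disk that had been OFFLINE'd would match the pattern reported here.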
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
More information about the freebsd-stable