seeing data corruption with zfs trim functionality

Ajit Jain ajit.jain at cloudbyte.com
Thu May 23 06:10:00 UTC 2013


Hi Steven,

The FW version on this setup is P15.
I will upgrade the FW to P16, but I think my best bet is to update the
code base to 9-STABLE since, unlike you, I was seeing corruption with all
three delete methods.
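
To spell out why I don't think the firmware alone explains it, here is a rough sketch of the check, assuming phase names always look like "P" followed by a number. The helper names are made up for illustration and are not part of any LSI or FreeBSD tool, and the threshold is only my reading of Steven's "below P13" in the quoted mail:

```python
import re

# Assumption taken from the quoted report: SCGCQ00230159 affects
# firmware phases below P13, so P13 is treated as the first safe phase.
FIRST_UNAFFECTED_PHASE = 13


def phase_number(fw: str) -> int:
    """Parse a phase string such as "P15" into 15 (hypothetical helper)."""
    m = re.fullmatch(r"[Pp](\d+)", fw.strip())
    if m is None:
        raise ValueError(f"unrecognised firmware phase: {fw!r}")
    return int(m.group(1))


def affected_by_ws16_bug(fw: str) -> bool:
    """True if the phase falls in the range reported as affected."""
    return phase_number(fw) < FIRST_UNAFFECTED_PHASE


for fw in ("P12", "P15", "P16"):
    status = "affected" if affected_by_ws16_bug(fw) else "not affected"
    print(fw, status)
```

By that reading P15 is already outside the affected range, which fits with the corruption here not being limited to WS16.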

thanks
ajit

On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing at multiplay.co.uk> wrote:

> ----- Original Message ----- From: "Steven Hartland" <killing at multiplay.co.uk>
>
>
>> After initially seeing no issues, our overnight monitoring started moaning
>> big time on the test box. So we checked and there was zpool corruption as
>> well as a missing boot loader and a corrupt GPT, so I believe we have
>> reproduced your issue.
>>
>> After recovering the machine I created 3 pools on 3 different disks each
>> running a different delete_method.
>>
>> We then re-ran the tests, which resulted in the pool running with
>> delete_method WS16 being so broken it had suspended IO. A reboot resulted
>> in it once again reporting no partition table via gpart.
>>
>> A third test run again produced a corrupt pool for WS16.
>>
>> I've conducted a preliminary review of the CAM WS16 code path along with
>> the SBC-3 spec, which didn't identify any obvious issues.
>>
>> Given we're both using LSI 2008 based controllers it could be a FW issue
>> specific to WS16, but that's just speculation atm, so I'll continue to
>> investigate.
>>
>> If you could re-test at your end without using WS16 to see if you can
>> reproduce the problem with either UNMAP or ATA_TRIM, that would be a very
>> useful data point.
>>
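
For anyone following along, a WS16 delete comes down to a 16-byte WRITE SAME (16) CDB with the UNMAP bit set. Below is a minimal sketch of that layout as I understand it from SBC-3. The function name is made up for illustration and this is not the CAM code path itself, only the byte layout it has to produce:

```python
import struct

WRITE_SAME_16 = 0x93  # SBC-3 operation code for WRITE SAME (16)
UNMAP_BIT = 0x08      # byte 1, bit 3: ask the device to unmap the range


def build_ws16_unmap_cdb(lba: int, nblocks: int) -> bytes:
    """Build a WRITE SAME (16) CDB with UNMAP set (illustrative helper).

    Layout, all big-endian: opcode, flags, 8-byte starting LBA,
    4-byte number of logical blocks, group number, control.
    """
    if not 0 <= lba < 2**64:
        raise ValueError("LBA must fit in 64 bits")
    if not 0 <= nblocks < 2**32:
        raise ValueError("block count must fit in 32 bits")
    return struct.pack(">BBQIBB", WRITE_SAME_16, UNMAP_BIT, lba, nblocks, 0, 0)


cdb = build_ws16_unmap_cdb(lba=0x12345678, nblocks=2048)
print(cdb.hex())
```

The starting LBA sits in bytes 2 to 9 and the block count in bytes 10 to 13, so those are the fields to compare when checking which region a delete was meant to hit.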
>
> After much playing I narrowed it down to a test case of a single delete
> which was causing disk corruption for us (it deleted the partition table
> instead of data in the middle of the disk).
>
> The conclusion is that an LSI 2008 HBA with FW below P13 will eat the data
> on your SATA disks if you use WS16, due to the following bug:
> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
> support SCT write same may write wrong region.
>
> After updating here to P16, which we would generally be running (the test
> box was new and hadn't been updated yet), the corruption issue is no longer
> reproducible.
>
> So Ajit, please check your FW version. I'm hoping to hear you're on
> something below P13, P12 possibly?
>
> If so then this is your issue; to fix it simply update to P16 and the
> problem should be gone.
>
>
>    Regards
>    Steve
>
>

