seeing data corruption with zfs trim functionality

Ajit Jain ajit.jain at cloudbyte.com
Wed May 29 12:43:35 UTC 2013


Hi Steven,

Sorry for the long delay, and I might be delayed even further.
I think the reason for the corruption was that my code
was not up to date, especially the cam directory.

Please do not hold things up just because of the issue I reported.
I'll update my src tree and rerun the experiments I was running;
if I still see an issue, then we can probably fix the bug rather than
blocking the MFC.

thanks,
ajit



On Wed, May 29, 2013 at 5:19 PM, Steven Hartland <killing at multiplay.co.uk> wrote:

> Sorry to pester, but any update on this Ajit?
>
> I ask as it's currently blocking the MFC of TRIM to stable/8 & 9 and I've
> been unable to reproduce this issue, even with your testing code, on working
> FW versions.
>
>
>    Regards
>    Steve
>
> ----- Original Message ----- From: "Ajit Jain" <ajit.jain at cloudbyte.com>
>
>
>> Sure Steven,
>> I'll apply the patches and update ASAP.
>>
>> thanks
>> ajit
>>
>>
>> On Thu, May 23, 2013 at 3:03 PM, Steven Hartland <killing at multiplay.co.uk> wrote:
>>
>>> I've attached the two patch sets I'm looking to MFC to stable-9; one
>>> adds BIO_DELETE CAM changes and the other is ZFS TRIM support.
>>>
>>> They should both apply cleanly to stable-9, if you could test with
>>> those on your machine and let me know.
>>>
>>>    Regards
>>>    Steve
>>>
>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain at cloudbyte.com>
>>>
>>>
>>>> Hi Steven,
>>>>
>>>> FW version on the setup is P15.
>>>> I will upgrade the FW to P16, but I think my
>>>> best bet will be to update the code base to 9 stable as, unlike you,
>>>> I was seeing corruption for all three delete methods.
>>>>
>>>> thanks
>>>> ajit
>>>>
>>>> On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing at multiplay.co.uk> wrote:
>>>>
>>>>
>>>>> ----- Original Message ----- From: "Steven Hartland" <killing at multiplay.co.uk>
>>>>>
>>>>>> After initially seeing no issues, our overnight monitoring started
>>>>>> moaning big time on the test box. So we checked and there was zpool
>>>>>> corruption as well as a missing boot loader and a corrupt GPT, so I
>>>>>> believe we have reproduced your issue.
>>>>>>
>>>>>> After recovering the machine I created 3 pools on 3 different disks,
>>>>>> each running a different delete_method.
>>>>>>
>>>>>> We then re-ran the tests, which resulted in the pool running with
>>>>>> delete_method WS16 being so broken it had suspended IO. A reboot
>>>>>> resulted in it once again reporting no partition table via gpart.
>>>>>>
>>>>>> A third test run again produced a corrupt pool for WS16.
>>>>>>
>>>>>> I've conducted a preliminary review of the CAM WS16 code path along
>>>>>> with the SBC-3 spec, which didn't identify any obvious issues.
>>>>>>
>>>>>> Given we're both using LSI 2008 based controllers it could be a FW
>>>>>> issue specific to WS16, but that's just speculation atm, so I'll
>>>>>> continue to investigate.
>>>>>>
>>>>>> If you could re-test at your end without using WS16 to see if you can
>>>>>> reproduce the problem with either UNMAP or ATA_TRIM, that would be a
>>>>>> very useful data point.
>>>>>>
>>>>>>
>>>>> After much playing I narrowed it down to a test case of one delete which
>>>>> was causing disk corruption for us (it deleted the partition table
>>>>> instead of data in the middle of the disk).
>>>>>
>>>>> The conclusion is that an LSI 2008 HBA with FW below P13 will eat the
>>>>> data on your SATA disks if you use WS16, due to the following bug:-
>>>>> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
>>>>> support SCT write same may write wrong region.
>>>>>
>>>>> After updating here to P16 (which we would generally be running, but the
>>>>> test box was new and hadn't been updated yet), the corruption issue is no
>>>>> longer reproducible.
>>>>>
>>>>> So Ajit, please check your FW version; I'm hoping to hear you're on
>>>>> something below P13, P12 possibly?
>>>>>
>>>>> If so then this is your issue; to fix it, simply update to P16 and the
>>>>> problem should be gone.
>>>>>
>>>>>
>>>>>    Regards
>>>>>    Steve
>>>>>
>>>>>
>>>>> ================================================
>>>>>
>>>>>
>>>>> This e.mail is private and confidential between Multiplay (UK) Ltd. and
>>>>> the person or entity to whom it is addressed. In the event of
>>>>> misdirection,
>>>>> the recipient is prohibited from using, copying, printing or otherwise
>>>>> disseminating it or any information contained in it.
>>>>> In the event of misdirection, illegible or incomplete transmission
>>>>> please
>>>>> telephone +44 845 868 1337
>>>>> or return the E.mail to postmaster at multiplay.co.uk.
>>>>>
>>>>>
>>>>>
>>>>>


More information about the freebsd-fs mailing list