seeing data corruption with zfs trim functionality

Steven Hartland killing at multiplay.co.uk
Thu May 30 22:42:43 UTC 2013


A tar archive of /usr/src and /usr/obj with a built world and GENERIC kernel
for amd64 can be found here:
http://blog.multiplay.co.uk/dropzone/freebsd/stable-9-r251096.tar.gz

This is based on r251096 with the currently proposed MFC of the CAM
BIO_DELETE & ZFS TRIM changes.

    Regards
    Steve
----- Original Message ----- 
From: "Ajit Jain" <ajit.jain at cloudbyte.com>


> Hi Steven,
> 
> That would be really great. I'll install the build you provide and can
> quickly update with the result. I feel that I am asking too many favours
> of you.
> 
> Thanks for the help and for bearing with me,
> ajit
> 
> 
> On Wed, May 29, 2013 at 6:39 PM, Steven Hartland <killing at multiplay.co.uk> wrote:
> 
>> Unfortunately FS corruption is a serious matter, so even though I'm 99.99%
>> convinced there isn't a problem I'd still prefer to confirm this was indeed
>> an issue with your code base and not an issue with the current code prior
>> to MFC'ing.
>>
>> Would a pre-patched stable/9 source / build help? If so I can look at
>> making that available for you.
>>
>>
>>    Regards
>>    Steve
>>
>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain at cloudbyte.com>
>>
>>
>>> Hi Steven,
>>>
>>> Sorry for the long delay, and there may be a further delay.
>>> I think the reason for the corruption was that my code
>>> was not up to date, especially the cam directory.
>>>
>>> Please do not stop just because of the issue I reported.
>>> I'll update my src tree and rerun the experiments I was running.
>>> If I see an issue then we can probably fix the bug rather than
>>> stopping the MFC.
>>>
>>> thanks,
>>> ajit
>>>
>>>
>>>
>>> On Wed, May 29, 2013 at 5:19 PM, Steven Hartland <killing at multiplay.co.uk> wrote:
>>>
>>>> Sorry to pester, but any update on this Ajit?
>>>>
>>>> I ask as it's currently blocking the MFC of TRIM to stable/8 & 9 and I've
>>>> been unable to reproduce this issue even with your testing code on working
>>>> FW versions.
>>>>
>>>>
>>>>    Regards
>>>>    Steve
>>>>
>>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain at cloudbyte.com>
>>>>
>>>>
>>>>> Sure Steven,
>>>>>
>>>>> I'll apply the patches and update ASAP.
>>>>>
>>>>> thanks
>>>>> ajit
>>>>>
>>>>>
>>>>> On Thu, May 23, 2013 at 3:03 PM, Steven Hartland <killing at multiplay.co.uk> wrote:
>>>>>
>>>>>> I've attached the two patch sets I'm looking to MFC to stable-9: one
>>>>>> adds the BIO_DELETE CAM changes and the other is ZFS TRIM support.
>>>>>>
>>>>>> They should both apply cleanly to stable-9. Could you test with
>>>>>> those on your machine and let me know?
>>>>>>
>>>>>>    Regards
>>>>>>    Steve
>>>>>>
>>>>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain at cloudbyte.com>
>>>>>>
>>>>>>
>>>>>>> Hi Steven,
>>>>>>>
>>>>>>> The FW version on the setup is P15.
>>>>>>> I will upgrade the FW to P16, but I think my best bet will be to update
>>>>>>> the code base to stable/9 because, unlike you, I was seeing corruption
>>>>>>> with all three delete methods.
>>>>>>>
>>>>>>> thanks
>>>>>>> ajit
>>>>>>>
>>>>>>> On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing at multiplay.co.uk> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> ----- Original Message ----- From: "Steven Hartland" <killing at multiplay.co.uk>
>>>>>>>>
>>>>>>>>> After initially seeing no issues, our overnight monitoring started
>>>>>>>>> moaning big time on the test box. So we checked and there was zpool
>>>>>>>>> corruption as well as a missing boot loader and a corrupt GPT, so I
>>>>>>>>> believe we have reproduced your issue.
>>>>>>>>>
>>>>>>>>> After recovering the machine I created 3 pools on 3 different disks,
>>>>>>>>> each running a different delete_method.
>>>>>>>>>
>>>>>>>>> We then re-ran the tests, which resulted in the pool running with
>>>>>>>>> delete_method WS16 being so broken it had suspended IO. A reboot
>>>>>>>>> resulted in it once again reporting no partition table via gpart.
>>>>>>>>>
>>>>>>>>> A third test run again produced a corrupt pool for WS16.
>>>>>>>>>
>>>>>>>>> I've conducted a preliminary review of the CAM WS16 code path along
>>>>>>>>> with the SBC-3 spec, which didn't identify any obvious issues.
>>>>>>>>>
>>>>>>>>> Given we're both using LSI 2008 based controllers it could be a FW
>>>>>>>>> issue specific to WS16, but that's just speculation atm, so I'll
>>>>>>>>> continue to investigate.
>>>>>>>>>
>>>>>>>>> If you could re-test at your end without using WS16 to see if you can
>>>>>>>>> reproduce the problem with either UNMAP or ATA_TRIM, that would be a
>>>>>>>>> very useful data point.
>>>>>>>>>
>>>>>>>> After much playing I narrowed it down to a test case of a single
>>>>>>>> delete which was causing disk corruption for us (it deleted the
>>>>>>>> partition table instead of data in the middle of the disk).
>>>>>>>>
>>>>>>>> The conclusion is that an LSI 2008 HBA with FW below P13 will eat the
>>>>>>>> data on your SATA disks if you use WS16, due to the following bug:-
>>>>>>>> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
>>>>>>>> support SCT write same may write wrong region.
>>>>>>>>
>>>>>>>> After updating here to P16, which we would generally be running (the
>>>>>>>> test box was new and hadn't been updated yet), the corruption issue is
>>>>>>>> no longer reproducible.
>>>>>>>>
>>>>>>>> So Ajit, please check your FW version; I'm hoping to hear you're on
>>>>>>>> something below P13, P12 possibly?
>>>>>>>>
>>>>>>>> If so then this is your issue; to fix it, simply update to P16 and the
>>>>>>>> problem should be gone.
>>>>>>>>
>>>>>>>>
>>>>>>>>    Regards
>>>>>>>>    Steve
>>>>>>>>
>>>>>>>>
>>>>>>>> ================================================
>>>>>>>> This e.mail is private and confidential between Multiplay (UK) Ltd. and
>>>>>>>> the person or entity to whom it is addressed. In the event of
>>>>>>>> misdirection, the recipient is prohibited from using, copying, printing
>>>>>>>> or otherwise disseminating it or any information contained in it.
>>>>>>>>
>>>>>>>> In the event of misdirection, illegible or incomplete transmission
>>>>>>>> please telephone +44 845 868 1337
>>>>>>>> or return the E.mail to postmaster at multiplay.co.uk.
>>>>>>>>
>>>
>




More information about the freebsd-fs mailing list