seeing data corruption with zfs trim functionality
Ajit Jain
ajit.jain at cloudbyte.com
Wed May 29 15:19:30 UTC 2013
Hi Steven,
That would be really great. I'll install the build you provide and can
quickly report the result. I feel that I am asking too many favors of you.
Thanks for the help and for bearing with me,
ajit
On Wed, May 29, 2013 at 6:39 PM, Steven Hartland <killing at multiplay.co.uk> wrote:
> Unfortunately FS corruption is a serious matter, so even though I'm 99.99%
> convinced there isn't a problem I'd still prefer to confirm this was indeed
> an issue with your code base and not an issue with the current code prior
> to MFC'ing.
>
> Would a pre-patched stable/9 source / build help? If so I can look at
> making that available for you.
>
>
> Regards
> Steve
>
> ----- Original Message ----- From: "Ajit Jain" <ajit.jain at cloudbyte.com>
>
>
> Hi Steven,
>>
>> Sorry for the long delay, and there may be a further delay.
>> I think the reason for the corruption was that my code
>> was not up to date, especially the cam directory.
>>
>> Please do not hold things up just because of the issue I reported.
>> I'll update my src tree and rerun the experiments I was running.
>> If I see an issue then we can probably fix the bug rather than stopping
>> the MFC.
>>
>> thanks,
>> ajit
>>
>>
>>
>> On Wed, May 29, 2013 at 5:19 PM, Steven Hartland <killing at multiplay.co.uk> wrote:
>>
>> Sorry to pester, but any update on this Ajit?
>>>
>>> I ask as it's currently blocking the MFC of TRIM to stable/8 & 9 and I've
>>> been unable to reproduce this issue even with your testing code on working
>>> FW versions.
>>>
>>>
>>> Regards
>>> Steve
>>>
>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain at cloudbyte.com>
>>>
>>>
>>> Sure Steven,
>>>
>>>> I'll apply the patches and update ASAP.
>>>>
>>>> thanks
>>>> ajit
>>>>
>>>>
>>>> On Thu, May 23, 2013 at 3:03 PM, Steven Hartland <killing at multiplay.co.uk> wrote:
>>>>
>>>>
>>>>> I've attached the two patch sets I'm looking to MFC to stable-9, one
>>>>> adds BIO_DELETE CAM changes and the other is ZFS TRIM support.
>>>>>
>>>>> They should both apply cleanly to stable-9, if you could test with
>>>>> those on your machine and let me know.
>>>>>
>>>>> Regards
>>>>> Steve
>>>>>
>>>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain at cloudbyte.com>
>>>>>
>>>>>
>>>>> Hi Steven,
>>>>>
>>>>>
>>>>>> FW version on the setup is P15.
>>>>>> I will upgrade the FW to P16, but I think my
>>>>>> best bet will be to update code base to 9 stable as unlike you,
>>>>>> I was seeing corruption for all three delete methods.
>>>>>>
>>>>>> thanks
>>>>>> ajit
>>>>>>
>>>>>> On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing at multiplay.co.uk> wrote:
>>>>>>
>>>>>>
>>>>>> ----- Original Message ----- From: "Steven Hartland" <killing at multiplay.co.uk>
>>>>>>>
>>>>>>>
>>>>>>>> After initially seeing no issues, our overnight monitoring started
>>>>>>>> moaning big time on the test box. So we checked and there was zpool
>>>>>>>> corruption as well as a missing boot loader and a corrupt GPT, so I
>>>>>>>> believe we have reproduced your issue.
>>>>>>>>
>>>>>>>> After recovering the machine I created 3 pools on 3 different disks
>>>>>>>> each
>>>>>>>> running a different delete_method.
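The per-disk setup described above could be scripted roughly as follows. This is only a sketch: the device names are hypothetical (the thread does not say which disks were used), and it assumes the FreeBSD per-device sysctl `kern.cam.da.N.delete_method` is the knob being referred to as delete_method. It only builds the command strings and does not run them.

```python
# Hypothetical device names; the thread does not name the disks used.
DISKS = ["da0", "da1", "da2"]
# The three delete methods mentioned in the thread.
METHODS = ["UNMAP", "ATA_TRIM", "WS16"]


def delete_method_commands(disks, methods):
    """Pair each da(4) disk with one delete method and return sysctl commands.

    Assumption: the method is set through kern.cam.da.<unit>.delete_method.
    """
    if len(disks) != len(methods):
        raise ValueError("need exactly one delete method per disk")
    commands = []
    for disk, method in zip(disks, methods):
        unit = disk[2:]  # "da1" -> "1"
        if not disk.startswith("da") or not unit.isdigit():
            raise ValueError(f"expected a name like 'da0', got {disk!r}")
        commands.append(f"sysctl kern.cam.da.{unit}.delete_method={method}")
    return commands


if __name__ == "__main__":
    for command in delete_method_commands(DISKS, METHODS):
        print(command)
```

Keeping one method per disk means any corruption that shows up can be tied to a single delete method.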
>>>>>>>>
>>>>>>>> We then re-ran the tests which resulted in the pool running with
>>>>>>>> delete_method WS16 being so broken it had suspended IO. A reboot
>>>>>>>> resulted in it once again reporting no partition table via gpart.
>>>>>>>>
>>>>>>>> A third test run again produced a corrupt pool for WS16.
>>>>>>>>
>>>>>>>> I've conducted a preliminary review of the CAM WS16 code path along
>>>>>>>> with the SBC-3 spec, which didn't identify any obvious issues.
>>>>>>>>
>>>>>>>> Given we're both using LSI 2008 based controllers it could be a FW
>>>>>>>> issue specific to WS16 but that's just speculation atm, so I'll
>>>>>>>> continue to investigate.
>>>>>>>>
>>>>>>>> If you could re-test on your end without using WS16 to see if you can
>>>>>>>> reproduce the problem with either UNMAP or ATA_TRIM that would be a
>>>>>>>> very useful data point.
>>>>>>>>
>>>>>>>>
>>>>>>> After much playing I narrowed down a test case of one delete which was
>>>>>>> causing disk corruption for us (it deleted the partition table instead
>>>>>>> of data in the middle of the disk).
>>>>>>>
>>>>>>> The conclusion is that an LSI 2008 HBA with FW below P13 will eat the
>>>>>>> data on your SATA disks if you use WS16 due to the following bug:
>>>>>>> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
>>>>>>> support SCT write same may write wrong region.
>>>>>>>
>>>>>>> After updating here to P16, which we would generally be running (the
>>>>>>> test box was new and hadn't been updated yet), the corruption issue is
>>>>>>> no longer reproducible.
>>>>>>>
>>>>>>> So Ajit please check your FW version, I'm hoping to hear you're on
>>>>>>> something below P13, P12 possibly?
>>>>>>>
>>>>>>> If so then this is your issue, to fix simply update to P16 and the
>>>>>>> problem
>>>>>>> should be gone.
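The firmware check suggested above can be expressed as a small helper. This is a minimal sketch: it assumes firmware releases are written as "P" followed by a phase number, as in this thread, and it takes the "below P13" threshold directly from Steven's message rather than from any LSI documentation. The function names are illustrative only.

```python
import re

# Assumption: per the message above, firmware below P13 is affected by the
# WS16 bug (SCGCQ00230159), so P13 is treated as the first unaffected phase.
FIXED_PHASE = 13


def parse_phase(fw: str) -> int:
    """Return the numeric phase from a string such as 'P15'."""
    match = re.fullmatch(r"[Pp](\d+)", fw.strip())
    if match is None:
        raise ValueError(f"unrecognised firmware phase: {fw!r}")
    return int(match.group(1))


def affected_by_ws16_bug(fw: str) -> bool:
    """True if the firmware phase is below the first phase reported as fixed."""
    return parse_phase(fw) < FIXED_PHASE


if __name__ == "__main__":
    for fw in ("P12", "P15", "P16"):
        status = "affected" if affected_by_ws16_bug(fw) else "not affected"
        print(fw, status)
```

A result of "not affected" only rules out this one firmware bug; it says nothing about other possible causes of corruption.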
>>>>>>>
>>>>>>>
>>>>>>> Regards
>>>>>>> Steve
>>>>>>>
>>>>>>>
>>>>>>> ==============================********==================
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This e.mail is private and confidential between Multiplay (UK) Ltd.
>>>>>>> and
>>>>>>> the person or entity to whom it is addressed. In the event of
>>>>>>> misdirection,
>>>>>>> the recipient is prohibited from using, copying, printing or
>>>>>>> otherwise
>>>>>>> disseminating it or any information contained in it.
>>>>>>> In the event of misdirection, illegible or incomplete transmission
>>>>>>> please
>>>>>>> telephone +44 845 868 1337
>>>>>>> or return the E.mail to postmaster at multiplay.co.uk.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>
More information about the freebsd-fs mailing list