[PATCH] Stability fixes for IPS driver for 4.x
Scott Long
scottl at samsco.org
Tue Apr 12 21:24:29 PDT 2005
David Sze wrote:
> At 12:26 PM 12/04/2005 -0600, Scott Long wrote this to All:
>
>> David Sze wrote:
>>
>>> At 11:31 PM 10/04/2005 -0600, Scott Long wrote this to All:
>>>
>>>> Making a driver PAE-ified means either teaching it to do 64-bit
>>>> scatter-gather (assuming that the peripheral hardware can do this
>>>> and that it's documented), or teaching the driver to correctly handle
>>>> EINPROGRESS from bus_dmamap_load() along with using the proper busdma
>>>> tag limits. The strategy I took with 6.x/5.x was the second one since
>>>> I didn't have good IPS docs in front of me and I wanted it to follow the
>>>> APIs correctly. I did test it with 8GB of memory and it performed
>>>> correctly under load. I haven't taken a close enough look at your
>>>> MFC patch to say for sure if it's correct or not. I'm not sure if
>>>> I'll have time to take another look in the next few days,
>>>> unfortunately.
>>>> Is there any chance you could test 5.x/6.0 under load with PAE just to
>>>> validate the assertion that it works correctly there?
>>>
>>>
>>> I had a chance to test 5.4-RC1 (i386) today with GENERIC, SMP, PAE,
>>> and SMP-PAE kernels (the last one is just PAE with "options SMP").
>>> To recap, the hardware is an IBM xSeries 346, Dual Xeon 3GHz
>>> (non-EM64T), ServeRAID-7K.
>>> GENERIC and SMP survived "make buildkernel", but PAE and SMP-PAE
>>> panicked reproducibly doing the same. The DDB stack trace doesn't
>>> appear to be anywhere near the IPS driver though, so I'm way out of
>>> my league.
>>
>>
>> Darnit, hard to say if this is an existing bug in 5.4 or if it's a
>> bug/corruption in ips. Can you re-run with PAE disabled?
>
>
> Works fine with PAE disabled (or at least I couldn't get it to panic),
> both UP and SMP kernels.
>
>
>> Would you be
>> willing to put the Giant lock back on top of the driver? This would
>> mean modifying the call to bus_intr_config(), adding the D_GIANTNEEDED
>> flag to the disk structure in disk_create(), and switching the mutex
>> argument in bus_dma_tag_create() for the sg_dmatag tag.
>
>
> I put Giant back in as you described (patch attached), but it still
> panicked with PAE enabled, both UP and SMP kernels. The stack trace was
> very similar; the fault address (0x24) and the top three stack frames
> were the same as without Giant:
>
> propagate_priority
> turnstile_wait
> _mtx_lock_sleep
>
> At this point I no longer have access to the hardware, the customer
> wanted his servers back. They're going into the datacenter with
> RELENG_4 (w/IPS stability patch), without PAE (so the top ~900MB of his
> 4GB RAM is lost to PCI-X address space).
>
>
Crumbs, I see a potential problem. I won't have time until this weekend
to sort it out, though. Sorry this has become such a drawn-out affair;
I hope that your customer isn't too upset.
Scott
More information about the freebsd-stable mailing list