[PATCH] Stability fixes for IPS driver for 4.x

Scott Long scottl at samsco.org
Tue Apr 12 21:24:29 PDT 2005

David Sze wrote:
> At 12:26 PM 12/04/2005 -0600, Scott Long wrote this to All:
>> David Sze wrote:
>>> At 11:31 PM 10/04/2005 -0600, Scott Long wrote this to All:
>>>> Making a driver PAE-ified means either teaching it to do 64-bit
>>>> scatter-gather (assuming that the peripheral hardware can do this
>>>> and that it's documented), or teaching the driver to correctly handle
>>>> EINPROGRESS from bus_dmamap_load() along with using the proper busdma
>>>> tag limits.  The strategy I took with 6.x/5.x was the second one since
>>>> I didn't have good IPS docs in front of me and I wanted it to follow the
>>>> APIs correctly.  I did test it with 8GB of memory and it performed
>>>> correctly under load.  I haven't taken a close enough look at your
>>>> MFC patch to say for sure if it's correct or not.  I'm not sure if
>>>> I'll have time to take another look in the next few days, 
>>>> unfortunately.
>>>> Is there any chance you could test 5.x/6.0 under load with PAE just to
>>>> validate the assertion that it works correctly there?
>>> I had a chance to test 5.4-RC1 (i386) today with GENERIC, SMP, PAE, 
>>> and SMP-PAE kernels (the last one is just PAE with "options SMP").
>>> To recap, the hardware is an IBM xSeries 346, Dual Xeon 3GHz 
>>> (non-EM64T), ServeRAID-7K.
>>> GENERIC and SMP survived "make buildkernel", but PAE and SMP-PAE 
>>> panicked reproducibly doing the same.  The DDB stack trace doesn't 
>>> appear to be anywhere near the IPS driver though, so I'm way out of 
>>> my league.
>> Darnit, hard to say if this is an existing bug in 5.4 or if it's a 
>> bug/corruption in ips.  Can you re-run with PAE disabled?
> Works fine with PAE disabled (or at least I couldn't get it to panic), 
> both UP and SMP kernels.
>> Would you be
>> willing to put the Giant lock back on top of the driver?  This would
>> mean modifying the call to bus_intr_config(), adding the D_GIANTNEEDED
>> flag to the disk structure in disk_create(), and switching the mutex
>> argument in bus_dma_tag_create() for the sg_dmatag tag.
> I put Giant back in as you described (patch attached), but it still 
> panicked with PAE enabled, both UP and SMP kernels.  The stack trace was 
> very similar; the fault address (0x24) and the top three stack frames 
> were the same as without Giant:
>         propagate_priority
>         turnstile_wait
>         _mtx_lock_sleep
> At this point I no longer have access to the hardware, the customer 
> wanted his servers back.  They're going into the datacenter with 
> RELENG_4 (w/IPS stability patch), without PAE (so the top ~900MB of his 
> 4GB RAM is lost to PCI-X address space).

Crumbs, I see a potential problem.  I won't have time until this weekend
to sort it out, though.  Sorry this has become such a drawn-out affair;
I hope that your customer isn't too upset.


