[PATCH] Stability fixes for IPS driver for 4.x

David Sze dsze at alumni.uwaterloo.ca
Tue Apr 12 20:57:04 PDT 2005


At 12:26 PM 12/04/2005 -0600, Scott Long wrote this to All:
>David Sze wrote:
>>At 11:31 PM 10/04/2005 -0600, Scott Long wrote this to All:
>>
>>>Making a driver PAE-ified means either teaching it to do 64-bit
>>>scatter-gather (assuming that the peripheral hardware can do this
>>>and that it's documented), or teaching the driver to correctly handle
>>>EINPROGRESS from bus_dmamap_load() along with using the proper busdma
>>>tag limits.  The strategy I took with 6.x/5.x was the second one since
>>>I didn't have good IPS docs in front of me and I wanted it to follow the
>>>APIs correctly.  I did test it with 8GB of memory and it performed
>>>correctly under load.  I haven't taken a close enough look at your
>>>MFC patch to say for sure if it's correct or not.  I'm not sure if
>>>I'll have time to take another look in the next few days, unfortunately.
>>>Is there any chance you could test 5.x/6.0 under load with PAE just to
>>>validate the assertion that it works correctly there?
>>
>>I had a chance to test 5.4-RC1 (i386) today with GENERIC, SMP, PAE, and 
>>SMP-PAE kernels (the last one is just PAE with "options SMP").
>>To recap, the hardware is an IBM xSeries 346, Dual Xeon 3GHz (non-EM64T), 
>>ServeRAID-7K.
>>GENERIC and SMP survived "make buildkernel", but PAE and SMP-PAE panicked 
>>reproducibly doing the same.  The DDB stack trace doesn't appear to be 
>>anywhere near the IPS driver though, so I'm way out of my league.
>
>Darnit, hard to say if this is an existing bug in 5.4 or if it's a 
>bug/corruption in ips.  Can you re-run with PAE disabled?

Works fine with PAE disabled (or at least I couldn't get it to panic), both 
UP and SMP kernels.


>Would you be
>willing to put the Giant lock back on top of the driver?  This would
>mean modifying the call to bus_intr_config(), adding the D_GIANTNEEDED
>flag to the disk structure in disk_create(), and switching the mutex
>argument in bus_dma_tag_create() for the sg_dmatag tag.

I put Giant back in as you described (patch attached), but it still 
panicked with PAE enabled, both UP and SMP kernels.  The stack trace was 
very similar; the fault address (0x24) and the top three stack frames were 
the same as without Giant:

         propagate_priority
         turnstile_wait
         _mtx_lock_sleep

At this point I no longer have access to the hardware; the customer wanted 
his servers back.  They're going into the datacenter with RELENG_4 (w/IPS 
stability patch), without PAE (so the top ~900MB of his 4GB RAM is lost to 
PCI-X address space).
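
For anyone following the EINPROGRESS discussion quoted above, here is a
minimal user-space sketch of the deferred-callback pattern.  It is not the
ips driver code and does not use the real busdma API: every mock_* name is
a hypothetical stand-in, and the "bounce page" counter is only an assumed
model of why a load might be deferred.  The idea it illustrates is that a
load returning EINPROGRESS is not a failure; the callback runs later, so
the driver holds off on further I/O until it does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once the mapping is ready. */
typedef void (*mock_load_cb_t)(void *arg, int error);

/* Hypothetical stand-in for a DMA map that needs a scarce resource. */
struct mock_dmamap {
	int		bounce_pages_free;	/* assumed resource model */
	mock_load_cb_t	deferred_cb;		/* pending callback, if any */
	void		*deferred_arg;
};

/* Hypothetical per-device state. */
struct mock_softc {
	int	queue_frozen;		/* nonzero while a load is pending */
	int	commands_issued;
};

/*
 * Mock load: runs the callback immediately when resources are available,
 * otherwise remembers it and returns EINPROGRESS.
 */
static int
mock_dmamap_load(struct mock_dmamap *map, mock_load_cb_t cb, void *arg)
{
	if (map->bounce_pages_free == 0) {
		assert(map->deferred_cb == NULL); /* mock holds one deferral */
		map->deferred_cb = cb;
		map->deferred_arg = arg;
		return (EINPROGRESS);
	}
	map->bounce_pages_free--;
	cb(arg, 0);
	return (0);
}

/* Mock of resources being returned; runs a deferred callback if pending. */
static void
mock_bounce_pages_released(struct mock_dmamap *map, int n)
{
	map->bounce_pages_free += n;
	if (map->deferred_cb != NULL && map->bounce_pages_free > 0) {
		mock_load_cb_t cb = map->deferred_cb;
		void *arg = map->deferred_arg;

		map->deferred_cb = NULL;
		map->bounce_pages_free--;
		cb(arg, 0);
	}
}

/* Callback: the only place where the command is handed to the hardware. */
static void
mock_cmd_callback(void *arg, int error)
{
	struct mock_softc *sc = arg;

	if (error != 0)
		return;
	sc->commands_issued++;
	sc->queue_frozen = 0;	/* safe to accept new I/O again */
}

/* Start one I/O; EINPROGRESS means "deferred", not "failed". */
static int
mock_start_io(struct mock_softc *sc, struct mock_dmamap *map)
{
	int error;

	error = mock_dmamap_load(map, mock_cmd_callback, sc);
	if (error == EINPROGRESS) {
		sc->queue_frozen = 1;	/* hold further I/O until callback */
		return (0);
	}
	return (error);
}
```

In this sketch the command is issued only from the callback, so the
immediate and deferred cases share one code path.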


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ips.RELENG_5_4.giant.patch
Type: application/octet-stream
Size: 4351 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20050412/9c83bda7/ips.RELENG_5_4.giant.obj