Extreme contigmalloc() slowness with mpt driver

Sat Jan 14 10:52:53 PST 2006

On Sat, Jan 14, 2006 at 08:28:11AM -0700, Scott Long wrote:
> Steve Kargl wrote:
> >On Sat, Jan 14, 2006 at 12:21:17AM -0500, Kris Kennaway wrote:
> >
> >>I have an amd64 machine with 16GB of RAM that takes ages to boot (~40
> >>minutes on 7.0).  This is because the mpt driver takes 20 minutes to
> >>attach (with 2 instances).  This in turn is because the following code
> >>from dev/mpt/mpt_pci.c:mpt_dma_mem_alloc() takes about 5 seconds to
> >>execute, and it is run 256 times in a loop:
> >>
> >>               error = bus_dmamap_create(mpt->buffer_dmat, 0, &req->dmap);
> >>
> >>When I set vm.old_contigmalloc=1, the system boots without delay.
> >>
> >>This points to a bug in contigmalloc.
> >>
> >
> >This is probably related to my recent reports of extremely
> >slow probing of fxp0.  I have 12 GB on a Tyan K8S Pro and
> >fxp0 takes on the order of 7 minutes to probe.
> >
> 
> Yep, that's the same reason.  The issue here is that bus_dmamap_create
> is using contigmalloc to allocate bounce pages for the device.  At the
> request of Soeren, I recently upped the max limit on bounce pages from
> 512 to 4096.  Before that, drivers would quickly reach the max and then
> move on.  Now that the max is a lot higher, I guess it points to a
> scalability problem in the page search algorithm of contigmalloc.
> 

Thanks for the confirmation.  There may be more serious problems than
just long boot times.  I'm seeing recurring lock-ups on my system.  There
is no panic and no keyboard/network response from the system.  The system
sits in my office and acts as a very expensive heater.

In my attempts to diagnose the problems, I've cleaned out all installed
ports, all old shared libraries, all old bin/, sbin/, usr/bin, and
usr/sbin binaries.  Then, I rebuilt kernel and world and booted a fairly
clean system.  I have INVARIANTS/WITNESS/DDB in my kernel.  I rebuilt
multiple ports from multiple vty terms without a problem.  Fired up X11,
opened several xterms and built more ports.  All seems fine.

Then, I rebuilt my Monte Carlo simulation code.  This program will
fork 2 children.  Each child will allocate up to 1 GB of memory,
mostly in a few 250 MB arrays.  Each child runs for 7 minutes,
writes a few files, then exits.  The parent waits on the children,
and then forks 2 more children.  The parent should run for 24 to 96
hours for a complete simulation.  At some point the system will lock
up.  This happens while X11 is running.  I haven't had a lock-up at a
vty term.
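
For anyone who wants to approximate that load pattern, here is a minimal
sketch in C.  It is not the actual simulation code: the function names,
the memset() fill that stands in for the real computation, and the choice
of how many arrays each child allocates are all assumptions made for
illustration.  It only models the fork / allocate / hold / exit / wait
cycle described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical child body: allocate `narrays` buffers of `array_bytes`
 * each, touch every byte so the pages are really backed by memory,
 * hold them for `hold_seconds`, then release them.
 * Returns 0 on success, 1 if an allocation failed.
 */
static int
child_work(size_t narrays, size_t array_bytes, unsigned hold_seconds)
{
	char **arrays = calloc(narrays, sizeof(*arrays));
	size_t i;

	if (arrays == NULL)
		return (1);
	for (i = 0; i < narrays; i++) {
		arrays[i] = malloc(array_bytes);
		if (arrays[i] == NULL)
			return (1);	/* child exits right after; no cleanup needed */
		memset(arrays[i], 0xA5, array_bytes);
	}
	sleep(hold_seconds);		/* stand-in for the real computation */
	for (i = 0; i < narrays; i++)
		free(arrays[i]);
	free(arrays);
	return (0);
}

/*
 * One round: fork `nchildren` workers, then wait for every worker that
 * was actually started.  Returns how many of them exited with status 0.
 */
int
run_round(int nchildren, size_t narrays, size_t array_bytes,
    unsigned hold_seconds)
{
	int started = 0, ok = 0, i, status;

	assert(nchildren >= 0);
	for (i = 0; i < nchildren; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			break;		/* still reap the children already started */
		}
		if (pid == 0)
			_exit(child_work(narrays, array_bytes, hold_seconds));
		started++;
	}
	for (i = 0; i < started; i++) {
		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			ok++;
	}
	return (ok);
}
```

With the figures above, one round would be roughly
run_round(2, 4, (size_t)250 << 20, 7 * 60), called in a loop from main()
for the 24 to 96 hours of a full run.  The count of 4 arrays per child is
a guess at "a few"; adjust it to stay under the 1 GB per child.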

Note, I've used memtest86+ to check the memory and I've used Adaptec's
low-level verification to check my hard drives.

Anyhow, I'll keep hunting for the root of the problem.

-- 
Steve