Heap overflow in mps(4) (was: Re: stable/9 mps(4) rev 254938 == BOOM!)

Wed Jan 29 22:16:45 UTC 2014

On Wed, Jan 29, 2014 at 16:37:13 -0500, wollman at csail.mit.edu wrote:
> In article <52E94FC2.1010901 at bitfrost.no>, hps at bitfrost.no writes:
> >To me this sounds like someone is writing outside their assigned area.
> >
> >options 	DEBUG_REDZONE
> 
> hselasky@ nails it!  The mps(4) changes in stable/9 r254938 reliably
> cause a GPF during boot in non-debugging kernels, but adding
> DEBUG_REDZONE is sufficient to prevent the fault.  Whichever heap
> allocation is being overrun does *not* ever get freed: there are no
> redzone messages on the console.  (It also boots much faster with the
> new probing code, which is certainly a plus for debugging.)
> 
> I can confirm that the tip of stable/9 (r261256) also works with
> DEBUG_REDZONE and fails without it.  Only trouble is that I need to do
> performance testing, which DEBUG_REDZONE is not exactly going to help
> with.

Hmm.  What does vmstat -m show for the mps malloc type?

Are you booting off of the controller?  If not, could you try building mps
as a module and unloading it?  Perhaps the memory would get freed when the
module is unloaded and the redzone code would show where the problem is.
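
To illustrate why freeing the memory matters here: redzone-style checkers
pad each allocation with guard bytes and only inspect them when the
allocation is freed (or reallocated).  An overrun that lands in the padding
is absorbed silently until then.  Below is a minimal userland sketch of that
idea.  It is not the kernel's DEBUG_REDZONE code; the names rz_malloc,
rz_free and rz_demo, the guard size and the fill pattern are all made up for
illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RZ_SIZE 16    /* guard bytes on each side (arbitrary for this sketch) */
#define RZ_FILL 0x42  /* guard fill pattern (arbitrary for this sketch) */

/* Layout: [size_t len][front guard][user data][back guard] */
static void *rz_malloc(size_t len)
{
    unsigned char *raw = malloc(sizeof(size_t) + 2 * RZ_SIZE + len);

    if (raw == NULL)
        return NULL;
    memcpy(raw, &len, sizeof(len));                  /* remember user size */
    memset(raw + sizeof(size_t), RZ_FILL, RZ_SIZE);  /* front guard */
    memset(raw + sizeof(size_t) + RZ_SIZE + len, RZ_FILL, RZ_SIZE); /* back guard */
    return raw + sizeof(size_t) + RZ_SIZE;
}

/*
 * Verify both guards, report damage, and release the memory.
 * Returns the number of corrupted guard bytes.  Note that this is the
 * only place the guards are examined: no free, no report.
 */
static int rz_free(void *ptr)
{
    unsigned char *user = ptr;
    unsigned char *front = user - RZ_SIZE;
    unsigned char *raw = front - sizeof(size_t);
    size_t len;
    int bad = 0;

    memcpy(&len, raw, sizeof(len));
    for (size_t i = 0; i < RZ_SIZE; i++) {
        if (front[i] != RZ_FILL)
            bad++;                      /* underflow: write before the buffer */
        if (user[len + i] != RZ_FILL)
            bad++;                      /* overflow: write past the buffer */
    }
    if (bad != 0)
        fprintf(stderr, "redzone: %d guard byte(s) corrupted\n", bad);
    free(raw);
    return bad;
}

/* Overrun an 8-byte buffer by 4 bytes; the damage is only seen at free time. */
static int rz_demo(void)
{
    char *buf = rz_malloc(8);

    if (buf == NULL)
        return -1;
    memset(buf, 0, 12);   /* 4 bytes too many, all landing in the back guard */
    return rz_free(buf);
}
```

In this sketch the stray write in rz_demo() damages only padding, so nothing
crashes, and nothing is reported until rz_free() runs.  That matches the
behavior described above: the padding hides the overrun, and a buffer that
is never freed never produces a message.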

How many drives do you have in the system, and how many of them are SAS vs.
SATA?

I haven't seen this problem, but it may be that we've gotten lucky or don't
have the particular set of factors that you have.

We have tested with more than 200 drives connected, but they were all SAS.

I'll take a look and see if I can see anything that looks suspicious.

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG