Repeated similar panics on -STABLE

Sat Apr 19 14:56:22 PDT 2003

On 19 Apr, Terry Lambert wrote:
> Don Lewis wrote:
>> Ok, here's a Reader's Digest version of the code in question:
>> 
>>         if (kbp->kb_next == NULL) {
>>                 kbp->kb_last = NULL;
>>                 if (size > MAXALLOCSAVE)
>>                         allocsize = roundup(size, PAGE_SIZE);
>>                 else
>>                         allocsize = 1 << indx;
>>                 npg = btoc(allocsize);
>>                 va = (caddr_t) kmem_malloc(kmem_map, (vm_size_t)ctob(npg), flags
>> );
>>                 /*
>>                  * Just in case we blocked while allocating memory,
>>                  * and someone else also allocated memory for this
>>                  * bucket, don't assume the list is still empty.
>>                  */
>>                 savedlist = kbp->kb_next;
>>                 kbp->kb_next = cp = va + (npg * PAGE_SIZE) - allocsize;
>>                 for (;;) {
>>                         freep = (struct freelist *)cp;
>>                         if (cp <= va)
>>                                 break;
>>                         cp -= allocsize;
>>                         freep->next = cp;
>>                 }
>>                 freep->next = savedlist;
> 
> Take an interrupt somewhere around here, and have the available
> entries removed from the freelist by an interrupt level driver.
> 
> Or take a page fault, and have the same thing happen with
> page-related metadata coming from the freelist in question.

How can an interrupt or another process touch the freelist while we're
protected by splmem()?  If that were possible, the block could be stolen
out from under us in the code below between the assignment to va and the
update of kbb->kb_next, allocating the same block of memory to two
different consumers.

>>                 if (kbp->kb_last == NULL)
>>                         kbp->kb_last = (caddr_t)freep;
>>         }
>>         va = kbp->kb_next;
>>         kbp->kb_next = ((struct freelist *)va)->next;
> 
> [ ... ]
> 
>> This code bothers me, though, because if allocsize ever happened to not
>> evenly divide (npg * PAGE_SIZE), the loop termination condition would be
>> wrong and the end of the previous page (or more) would end up on the
>> free list.  If va were small enough and allocsize were big enough, it
>> would even be possible to wrap around.
> 
> That actually can't happen.  Basically, the allocators "waste"
> the ends of pages.  See the:
> 
>                         if (cp <= va)
>                                 break;
>                         cp -= allocsize;
> 
> ?  The "<= saves you.

It only works because allocsize evenly divides npg*PAGE_SIZE.  If there
was a heavy consumer of 129 byte blocks, someone might get the bright
idea to allocate a special bucket for them because a lot more 129 byte
blocks fit in a page than 256 byte blocks.  As the "for" loop iterated,
we'd get to the point where va < cp < va + allocsize.  The "<=" test
would pass, we'd decrement cp, causing it to be less than va, do the
	freep->next = cp;
assignment, return to the top of the "for" loop, do the
	freep = (struct freelist *)cp;
assignment, so that freep now points outside the block of memory
allocated by kmem_malloc().  Now the "<=" test will kick us out of the
loop and we'll do the
	freep->next = savedlist;
assignment and stomping on someone else's memory.  A safer, but slightly
more expensive test would be
	cp < va + allocsize