ZFS committed to the FreeBSD base.

Wed May 2 14:53:51 UTC 2007

On Tue, 1 May 2007, Rick Macklem wrote:

> On Tue, 1 May 2007, Kris Kennaway wrote:
>
>>> I don't know if it's relevant, but I've seen "kmem_map: too small" panics 
>>> when testing my NFSv4 server, ever since about FreeBSD5.4. There is no 
>>> problem running the same server code on FreeBSD4 (which is what I still 
>>> run in production mode) or OpenBSD3 or 4. If I increase the size of the 
>>> map, I can delay the panic for up to about two weeks of hard testing, but 
>>> it never goes away. I don't see any evidence of a memory leak during the 
>>> several days of testing leading up to the panic. (NFSv4 uses MALLOC/FREE 
>>> extensively for state related structures.)
>> 
>> Sounds exactly like a memory leak to me.  How did you rule it out?
> Well, I had a little program running on the server that grabbed the 
> mti_stats[] out of the kernel and logged them. I had one client mounted 
> running thousands of passes of the Connectathon basic tests (one client, 
> same activity over and over and over again). For a week, the stats don't 
> show any increase in allocation for any type (alloc - free doesn't get 
> unreasonably big), then..."panic: kmem_map too small". How many days it took 
> to happen would vary with the size of the kernel map, but no evidence of a 
> leak prior to the crash. It seemed to be based on the number of times MALLOC 
> and FREE were called.
>
> Also, the same server code (except for the port changes, which have nothing 
> to do with the state handling where MALLOC/FREE get called a lot), works 
> fine for months on FreeBSD4 and OpenBSD3.9.
>
> So, I won't say a "memory leak is ruled out", but if there was a leak why 
> wouldn't it bite FreeBSD4 or show up in mti_stats[]?
>
> I first saw it on FreeBSD6.0, but went back to FreeBSD5.4 and tried the same 
> test and got the same result.

Historically, such panics have been a result of one of two things:

(1) An immediate resource leak in UMA(9) or malloc(9) allocated memory.

(2) Mis-tuning of a resource limit, perhaps due to sizing the limit based
     solely on physical memory size, without taking available kernel
     address space into account.

mti_stats reports only on malloc(9); you also need to look at uma(9), since 
many frequently allocated types are allocated directly from the slab 
allocator rather than from kernel malloc.  Take a look at the output of "show 
uma" and "show malloc" in DDB, or, respectively, "vmstat -z" and "vmstat -m" 
on a core or on a live system.  malloc(9) is actually implemented using two 
different back-ends: UMA-managed fixed-size memory buckets for small 
allocations, and direct page allocation for large allocations.
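
As a rough illustration of what to look for in the "vmstat -z" output, here
is a minimal Python sketch that sums the memory held by each UMA zone and
reports which zones grew between two snapshots.  It assumes rows of the form
"NAME: SIZE, LIMIT, USED, FREE, ..."; the column layout may differ between
releases, and the sample figures below are hypothetical, not taken from a
real system.

```python
# Sketch only: assumes "NAME: SIZE, LIMIT, USED, FREE, ..." rows as printed
# by "vmstat -z"; the column order may differ between FreeBSD releases.

def parse_vmstat_z(text):
    """Return {zone_name: bytes held}, counting used and cached-free items."""
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header or blank line
        fields = [f.strip() for f in rest.split(",")]
        try:
            size, _limit, used, free = (int(f) for f in fields[:4])
        except ValueError:
            continue  # not a data row
        zones[name.strip()] = size * (used + free)
    return zones


def top_zones(zones, n=3):
    """Return the n zones holding the most memory, largest first."""
    return sorted(zones.items(), key=lambda kv: kv[1], reverse=True)[:n]


def growth(before, after):
    """Return {zone_name: bytes gained} for zones that grew between snapshots."""
    return {
        z: after[z] - before.get(z, 0)
        for z in after
        if after[z] > before.get(z, 0)
    }


# Hypothetical snapshots, for illustration only.
SAMPLE_BEFORE = """\
ITEM                     SIZE     LIMIT      USED      FREE  REQUESTS  FAILURES

UMA Kegs:                 140,        0,       72,        0,       72,        0
VNODE:                    272,        0,   100000,       50,  5000000,        0
64:                        64,        0,     2000,      100,   900000,        0
"""

SAMPLE_AFTER = """\
ITEM                     SIZE     LIMIT      USED      FREE  REQUESTS  FAILURES

UMA Kegs:                 140,        0,       72,        0,       72,        0
VNODE:                    272,        0,   120000,       50,  9000000,        0
64:                        64,        0,     2000,      100,  1800000,        0
"""

if __name__ == "__main__":
    before = parse_vmstat_z(SAMPLE_BEFORE)
    after = parse_vmstat_z(SAMPLE_AFTER)
    print(top_zones(after))
    print(growth(before, after))
```

A zone whose footprint keeps climbing across snapshots under a steady,
repetitive workload is the first place to look; the same comparison can be
done by hand on the "vmstat -m" output for malloc(9) types.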

The most frequent example of (2) is mis-tuning of the system's maximum vnode 
limit, resulting in the vnode cache exceeding available address space. 
Try tuning down that limit.  Note that vnodes, inodes, and most of the 
frequently used file system data types are allocated using uma(9) and not 
malloc(9).
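
To make the arithmetic behind (2) concrete, here is a small Python sketch
comparing the worst-case footprint of the vnode cache with the kernel memory
available to hold it.  Every number in it is a hypothetical placeholder: the
per-vnode cost (vnode plus inode and related structures), the kernel map
size, and the budget fraction all need to be replaced with values measured on
the system in question.  On FreeBSD the limit itself is the kern.maxvnodes
sysctl.

```python
# Sketch only: all figures are hypothetical placeholders, not measurements.

MIB = 1024 * 1024


def vnode_cache_bytes(max_vnodes, bytes_per_vnode):
    """Worst-case memory held if the vnode cache fills to its limit."""
    return max_vnodes * bytes_per_vnode


def max_safe_vnodes(kmem_bytes, bytes_per_vnode, budget_fraction):
    """Largest vnode limit that fits in the given share of kernel memory."""
    return int(kmem_bytes * budget_fraction) // bytes_per_vnode


KMEM_BYTES = 320 * MIB      # assumed kernel map size
BYTES_PER_VNODE = 512       # assumed cost of a vnode plus its inode
MAX_VNODES = 1_000_000      # assumed limit derived from physical memory only

if __name__ == "__main__":
    need = vnode_cache_bytes(MAX_VNODES, BYTES_PER_VNODE)
    print("worst-case vnode cache:", need // MIB, "MiB")
    print("kernel map size:       ", KMEM_BYTES // MIB, "MiB")
    print("limit for a 25% budget:",
          max_safe_vnodes(KMEM_BYTES, BYTES_PER_VNODE, 0.25))
```

With these placeholder values the cache alone could outgrow the whole kernel
map, which is exactly the situation in which the limit has to be sized
against address space rather than physical memory.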

Robert N M Watson
Computer Laboratory
University of Cambridge