Panic: attempted pmap_enter on 2MB page

Kurt Alstrup kalstrup at frontier.com
Fri Oct 8 17:26:29 UTC 2010


Apologies for the late response; I wanted to check the code again.


On 10/07/2010 10:03 AM, Alan Cox wrote:
> At a high-level, I agree with much of what you say.  In particular, if
> pmap_enter() is applied to a virtual address that is already mapped by a large
> page, the reported panic could result.  However, barring bugs, for example, in
> memory allocation by the upper levels of the kernel, the panic inducing
> situation shouldn't occur.
Calls to malloc() for items larger than a page take a turn through UMA and
eventually end up in kmem_malloc() via its page_alloc() routine.
kmem_malloc() in turn gets the pages from vm_page_alloc(), parks them in
the kmem_object, and maps them into the kernel_pmap in a loop, calling
pmap_enter() for each page; the assigned VAs are pulled from kmem_map.
Pages acquired through vm_page_alloc() may be backed by a superpage
reservation and are thus eligible for auto-promotion.
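
A condensed paraphrase of that loop, as I read it (not the actual source;
kmem_alloc_va() is a made-up placeholder for the kmem_map VA allocation,
argument lists are from memory, and error handling is elided):

	vm_offset_t
	kmem_malloc_sketch(vm_size_t size)
	{
		vm_offset_t addr, offset, i;
		vm_page_t m;

		addr = kmem_alloc_va(kmem_map, size);	/* hypothetical */
		offset = addr - VM_MIN_KERNEL_ADDRESS;
		for (i = 0; i < size; i += PAGE_SIZE) {
			/* May be backed by a superpage reservation. */
			m = vm_page_alloc(kmem_object, OFF_TO_IDX(offset + i),
			    VM_ALLOC_SYSTEM | VM_ALLOC_WIRED);
			/*
			 * One pmap_enter() per 4KB page; the pmap's
			 * promotion checks run on every call.
			 */
			pmap_enter(kernel_pmap, addr + i, VM_PROT_ALL, m,
			    VM_PROT_ALL, TRUE);
		}
		return (addr);
	}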

Calls to free() initially take a similar route, ending up in kmem_free()
via UMA's page_free() routine. From there the call path is
vm_map_remove() -> vm_map_delete() -> pmap_remove().
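
kmem_free() itself is tiny; paraphrasing from memory (this should be close
to the real thing):

	void
	kmem_free(vm_map_t map, vm_offset_t addr, vm_size_t size)
	{
		/*
		 * vm_map_remove() takes the map lock and calls
		 * vm_map_delete(), which ends in pmap_remove() for
		 * the VA range.
		 */
		(void) vm_map_remove(map, trunc_page(addr),
		    round_page(addr + size));
	}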

This logic indicates that, from the kernel/VM perspective, the
malloc()/free() pair will map/unmap pages as needed. However, as far as I
can tell, the pmap layer never unmaps these pages. The call path is
pmap_remove() -> pmap_remove_pte() -> pmap_unuse_pt(), which ignores the
removal because the VA is >= VM_MAXUSER_ADDRESS.
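
The check I mean, paraphrased from my reading of the amd64 pmap (exact
details may differ):

	static int
	pmap_unuse_pt(pmap_t pmap, vm_offset_t va, pd_entry_t ptepde,
	    vm_page_t *free)
	{
		vm_page_t mpte;

		/*
		 * Kernel VAs bail out here; the kernel's page table
		 * pages are never reclaimed through this path.
		 */
		if (va >= VM_MAXUSER_ADDRESS)
			return (0);
		/* User mappings drop a reference on the page table page. */
		mpte = PHYS_TO_VM_PAGE(ptepde & PG_FRAME);
		return (pmap_unwire_pte_hold(pmap, va, mpte, free));
	}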

Without superpages this works fine because pmap_enter() allows replacement
of an already mapped page, but it panics when asked to replace a mapping
that falls within a superpage.
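
That is the check behind the panic in the subject line; roughly, from the
amd64 pmap_enter() (quoting from memory):

	pde = pmap_pde(pmap, va);
	if (pde != NULL && (*pde & PG_V) != 0) {
		/* A promoted 2MB mapping already covers this VA. */
		if ((*pde & PG_PS) != 0)
			panic("pmap_enter: attempted pmap_enter on 2MB page");
		pte = pmap_pde_to_pte(pde, va);
	}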

I showed one way for this to happen in my first post. The problem is that
the safety checks for promotion can be tricked into promoting before all
pages of an allocation have been pmap_enter()'d.
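
Schematically, the kind of interleaving I mean (addresses and ordering
hypothetical; my first post has the concrete case):

1. Allocation A pmap_enter()s 4KB pages into part of a 2MB-aligned VA run
   and is later freed; per the above, its kernel PTEs survive the free().
2. Allocation B reuses VAs in the same run and begins entering its pages
   one at a time.
3. The stale PTEs from A plus the fresh PTEs from B complete a promotable
   512-entry run; the promotion checks pass and a 2MB PDE is installed.
4. B's next pmap_enter() in that run finds PG_PS set and panics.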

> At a lower-level, it appears that you are misinterpreting what pmap_unuse_pt()
> does.  It is not pmap_unuse_pt()'s responsibility to clear page table entries,
> either for user-space page tables or the kernel page table.  When a region of
> the kernel virtual address space is deallocated, we do clear the kernel page
> table entries.  See, for example, pmap_remove_pte() (or pmap_qremove()).
I probably messed up the terminology in my post, mixing up kernel_map and
kmem_map. Also, I'm ignoring for the moment where the page table pages come
from, and just observing that for VAs above user space the PTE entries are
not cleared by calling pmap_remove(). pmap_qremove() does clear them, but
it's not used by free().

> This special handling of the kernel page table does create another special
> case when we destroy a large page mapping within the kernel address space.  We
> have to reinsert into the kernel page table the old page table page (for small
> page mappings) that was sitting idle while the kernel page table held a large
> page mapping in its place.  At the same time, all of the page table entries in
> this page must be invalidated.  This is handled by pmap_remove_pde().
I never understood why invalidation is required following
promotions/demotions. The mapping does not change and the attributes are
the same, so unless the hardware can't cope with co-existing 4KB and 2MB
TLB entries for the same range, invalidation seems unnecessary. (The
accessed/dirty bits may need to be handled, but that might still be cheaper
than sending IPIs to invalidate.)

> I'm curious to know more about what you were doing when you encountered
> this panic.  Were you also using ISO images containing large MFS roots?
I saw it while testing sysctls I wrote to dump certain large kernel
structures; the MO was to allocate a really big formatting buffer using
malloc(). Truth be told, I did not see this with 2MB superpages; mine were
much smaller and therefore more prone to this.

Sincerely,
Kurt A
