svn commit: r208589 - head/sys/mips/mips

Jayachandran C. c.jayachandran at gmail.com
Tue Jun 8 21:31:14 UTC 2010


On Tue, Jun 8, 2010 at 12:03 PM, Alan Cox <alc at cs.rice.edu> wrote:
> C. Jayachandran wrote:
>>
>> On Tue, Jun 8, 2010 at 2:59 AM, Alan Cox <alc at cs.rice.edu> wrote:
>>
>>>
>>> On 6/7/2010 3:28 PM, Kostik Belousov wrote:
>>>
>>>>
>>>> Selecting a random message in the thread to ask my question.
>>>> Is the issue that page table pages should be allocated from a specific
>>>> physical region of memory?  If yes, doesn't i386 PAE have a similar
>>>> issue with the page directory pointer table?  I see a KASSERT in the
>>>> i386 pmap that verifies that the allocated table is below 4G, but I do
>>>> not understand how uma ensures the constraint (I suspect that it does
>>>> not).
>>>>
>>>>
>>>
>>> For i386 PAE, the UMA backend allocator uses kmem_alloc_contig() to
>>> ensure that the memory is below 4G.  The crucial difference between
>>> i386 PAE and MIPS is that for i386 PAE only the top-level table needs
>>> to be below a specific address threshold.  Moreover, this level is
>>> allocated in a place, pmap_pinit(), where we are allowed to sleep.
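
(For the archives: my reading of that mechanism in sys/i386/i386/pmap.c
is roughly the sketch below.  This is paraphrased from memory, so treat
the exact signatures and constants as approximations rather than the
actual code.)

/*
 * Sketch of the i386 PAE approach: a custom UMA backend allocator
 * that satisfies every PDPT slab request with kernel memory whose
 * physical address is below 4G.
 */
static void *
pmap_pdpt_allocf(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
{
        *flags = UMA_SLAB_KERNEL;
        /* kmem_alloc_contig() may sleep, which pmap_pinit() permits. */
        return ((void *)kmem_alloc_contig(kernel_map, bytes, wait,
            0, 0xffffffffULL, 1, 0, VM_MEMATTR_DEFAULT));
}

pmap_init() then attaches the allocator with
uma_zone_set_allocf(pdptzone, pmap_pdpt_allocf), so every PDPT handed
out by the zone already satisfies the below-4G constraint.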
>>>
>>
>> Yes. I saw the PAE top level page table code and thought I could use
>> that mechanism for allocating MIPS page table pages in the direct
>> mapped memory.  The other reference I used was the
>> pmap_alloc_zeroed_contig_pages() function in sun4v/sun4v/pmap.c, which
>> uses vm_phys_alloc_contig() and VM_WAIT.
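
(The sun4v pattern referred to above, paraphrased from memory, is
essentially a retry loop; names and details are approximate:)

        vm_page_t m;

        while ((m = vm_phys_alloc_contig(1, low, high, PAGE_SIZE, 0)) == NULL)
                VM_WAIT;        /* sleep until the page daemon frees memory */
        pmap_zero_page(m);      /* the new page is not pre-zeroed */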
>
> That's unfortunate.  :-(  Since sun4v is essentially dead code, I've never
> spent much time thinking about its pmap implementation.  I'll mechanically
> apply changes to it, but that's about it.  I wouldn't recommend using it as
> a reference.
>
>> ...  I had also thought of
>> using VM_FREEPOOL_DIRECT, which seemed to be for a similar purpose,
>> but could not find any usage in the kernel.
>>
>>
>
> VM_FREEPOOL_DIRECT is used by at least amd64 and ia64 for page table pages
> and small kernel memory allocations.  Unlike mips, these machines don't have
> MMU support for a direct map.  Their direct maps are just a range of
> mappings in the regular (kernel) page table.  So, unlike mips, accesses
> through their direct map may still miss in the TLB and require a page table
> walk.  VM_FREEPOOL_* is a way to increase the physical locality (or
> clustering) of page allocations, so that, for example, page table page
> accesses by the pmap on amd64 are less likely to miss in the TLB.  However,
> it doesn't place a hard restriction on the range of physical addresses that
> will be used, which you need for mips.
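
(For the archives: the mechanism described here can be seen in
vm_page_alloc() in sys/vm/vm_page.c, where the pool is selected roughly
as below -- quoted from memory, so the exact expression may differ.)

        /*
         * Pages allocated without a backing VM object -- page table
         * pages, small kernel allocations -- come from
         * VM_FREEPOOL_DIRECT, which clusters them physically but does
         * not bound the physical address that is returned.
         */
        m = vm_phys_alloc_pages(object != NULL ?
            VM_FREEPOOL_DEFAULT : VM_FREEPOOL_DIRECT, 0);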
>
> The impact of this clustering can be significant.  For example, on amd64 we
> use 2MB page mappings to implement the direct map.  However, old Opterons
> only had 8 data TLB entries for 2MB page mappings.  For a uniprocessor
> kernel running on such an Opteron, I measured an 18% reduction in system
> time during a buildworld with the introduction of VM_FREEPOOL_DIRECT.  (See
> the commit logs for vm/vm_phys.c and the comment that precedes the
> VM_NFREEORDER definition on amd64.)
>
> Until such time as superpage support is ported to mips from the amd64/i386
> pmaps, I don't think there is a point in having more than one VM_FREEPOOL_*
> on mips.  And then, the point would be to reduce fragmentation of the
> physical memory that could be caused by small allocations, such as page
> table pages.

Thanks for the detailed explanation.

Also, after looking at the code again, I think vm_phys_alloc_contig()
can be optimized to skip segments that lie outside the range of
interest. The patch is:

Index: sys/vm/vm_phys.c
===================================================================
--- sys/vm/vm_phys.c    (revision 208890)
+++ sys/vm/vm_phys.c    (working copy)
@@ -595,7 +595,7 @@
        vm_object_t m_object;
        vm_paddr_t pa, pa_last, size;
        vm_page_t deferred_vdrop_list, m, m_ret;
-       int flind, i, oind, order, pind;
+       int segind, i, oind, order, pind;

        size = npages << PAGE_SHIFT;
        KASSERT(size != 0,
@@ -611,21 +611,20 @@
 #if VM_NRESERVLEVEL > 0
 retry:
 #endif
-       for (flind = 0; flind < vm_nfreelists; flind++) {
+       for (segind = 0; segind < vm_phys_nsegs; segind++) {
+               /*
+                * A free list may contain physical pages
+                * from one or more segments.
+                */
+               seg = &vm_phys_segs[segind];
+               if (seg->start > high || low >= seg->end)
+                       continue;
+
                for (oind = min(order, VM_NFREEORDER - 1); oind < VM_NFREEORDER; oind++) {
                        for (pind = 0; pind < VM_NFREEPOOL; pind++) {
-                               fl = vm_phys_free_queues[flind][pind];
+                               fl = (*seg->free_queues)[pind];
                                TAILQ_FOREACH(m_ret, &fl[oind].pl, pageq) {
                                        /*
-                                        * A free list may contain physical pages
-                                        * from one or more segments.
-                                        */
-                                       seg = &vm_phys_segs[m_ret->segind];
-                                       if (seg->start > high ||
-                                           low >= seg->end)
-                                               continue;
-
-                                       /*
                                         * Is the size of this allocation request
                                         * larger than the largest block size?
                                         */


-----

With this change, along with the vmparam.h changes for HIGHMEM, I think
we should be able to use vm_phys_alloc_contig() for page table pages
(or have I again missed something fundamental?).
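
Concretely, the allocation I have in mind is the same retry loop as in
the sun4v code mentioned above, but bounded to the direct-mapped
region.  An untested sketch (the function name is only for
illustration):

static vm_page_t
pmap_alloc_pte_page(void)
{
        vm_page_t m;

        /*
         * KSEG0 can only direct-map physical addresses below
         * MIPS_KSEG0_LARGEST_PHYS, so bound the allocation there and
         * sleep in VM_WAIT until a page in that range is free.
         */
        while ((m = vm_phys_alloc_contig(1, 0, MIPS_KSEG0_LARGEST_PHYS,
            PAGE_SIZE, 0)) == NULL)
                VM_WAIT;
        bzero((void *)MIPS_PHYS_TO_KSEG0(VM_PAGE_TO_PHYS(m)), PAGE_SIZE);
        return (m);
}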

JC.

