vm_page_array and VM_PHYSSEG_SPARSE

Svatopluk Kraus onwahe at gmail.com
Wed Oct 8 15:38:07 UTC 2014


On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc at rice.edu> wrote:

>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>
>
> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox at gmail.com> wrote:
>
>>
>>
>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
>>> problem we've dealt with somehow, but now we would like to do it
>>> better. It's about physical pages which are allocated before the vm
>>> subsystem is initialized. While these pages can later be found in
>>> vm_page_array when the VM_PHYSSEG_DENSE memory model is used, that is
>>> not true for the VM_PHYSSEG_SPARSE model. And the ARM world uses the
>>> VM_PHYSSEG_SPARSE model.
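>>>
>>> To illustrate: with VM_PHYSSEG_DENSE, PHYS_TO_VM_PAGE() indexes
>>> vm_page_array directly, so even a preallocated page has a struct
>>> vm_page. With VM_PHYSSEG_SPARSE, the lookup goes through the vm_phys
>>> segments, which are built from phys_avail[], so a preallocated page is
>>> simply not found. Roughly (simplified from vm/vm_page.h and
>>> vm/vm_phys.c):
>>>
>>>     vm_page_t
>>>     PHYS_TO_VM_PAGE(vm_paddr_t pa)
>>>     {
>>>     #if defined(VM_PHYSSEG_DENSE)
>>>             /* Every page of physical memory has an array entry. */
>>>             return (&vm_page_array[atop(pa) - first_page]);
>>>     #elif defined(VM_PHYSSEG_SPARSE)
>>>             /*
>>>              * Only pages inside a vm_phys segment, i.e. inside some
>>>              * phys_avail[] range, are found; anything else is NULL.
>>>              */
>>>             return (vm_phys_paddr_to_vm_page(pa));
>>>     #endif
>>>     }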
>>>
>>> It really would be nice to utilize vm_page_array for such preallocated
>>> physical pages even when the VM_PHYSSEG_SPARSE memory model is used.
>>> Things could be much easier then. In our case, it's about the pages
>>> which are used for level 2 page tables. In the VM_PHYSSEG_SPARSE
>>> model, we have two sets of such pages: the first set is preallocated
>>> and the second is allocated after the vm subsystem has been
>>> initialized. We must deal with each set differently, so the code is
>>> more complex, and so is the debugging.
>>>
>>> Thus we need some method to say that a part of physical memory should
>>> be included in vm_page_array, but that the pages from that region
>>> should not be put on the free list during initialization. We think
>>> such a possibility could be useful in general. There could be a need
>>> for physical space which:
>>>
>>> (1) is needed only during boot, and later on can be freed and handed
>>> over to the vm subsystem, or
>>>
>>> (2) is needed for something else, and the vm_page_array code could be
>>> used for it without some kind of duplication.
>>>
>>> There is already some code which deals with blacklisted pages in the
>>> vm_page.c file. So the easiest way to deal with the presented
>>> situation is to add a callback to that part of the code which would be
>>> able to exclude either a whole (phys_avail[i], phys_avail[i+1]) region
>>> or single pages. As the biggest phys_avail region is used for vm
>>> subsystem allocations, some more coding is needed there. (However,
>>> blacklisted pages are not dealt with on that part of the region.)
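>>>
>>> A rough sketch of the hook we have in mind (the names here are made up
>>> just for illustration):
>>>
>>>     /*
>>>      * Optional MD callback invoked from vm_page_startup() for each
>>>      * page: return true if the page must stay off the free lists
>>>      * (e.g. a preallocated L2 page table page), while still getting
>>>      * its vm_page_array entry initialized.
>>>      */
>>>     bool pmap_page_is_preallocated(vm_paddr_t pa);
>>>
>>>     /* In vm_page_startup(), next to the blacklist check: */
>>>     if (pmap_page_is_preallocated(pa))
>>>             continue;       /* keep the page; do not free it */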
>>>
>>> We would like to know if there is any objection:
>>>
>>> (1) to dealing with the presented problem,
>>> (2) to dealing with the problem in the presented way.
>>>
>>> Any help is very much appreciated. Thanks.
>>>
>>>
>>
>> As an experiment, try modifying vm_phys.c to use dump_avail instead of
>> phys_avail when sizing vm_page_array.  On amd64, where the same problem
>> exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right now, this is
>> probably my preferred solution.  The catch is that not all architectures
>> implement dump_avail, but my recollection is that arm does.
>>
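>> The change is mechanical; for example, the segment creation loop in
>> vm_phys_init() would walk dump_avail[] instead of phys_avail[] (a
>> sketch only, not the exact patch):
>>
>>     for (i = 0; dump_avail[i + 1] != 0; i += 2)
>>             vm_phys_create_seg(dump_avail[i], dump_avail[i + 1]);
>>
>> vm_page_startup() would then size vm_page_array from the same ranges,
>> while still putting only the phys_avail[] pages on the free lists.
>>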
>
> Frankly, I would prefer this too, but there is one big open question:
>
> What is dump_avail for?
>
>
>
> dump_avail[] is solving a similar problem in the minidump code, hence the
> prefix "dump_" in its name.  In other words, the minidump code couldn't use
> phys_avail[] either, because it didn't describe the full range of physical
> addresses that might be included in a minidump, so dump_avail[] was created.
>
> There is already precedent for what I'm suggesting.  dump_avail[] is
> already (ab)used outside of the minidump code on x86 to solve this same
> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>
>
> Using it for vm_page_array initialization and segmentation means that
> phys_avail must be a subset of it. And this must be stated and visible
> enough; maybe it should even be checked in code. I like the idea of
> thinking about dump_avail as something that describes all memory in a
> system, but that's not how dump_avail is defined in the archs now.
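>
> Such a check could be as simple as verifying, for every phys_avail[]
> range, that some dump_avail[] range covers it (just a sketch):
>
>     for (i = 0; phys_avail[i + 1] != 0; i += 2) {
>             for (j = 0; dump_avail[j + 1] != 0; j += 2)
>                     if (phys_avail[i] >= dump_avail[j] &&
>                         phys_avail[i + 1] <= dump_avail[j + 1])
>                             break;
>             KASSERT(dump_avail[j + 1] != 0,
>                 ("phys_avail range %d not covered by dump_avail", i));
>     }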
>
>
>
> When you say "it's not how dump_avail is defined in archs now", I'm not
> sure whether you're talking about the code or the comments.  In terms of
> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any
> code that would have to change.  In terms of comments, I did a grep looking
> for comments defining what dump_avail[] is, because I couldn't remember
> any.  I found one ... on arm.  So, I don't think it's an onerous task to
> change the definition of dump_avail[].  :-)
>
> Already, as things stand today with dump_avail[] being used outside of the
> minidump code, one could reasonably argue that it should be renamed to
> something like phys_exists[].
>
>
>
> I will experiment with it on Monday then. However, it's not only about
> how memory segments are created in vm_phys.c; it's also about how the
> vm_page_array size is computed in vm_page.c.
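>
> E.g., the VM_PHYSSEG_SPARSE branch of vm_page_startup() would have to
> compute page_range from dump_avail[] as well, roughly:
>
>     page_range = 0;
>     for (i = 0; dump_avail[i + 1] != 0; i += 2)
>             page_range += atop(dump_avail[i + 1] - dump_avail[i]);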
>
>
>
> Yes, and there is also a place in vm_reserv.c that needs to change.   I've
> attached the patch that I developed and tested a long time ago.  It still
> applies cleanly and runs ok on amd64.
>
>
>



Well, I've created and tested a minimalistic patch which - I hope - is
committable. It runs OK on a Pandaboard (ARMv6) and solves the presented
problem. I would really appreciate it if this could be committed. Thanks.


BTW, while I was inspecting all the archs, I started to think that maybe
it's time to do what was done for busdma not long ago. There is a lot of
similar code across the archs which deals with physical memory and could
be generalized and moved to kern/subr_physmem.c for common use. All work
with physical memory could be reduced to two arrays of regions:

phys_present[] ... describes all present physical memory regions
phys_exclude[] ... describes various exclusions from phys_present[]

Each excluded region would be labeled with flags saying what kind of
exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, and
NOMEMRW could be combined. This idea is taken from sys/arm/arm/physmem.c.

All the other arrays, like phys_managed[], phys_avail[], and dump_avail[],
would be created from phys_present[] and phys_exclude[]. This way the
bootstrap code in the archs could be simplified and unified. For example,
dealing with either hw.physmem or the page at PA 0x00000000 could be made
transparent.
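
The interface could follow what sys/arm/arm/physmem.c already does today;
the names below are only a strawman, not a final proposal:

    /* Declare a region of physical memory present in the system. */
    void physmem_hardware_region(vm_paddr_t pa, vm_size_t sz);

    /* Exclude a (sub)region, flagged with what it must not be used for. */
    void physmem_exclude_region(vm_paddr_t pa, vm_size_t sz, uint32_t flags);

    /* Derive phys_avail[], dump_avail[], phys_managed[], ... from both. */
    void physmem_init_kernel_globals(void);

Early MD boot code would then shrink to a few calls like:

    physmem_hardware_region(sdram_base, sdram_size);
    physmem_exclude_region(kernel_pa, kernel_size, NOALLOC);
    physmem_exclude_region(l2_pt_pa, l2_pt_size, NOALLOC | NOMANAGE);
    physmem_init_kernel_globals();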

I'm prepared to volunteer if the idea is ripe. However, I will be looking
for a mentor.

Svata
-------------- next part --------------
A non-text attachment was scrubbed...
Name: phys_managed.patch
Type: application/octet-stream
Size: 6126 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20141008/ef4a4c3e/attachment.obj>

