vm_page_array and VM_PHYSSEG_SPARSE
Alan Cox
alc at rice.edu
Mon Oct 27 16:31:54 UTC 2014
On 10/27/2014 08:22, Svatopluk Kraus wrote:
>
> On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox <alc at rice.edu> wrote:
>
> On 10/24/2014 06:33, Svatopluk Kraus wrote:
>>
>> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc at rice.edu> wrote:
>>
>> On 10/08/2014 10:38, Svatopluk Kraus wrote:
>> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc at rice.edu> wrote:
>> >
>> >> On 09/27/2014 03:51, Svatopluk Kraus wrote:
>> >>
>> >>
>> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox at gmail.com> wrote:
>> >>
>> >>>
>> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe at gmail.com> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
>> >>>> problem we've dealt with somehow, but now we would like to do it
>> >>>> better. It's about physical pages which are allocated before the vm
>> >>>> subsystem is initialized. While these pages can later be found in
>> >>>> vm_page_array when the VM_PHYSSEG_DENSE memory model is used, that
>> >>>> is not true for the VM_PHYSSEG_SPARSE memory model. And the ARM
>> >>>> world uses the VM_PHYSSEG_SPARSE model.
>> >>>>
>> >>>> It really would be nice to utilize vm_page_array for such
>> >>>> preallocated physical pages even when the VM_PHYSSEG_SPARSE memory
>> >>>> model is used. Things could be much easier then. In our case, it's
>> >>>> about pages which are used for level 2 page tables. In the
>> >>>> VM_PHYSSEG_SPARSE model, we have two sets of such pages: the first
>> >>>> are preallocated and the second are allocated after the vm
>> >>>> subsystem has been initialized. We must deal with each set
>> >>>> differently, so the code is more complex and so is the debugging.
>> >>>>
>> >>>> Thus we need some way to say that some part of physical memory
>> >>>> should be included in vm_page_array, but that the pages from that
>> >>>> region should not be put on the free list during initialization. We
>> >>>> think that such a possibility could be utilized in general. There
>> >>>> could be a need for some physical space which:
>> >>>>
>> >>>> (1) is needed only during boot and later on can be freed and handed
>> >>>> over to the vm subsystem,
>> >>>>
>> >>>> (2) is needed for something else, and the vm_page_array code could
>> >>>> be used without some kind of duplication.
>> >>>>
>> >>>> There is already some code which deals with blacklisted pages in
>> >>>> the vm_page.c file. So the easiest way to deal with the presented
>> >>>> situation is to add a callback to that part of the code which would
>> >>>> be able to exclude either a whole phys_avail[i], phys_avail[i+1]
>> >>>> region or single pages. As the biggest phys_avail region is used
>> >>>> for vm subsystem allocations, some more coding would be needed.
>> >>>> (However, blacklisted pages are not dealt with on that part of the
>> >>>> region.)
>> >>>>
>> >>>> We would like to know if there is any objection:
>> >>>>
>> >>>> (1) to dealing with the presented problem,
>> >>>> (2) to dealing with the problem in the presented way.
>> >>>>
>> >>>> Any help is very much appreciated. Thanks
>> >>>>
>> >>>>
>> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead
>> >>> of phys_avail when sizing vm_page_array. On amd64, where the same
>> >>> problem exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now,
>> >>> this is probably my preferred solution. The catch being that not all
>> >>> architectures implement dump_avail, but my recollection is that arm
>> >>> does.
>> >>>
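For illustration only (a sketch, not the patch attached in this thread):
under VM_PHYSSEG_SPARSE the sizing change amounts to summing the pages
described by dump_avail[] rather than phys_avail[] when computing how many
struct vm_page entries vm_page_startup() must reserve. The helper name
below is invented.

    /*
     * Sketch: count the pages that need a struct vm_page by walking
     * dump_avail[], the zero-terminated array of (start, end) physical
     * address pairs maintained for the minidump code, so that pages
     * allocated before the vm subsystem was initialized are covered too.
     */
    static long
    sparse_page_count_from_dump_avail(void)
    {
            long page_count;
            int i;

            page_count = 0;
            for (i = 0; dump_avail[i + 1] != 0; i += 2)
                    page_count += atop(dump_avail[i + 1] - dump_avail[i]);
            return (page_count);
    }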
>> >> Frankly, I would prefer this too, but there is one big open question:
>> >>
>> >> What is dump_avail for?
>> >>
>> >>
>> >>
>> >> dump_avail[] is solving a similar problem in the minidump code,
>> >> hence the prefix "dump_" in its name. In other words, the minidump
>> >> code couldn't use phys_avail[] either, because it didn't describe
>> >> the full range of physical addresses that might be included in a
>> >> minidump, so dump_avail[] was created.
>> >>
>> >> There is already precedent for what I'm suggesting. dump_avail[] is
>> >> already (ab)used outside of the minidump code on x86 to solve this
>> >> same problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>> >>
>> >>
>> >> Using it for vm_page_array initialization and segmentation means
>> >> that phys_avail must be a subset of it. And this must be stated and
>> >> made visible enough. Maybe it should even be checked in the code. I
>> >> like the idea of thinking about dump_avail as something that
>> >> describes all the memory in a system, but that's not how dump_avail
>> >> is defined in the archs now.
>> >>
>> >>
>> >>
>> >> When you say "it's not how dump_avail is defined in archs now", I'm
>> >> not sure whether you're talking about the code or the comments. In
>> >> terms of code, dump_avail[] is a superset of phys_avail[], and I'm
>> >> not aware of any code that would have to change. In terms of
>> >> comments, I did a grep looking for comments defining what
>> >> dump_avail[] is, because I couldn't remember any. I found one ... on
>> >> arm. So, I don't think it's an onerous task to change the definition
>> >> of dump_avail[]. :-)
>> >>
>> >> Already, as things stand today with dump_avail[] being used outside
>> >> of the minidump code, one could reasonably argue that it should be
>> >> renamed to something like phys_exists[].
>> >>
>> >>
>> >>
>> >> I will experiment with it on Monday then. However, it's not only
>> >> about how memory segments are created in vm_phys.c, but also about
>> >> how the vm_page_array size is computed in vm_page.c.
>> >>
>> >>
>> >>
>> >> Yes, and there is also a place in vm_reserv.c that needs to change.
>> >> I've attached the patch that I developed and tested a long time ago.
>> >> It still applies cleanly and runs ok on amd64.
>> >>
>> >>
>> >>
>> >
>> >
>> > Well, I've created and tested a minimalistic patch which - I hope -
>> > is committable. It runs ok on the pandaboard (arm-v6) and solves the
>> > presented problem. I would really appreciate it if this could be
>> > committed. Thanks.
>>
>>
>> Sorry for the slow reply. I've just been swamped with work lately. I
>> finally had some time to look at this in the last day or so.
>>
>> The first thing that I propose to do is commit the attached patch.
>> This patch changes pmap_init() on amd64, armv6, and i386 so that it no
>> longer consults phys_avail[] to determine the end of memory. Instead,
>> it calls a new function provided by vm_phys.c to obtain the same
>> information from vm_phys_segs[].
>>
>> With this change, the new variable phys_managed in your patch wouldn't
>> need to be a global. It could be a local variable in vm_page_startup()
>> that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>>
>> More generally, the long-term vision that I have is that we would stop
>> using phys_avail[] after vm_page_startup() had completed. It would
>> only be used during initialization. After that we would use
>> vm_phys_segs[] and functions provided by vm_phys.c.
>>
>>
>> I understand. The patch and the long-term vision are fine with me. I
>> just wasn't so bold as to pass phys_managed as a parameter to
>> vm_phys_init() and vm_reserv_init(). However, I certainly was thinking
>> about it. While reading the comment above vm_phys_get_end(): do we
>> care if the last usable address is 0xFFFFFFFF?
>
>
> To date, this hasn't been a problem. However, handling 0xFFFFFFFF
> is easy. So, the final version of the patch that I committed this
> weekend does so.
>
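To make the shape of that helper concrete, here is a sketch in the spirit
of the vm_phys_get_end() mentioned above; the name, the inclusive return
value, and the loop are illustrative, and the committed function may
differ in detail. Returning an inclusive address is one simple way to stay
well defined when the last usable byte is 0xFFFFFFFF.

    /*
     * Sketch: report the highest managed physical address from
     * vm_phys_segs[] instead of phys_avail[].  The returned value is
     * inclusive, so a segment whose last usable byte is 0xFFFFFFFF
     * does not overflow the result.
     */
    vm_paddr_t
    vm_phys_highest_addr_sketch(void)
    {
            vm_paddr_t high;
            int i;

            high = 0;
            for (i = 0; i < vm_phys_nsegs; i++)
                    if (vm_phys_segs[i].end - 1 > high)
                            high = vm_phys_segs[i].end - 1;
            return (high);
    }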
> Can you please try the attached patch? It replaces phys_avail[]
> with vm_phys_segs[] in arm's busdma.
>
>
>
> It works fine on the arm-v6 pandaboard. I have no objection to
> committing it. However, it's only a 1:1 replacement.
Right now, yes. However, once your patch is committed, it won't be 1:1
anymore, because vm_phys_segs[] will be populated based on dump_avail[]
rather than phys_avail[].
My interpretation of the affected code is that using the ranges defined
by dump_avail[] is actually closer to what this code intended.
> In fact, I still keep the following pattern in my head:
>
> present memory in the system <=> all RAM and whatever else
> nobounce memory <=> addressable by DMA
In general, I don't see how this can be an attribute of the memory,
because it's going to depend on the device. In other words, a given
physical address may require bouncing for some device but not all devices.
> memory managed by the vm subsystem <=> i.e. kept in vm_page_array
> memory available to the vm subsystem <=> can be allocated
>
> So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it
> could be too limiting in some scenarios. I would like to see something
> different in exclusion_bounce_check() in the future - something that
> reflects the NOBOUNCE property and not the NOALLOC one like now.
>
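For context, the 1:1 replacement under discussion amounts to something
like the sketch below: an exclusion_bounce_check()-style test that walks
vm_phys_segs[] where the old code walked the phys_avail[] pairs. This is
illustrative only and is not the attached busdma diff; the overlap test
itself is an assumption.

    /*
     * Sketch: does any managed physical memory fall above lowaddr (and
     * thus potentially require bouncing for a device that can only
     * address up to lowaddr)?  Walks vm_phys_segs[] instead of the
     * phys_avail[] pairs.
     */
    static int
    exclusion_bounce_check_sketch(vm_paddr_t lowaddr, vm_paddr_t highaddr)
    {
            int i;

            for (i = 0; i < vm_phys_nsegs; i++) {
                    if ((lowaddr >= vm_phys_segs[i].start &&
                        lowaddr < vm_phys_segs[i].end) ||
                        (lowaddr < vm_phys_segs[i].start &&
                        highaddr > vm_phys_segs[i].start))
                            return (1);
            }
            return (0);
    }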
>
>
>
>
>
>> Do you think that the rest of my patch, considering the changes due
>> to your patch, is ok?
>>
>
>
> Basically, yes. I do, however, think that
>
> +#if defined(__arm__)
> + phys_managed = dump_avail;
> +#else
> + phys_managed = phys_avail;
> +#endif
>
> should also be conditioned on VM_PHYSSEG_SPARSE.
>
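Spelled out, the conditioned assignment would presumably read roughly as
follows; this is a sketch of the suggestion, not the final patch.

    #if defined(__arm__) && defined(VM_PHYSSEG_SPARSE)
            /* Pages allocated before VM init still need a struct vm_page. */
            phys_managed = dump_avail;
    #else
            phys_managed = phys_avail;
    #endif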
>
>
>
> So I've prepared a new patch. phys_managed[] is passed to
> vm_phys_init() and vm_reserv_init() as a parameter, and a small
> optimization is made in vm_page_startup(). I added the
> VM_PHYSSEG_SPARSE condition to the place you mentioned. Anyhow, I still
> think that this is only a temporary hack. In general, phys_managed[]
> should always be distinguished from phys_avail[].
>
>
>
>>
>>
>> >
>> > BTW, while I was inspecting all the archs, I think that maybe it's
>> > time to do what was done for busdma not long ago. There is a lot of
>> > similar code across the archs which deals with physical memory and
>> > could be generalized and put into kern/subr_physmem.c for common use.
>> > All work with physical memory could be simplified to two arrays of
>> > regions:
>> >
>> > phys_present[] ... describes all present physical memory regions
>> > phys_exclude[] ... describes various exclusions from phys_present[]
>> >
>> > Each excluded region would be labeled by flags to say what kind of
>> > exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, and
>> > NOMEMRW could be combined. This idea is taken from
>> > sys/arm/arm/physmem.c.
>> >
>> > All other arrays like phys_managed[], phys_avail[], and dump_avail[]
>> > would be created from these phys_present[] and phys_exclude[]. This
>> > way the bootstrap code in the archs could be simplified and unified.
>> > For example, dealing with either hw.physmem or the page at PA
>> > 0x00000000 could be transparent.
>> >
>> > I'm prepared to volunteer if the idea is ripe. However, I would be
>> > looking for a mentor.
>>
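To make the proposal above more concrete, here is a minimal sketch of the
region descriptors such a kern/subr_physmem.c might export, modeled
loosely on sys/arm/arm/physmem.c. Every name, flag value, and array size
below is invented for illustration only.

    /* Sketch only: exclusion flags a generic subr_physmem.c could use. */
    #define PHYS_EX_NODUMP          0x01    /* exclude from kernel dumps */
    #define PHYS_EX_NOALLOC         0x02    /* never put on the free lists */
    #define PHYS_EX_NOMANAGE        0x04    /* no struct vm_page coverage */
    #define PHYS_EX_NOBOUNCE        0x08    /* not addressable by DMA */
    #define PHYS_EX_NOMEMRW         0x10    /* not accessible via /dev/mem */

    struct phys_region {
            vm_paddr_t      pr_start;       /* first byte of the region */
            vm_size_t       pr_size;        /* size of the region in bytes */
            uint32_t        pr_flags;       /* PHYS_EX_* flags, 0 if none */
    };

    /*
     * phys_present[] would describe all present physical memory and
     * phys_exclude[] the regions carved out of it; phys_avail[],
     * dump_avail[], and phys_managed[] would then be generated from
     * these two arrays late in the bootstrap.
     */
    static struct phys_region phys_present[16];
    static struct phys_region phys_exclude[16];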
>>
>> I've never really looked at arm/arm/physmem.c before. Let me do that
>> before I comment on this.
>>
>> No problem. This could be a long-term aim. However, I hope the
>> VM_PHYSSEG_SPARSE problem can be dealt with in MI code for now. In any
>> case, thanks for your help.
>>
>>
>
>