vm_page_array and VM_PHYSSEG_SPARSE
Alan Cox
alc at rice.edu
Mon Oct 27 06:59:35 UTC 2014
On 10/24/2014 06:33, Svatopluk Kraus wrote:
>
> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc at rice.edu> wrote:
>
> On 10/08/2014 10:38, Svatopluk Kraus wrote:
> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc at rice.edu> wrote:
> >
> >> On 09/27/2014 03:51, Svatopluk Kraus wrote:
> >>
> >>
> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox at gmail.com> wrote:
> >>
> >>>
> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe at gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
> >>>> problem we've dealt with somehow, but now we would like to do it
> >>>> better. It concerns physical pages that are allocated before the vm
> >>>> subsystem is initialized. While these pages can later be found in
> >>>> vm_page_array when the VM_PHYSSEG_DENSE memory model is used, that
> >>>> is not true for the VM_PHYSSEG_SPARSE memory model. And the ARM
> >>>> world uses the VM_PHYSSEG_SPARSE model.
> >>>>
> >>>> It really would be nice to utilize vm_page_array for such
> >>>> preallocated physical pages even when the VM_PHYSSEG_SPARSE memory
> >>>> model is used. Things could be much easier then. In our case, the
> >>>> pages in question are used for level 2 page tables. In the
> >>>> VM_PHYSSEG_SPARSE model, we have two sets of such pages: the first
> >>>> are preallocated, and the second are allocated after the vm
> >>>> subsystem has been initialized. We must handle each set differently,
> >>>> so the code is more complex, and so is debugging.
> >>>>
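To make the difference concrete: under VM_PHYSSEG_DENSE a physical address is
turned into a vm_page by simple arithmetic, while under VM_PHYSSEG_SPARSE only
addresses inside a registered segment have a vm_page at all. A rough sketch of
the two lookups, not the literal FreeBSD code:

/* DENSE: one contiguous vm_page_array covers everything. */
vm_page_t
dense_phys_to_vm_page(vm_paddr_t pa)
{

	return (&vm_page_array[atop(pa) - first_page]);
}

/* SPARSE: pages outside every registered segment have no vm_page. */
vm_page_t
sparse_phys_to_vm_page(vm_paddr_t pa)
{
	struct vm_phys_seg *seg;
	int i;

	for (i = 0; i < vm_phys_nsegs; i++) {
		seg = &vm_phys_segs[i];
		if (pa >= seg->start && pa < seg->end)
			return (&seg->first_page[atop(pa - seg->start)]);
	}
	return (NULL);
}

A page preallocated before vm_page_startup() falls into the NULL case above,
which is exactly the problem being described.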
> >>>> Thus we need some way to say that a part of physical memory should
> >>>> be included in vm_page_array, but that the pages from that region
> >>>> should not be put on the free list during initialization. We think
> >>>> such a facility could be useful in general. There could be a need
> >>>> for some physical space which:
> >>>>
> >>>> (1) is needed only during boot and can later be freed and handed
> >>>> over to the vm subsystem, or
> >>>>
> >>>> (2) is needed for something else, where the vm_page_array code could
> >>>> be reused without some kind of duplication.
> >>>>
> >>>> There is already some code in the vm_page.c file that deals with
> >>>> blacklisted pages. So the easiest way to handle the situation
> >>>> presented here is to add a callback to that part of the code, one
> >>>> that can exclude either a whole phys_avail[i], phys_avail[i+1]
> >>>> region or single pages. Since the biggest phys_avail region is used
> >>>> for vm subsystem allocations, some more coding would be needed.
> >>>> (However, blacklisted pages are not handled in that part of the
> >>>> region.)
> >>>>
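Purely to illustrate the shape of that proposal, a hypothetical hook; none of
these names exist in the tree:

/*
 * Hypothetical hook: vm_page_startup() would ask the MD code about
 * each page before putting it on the free list.
 */
typedef int (*vm_page_exclude_cb_t)(vm_paddr_t pa);

static vm_page_exclude_cb_t vm_page_exclude_cb;	/* registered by MD code */

static int
vm_page_is_excluded(vm_paddr_t pa)
{

	return (vm_page_exclude_cb != NULL && vm_page_exclude_cb(pa));
}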
> >>>> We would like to know if there is any objection:
> >>>>
> >>>> (1) to dealing with the presented problem,
> >>>> (2) to dealing with the problem in the presented way.
> >>>>
> >>>> Any help is very much appreciated. Thanks
> >>>>
> >>>>
> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead
> >>> of phys_avail when sizing vm_page_array. On amd64, where the same
> >>> problem exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now,
> >>> this is probably my preferred solution. The catch is that not all
> >>> architectures implement dump_avail, but my recollection is that arm
> >>> does.
> >>>
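Concretely, the experiment boils down to making the sizing computation in
vm_page_startup() walk dump_avail[] instead of phys_avail[]. A sketch of the
idea, not the exact patch (the function name is invented for illustration):

/*
 * Sketch: count the pages that vm_page_array must describe, walking
 * dump_avail[] rather than phys_avail[], so pages carved out of
 * phys_avail[] early in boot still get vm_page structures.
 */
static long
managed_page_count(void)
{
	long page_range;
	int i;

	page_range = 0;
	for (i = 0; dump_avail[i + 1] != 0; i += 2)
		page_range += atop(dump_avail[i + 1] - dump_avail[i]);
	return (page_range);
}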
> >> Frankly, I would prefer this too, but there is one big open
> >> question:
> >>
> >> What is dump_avail for?
> >>
> >>
> >>
> >> dump_avail[] solves a similar problem in the minidump code, hence the
> >> prefix "dump_" in its name. In other words, the minidump code
> >> couldn't use phys_avail[] either, because it didn't describe the full
> >> range of physical addresses that might be included in a minidump, so
> >> dump_avail[] was created.
> >>
> >> There is already precedent for what I'm suggesting. dump_avail[] is
> >> already (ab)used outside of the minidump code on x86 to solve this
> >> same problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
> >>
> >>
> >> Using it for vm_page_array initialization and segmentation means that
> >> phys_avail must be a subset of it. This must be stated and made
> >> visible enough; maybe it should even be checked in code. I like the
> >> idea of thinking about dump_avail as something that describes all
> >> memory in a system, but that's not how dump_avail is defined in the
> >> archs now.
> >>
> >>
> >>
> >> When you say "it's not how dump_avail is defined in archs now", I'm
> >> not sure whether you're talking about the code or the comments. In
> >> terms of code, dump_avail[] is a superset of phys_avail[], and I'm
> >> not aware of any code that would have to change. In terms of
> >> comments, I did a grep looking for comments defining what
> >> dump_avail[] is, because I couldn't remember any. I found one ... on
> >> arm. So, I don't think it's an onerous task changing the definition
> >> of dump_avail[]. :-)
> >>
> >> Already, as things stand today with dump_avail[] being used outside
> >> of the minidump code, one could reasonably argue that it should be
> >> renamed to something like phys_exists[].
> >>
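The subset property mentioned above could be verified with a simple boot-time
check. A sketch of one; nothing like this is in the tree today:

static void
check_phys_avail_subset(void)
{
	int i, j;

	/* Every phys_avail[] range must lie inside some dump_avail[] range. */
	for (i = 0; phys_avail[i + 1] != 0; i += 2) {
		for (j = 0; dump_avail[j + 1] != 0; j += 2)
			if (phys_avail[i] >= dump_avail[j] &&
			    phys_avail[i + 1] <= dump_avail[j + 1])
				break;
		if (dump_avail[j + 1] == 0)
			panic("phys_avail[%d] not covered by dump_avail", i);
	}
}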
> >>
> >>
> >> I will experiment with it on Monday then. However, it's not only
> >> about how memory segments are created in vm_phys.c, but also about
> >> how the vm_page_array size is computed in vm_page.c.
> >>
> >>
> >>
> >> Yes, and there is also a place in vm_reserv.c that needs to change.
> >> I've attached the patch that I developed and tested a long time ago.
> >> It still applies cleanly and runs ok on amd64.
> >>
> >>
> >>
> >
> >
> > Well, I've created and tested a minimalistic patch which - I hope - is
> > committable. It runs OK on a Pandaboard (ARMv6) and solves the
> > presented problem. I would really appreciate it if this could be
> > committed. Thanks.
>
>
> Sorry for the slow reply. I've just been swamped with work lately. I
> finally had some time to look at this in the last day or so.
>
> The first thing that I propose to do is commit the attached patch.
> This patch changes pmap_init() on amd64, armv6, and i386 so that it no
> longer consults phys_avail[] to determine the end of memory. Instead,
> it calls a new function provided by vm_phys.c to obtain the same
> information from vm_phys_segs[].
>
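The shape of that change, as a sketch rather than the committed diff (the
function names here are invented for illustration; vm_phys_get_end() is the
one discussed below):

/* Before: pmap_init() scans phys_avail[] for its final entry. */
vm_paddr_t
end_of_memory_before(void)
{
	int i;

	for (i = 0; phys_avail[i + 1] != 0; i += 2)
		continue;
	return (phys_avail[i - 1]);
}

/* After: one call into vm_phys.c, which reads vm_phys_segs[]. */
vm_paddr_t
end_of_memory_after(void)
{

	return (vm_phys_get_end());
}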
> With this change, the new variable phys_managed in your patch wouldn't
> need to be a global. It could be a local variable in vm_page_startup()
> that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>
> More generally, the long-term vision that I have is that we would stop
> using phys_avail[] after vm_page_startup() had completed. It would
> only be used during initialization. After that we would use
> vm_phys_segs[] and functions provided by vm_phys.c.
>
>
> I understand. The patch and the long-term vision are fine with me. I
> just wasn't bold enough to pass phys_managed as a parameter to
> vm_phys_init() and vm_reserv_init(). However, I certainly was thinking
> about it. While reading the comment above vm_phys_get_end(): do we care
> whether the last usable address is 0xFFFFFFFF?
To date, this hasn't been a problem. However, handling 0xFFFFFFFF is
easy. So, the final version of the patch that I committed this weekend
does so.
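One way to handle it is to have vm_phys_get_end() return the last usable
address rather than one past it, so a segment ending at 0xFFFFFFFF cannot wrap
a 32-bit vm_paddr_t. A sketch only; the committed code may differ:

vm_paddr_t
vm_phys_get_end(void)
{

	KASSERT(vm_phys_nsegs > 0, ("vm_phys_get_end: no segments"));
	/* vm_phys_segs[] is sorted by address, so the last segment ends highest. */
	return (vm_phys_segs[vm_phys_nsegs - 1].end - 1);
}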
Can you please try the attached patch? It replaces phys_avail[] with
vm_phys_segs[] in arm's busdma.
> Do you think that the rest of my patch, adjusted for the changes from
> your patch, is OK?
>
Basically, yes. I do, however, think that
+#if defined(__arm__)
+ phys_managed = dump_avail;
+#else
+ phys_managed = phys_avail;
+#endif
should also be conditioned on VM_PHYSSEG_SPARSE.
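That is, something along these lines (a sketch; phys_managed comes from the
quoted patch):

#if defined(VM_PHYSSEG_SPARSE) && defined(__arm__)
	phys_managed = dump_avail;
#else
	phys_managed = phys_avail;
#endif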
>
>
> >
> > BTW, while I was inspecting all the archs, I think that maybe it's
> > time to do what was done for busdma not long ago. There is a lot of
> > similar code across the archs that deals with physical memory; it
> > could be generalized and moved into kern/subr_physmem.c for common
> > use. All work with physical memory could be reduced to two arrays of
> > regions:
> >
> > phys_present[] ... describes all present physical memory regions
> > phys_exclude[] ... describes various exclusions from phys_present[]
> >
> > Each excluded region would be labeled with flags saying what kind of
> > exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, and
> > NOMEMRW could be combined. This idea is taken from
> > sys/arm/arm/physmem.c.
> >
> > All the other arrays, like phys_managed[], phys_avail[], and
> > dump_avail[], would be created from phys_present[] and
> > phys_exclude[]. This way the bootstrap code in the archs could be
> > simplified and unified. For example, dealing with hw.physmem or with
> > the page at PA 0x00000000 could be transparent.
> >
> > I'm prepared to volunteer if the idea is ripe. However, I will be
> > looking for a mentor.
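For illustration, the shared data structures might look roughly like this,
loosely modeled on sys/arm/arm/physmem.c (all names and sizes here are
illustrative; nothing like this is in kern/ today):

#define	EXFLAG_NODUMP	0x01	/* excluded from minidumps */
#define	EXFLAG_NOALLOC	0x02	/* never handed to the page allocator */

struct physmem_region {
	vm_paddr_t	addr;		/* start of region */
	vm_size_t	size;		/* size in bytes */
	uint32_t	exflags;	/* exclusion flags; unused for present regions */
};

static struct physmem_region phys_present[MAX_PHYS_REGIONS];
static struct physmem_region phys_exclude[MAX_PHYS_REGIONS];
static int npresent, nexclude;

phys_avail[], dump_avail[], and the like would then be generated by walking
phys_present[] and subtracting whichever phys_exclude[] entries carry the
relevant flag.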
>
>
> I've never really looked at arm/arm/physmem.c before. Let me do that
> before I comment on this.
>
> No problem. This could be a long-term aim. However, I hope the
> VM_PHYSSEG_SPARSE problem can be dealt with in MI code in the present
> time. In any case, thanks for your help.
>
>
-------------- next part --------------
Index: arm/arm/busdma_machdep-v6.c
===================================================================
--- arm/arm/busdma_machdep-v6.c (revision 273699)
+++ arm/arm/busdma_machdep-v6.c (working copy)
@@ -54,8 +54,9 @@ __FBSDID("$FreeBSD$");
 #include <sys/uio.h>
 
 #include <vm/vm.h>
+#include <vm/vm_param.h>
 #include <vm/vm_page.h>
-#include <vm/vm_map.h>
+#include <vm/vm_phys.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_kern.h>
 
@@ -277,16 +278,18 @@ SYSINIT(busdma, SI_SUB_KMEM+1, SI_ORDER_FIRST, bus
  * express, so we take a fast out.
  */
 static int
-exclusion_bounce_check(vm_offset_t lowaddr, vm_offset_t highaddr)
+exclusion_bounce_check(vm_paddr_t lowaddr, vm_paddr_t highaddr)
 {
+	struct vm_phys_seg *seg;
 	int i;
 
 	if (lowaddr >= BUS_SPACE_MAXADDR)
 		return (0);
-	for (i = 0; phys_avail[i] && phys_avail[i + 1]; i += 2) {
-		if ((lowaddr >= phys_avail[i] && lowaddr < phys_avail[i + 1]) ||
-		    (lowaddr < phys_avail[i] && highaddr >= phys_avail[i]))
+	for (i = 0; i < vm_phys_nsegs; i++) {
+		seg = &vm_phys_segs[i];
+		if ((lowaddr >= seg->start && lowaddr < seg->end) ||
+		    (lowaddr < seg->start && highaddr >= seg->start))
 			return (1);
 	}
 	return (0);
 }
Index: arm/arm/busdma_machdep.c
===================================================================
--- arm/arm/busdma_machdep.c (revision 273699)
+++ arm/arm/busdma_machdep.c (working copy)
@@ -70,10 +70,11 @@ __FBSDID("$FreeBSD$");
 #include <vm/uma.h>
 
 #include <vm/vm.h>
+#include <vm/vm_param.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_kern.h>
 #include <vm/vm_page.h>
-#include <vm/vm_map.h>
+#include <vm/vm_phys.h>
 
 #include <machine/atomic.h>
 #include <machine/bus.h>
@@ -326,17 +327,19 @@ run_filter(bus_dma_tag_t dmat, bus_addr_t paddr)
  * express, so we take a fast out.
  */
 static __inline int
-_bus_dma_can_bounce(vm_offset_t lowaddr, vm_offset_t highaddr)
+_bus_dma_can_bounce(vm_paddr_t lowaddr, vm_paddr_t highaddr)
 {
+	struct vm_phys_seg *seg;
 	int i;
 
 	if (lowaddr >= BUS_SPACE_MAXADDR)
 		return (0);
-	for (i = 0; phys_avail[i] && phys_avail[i + 1]; i += 2) {
-		if ((lowaddr >= phys_avail[i] && lowaddr <= phys_avail[i + 1])
-		    || (lowaddr < phys_avail[i] &&
-		    highaddr > phys_avail[i]))
+	for (i = 0; i < vm_phys_nsegs; i++) {
+		seg = &vm_phys_segs[i];
+		if ((lowaddr >= seg->start && lowaddr <= seg->end)
+		    || (lowaddr < seg->start &&
+		    highaddr > seg->start))
 			return (1);
 	}
 	return (0);
 }