kern/112194: [PATCH] VM handles systems with sparse physical memory poorly

Nathan Whitehorn nathanw at ginger.rh.uchicago.edu
Fri Apr 27 22:20:07 UTC 2007


>Number:         112194
>Category:       kern
>Synopsis:       [PATCH] VM handles systems with sparse physical memory poorly
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 27 22:20:04 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Nathan Whitehorn
>Release:        7-CURRENT
>Organization:
University of Chicago
>Environment:
>Description:
On systems with sparse physical memory (UltraSPARC IIIi systems map each DIMM to an 8 GB boundary, and each CPU's RAM to a 64 GB boundary, for instance), the VM startup code allocates an enormous number of vm_page structures for RAM that isn't there. On UltraSPARC IIIi in particular, storing this array exceeds the maximum available contiguous physical memory, which causes early kernel panics. I believe this is also a problem on IA64.
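
To put rough numbers on the problem, the sketch below sizes the vm_page array both ways for a hypothetical UltraSPARC IIIi-like layout. The 8 KB page size matches sparc64, while the 120-byte struct vm_page size and the 4 GB-of-RAM / 64 GB-high-water layout are assumptions chosen only for illustration:

#include <stdio.h>

int
main(void)
{
	/* All figures below are illustrative assumptions, not measurements. */
	unsigned long long page_size  = 8ULL * 1024;	/* sparc64 PAGE_SIZE */
	unsigned long long vm_page_sz = 120;		/* assumed sizeof(struct vm_page) */
	unsigned long long high_water = 64ULL << 30;	/* end of the highest segment */
	unsigned long long real_ram   = 4ULL << 30;	/* RAM actually installed */

	/* Current code sizes the array from the high-water mark, holes included. */
	unsigned long long old_range = high_water / page_size;
	/* Patched code sizes it from the RAM that actually exists. */
	unsigned long long new_range = real_ram / page_size;

	printf("high-water sizing: %llu MB of vm_page structures\n",
	    old_range * vm_page_sz >> 20);
	printf("existing-RAM sizing: %llu MB of vm_page structures\n",
	    new_range * vm_page_sz >> 20);
	return (0);
}

Under those assumptions the high-water approach wants roughly a gigabyte of contiguous vm_page structures for only 4 GB of installed RAM, which is why the allocation can exceed the largest contiguous chunk of physical memory.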

The attached patch allocates vm_page structures only for physical RAM that actually exists; the price is that PHYS_TO_VM_PAGE slows down in proportion to the number of entries in phys_avail. Because phys_avail should only be large on systems with very discontiguous physical memory (exactly where this approach matters most), and because PHYS_TO_VM_PAGE is called rarely, I think this is a price worth paying.
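
For readers who want to see the lookup outside the context of the diff, here is a minimal standalone sketch of the same segment-walking idea; the phys_avail layout, the helper name, and the test address are hypothetical, and the real change is the vm_phys_to_vm_page() inline in vm_page.h below. The walk costs one loop iteration per phys_avail segment that lies below the one containing the address:

#include <stdio.h>

#define PAGE_SHIFT	13			/* 8 KB pages, as on sparc64 */
#define atop(x)		((x) >> PAGE_SHIFT)

/* Hypothetical layout: two 4 GB DIMMs on 8 GB boundaries, zero-terminated. */
static unsigned long long phys_avail[] = {
	0x000000000ULL, 0x100000000ULL,		/* segment 0: 0 - 4 GB */
	0x200000000ULL, 0x300000000ULL,		/* segment 1: 8 GB - 12 GB */
	0, 0
};

/*
 * Index into a densely packed vm_page array: the page counts of all
 * lower segments plus the page offset within the segment holding pa.
 */
static unsigned long long
pa_to_page_index(unsigned long long pa)
{
	unsigned long long idx = 0;
	int i;

	for (i = 0; phys_avail[i + 1] <= pa || phys_avail[i] > pa; i += 2)
		idx += atop(phys_avail[i + 1] - phys_avail[i]);
	return (idx + atop(pa - phys_avail[i]));
}

int
main(void)
{
	/*
	 * The first page of the second DIMM lands right after the
	 * 524288 pages of the first one, despite the 4 GB hole.
	 */
	printf("%llu\n", pa_to_page_index(0x200000000ULL));
	return (0);
}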
>How-To-Repeat:

>Fix:


Patch attached with submission follows:

--- vm_page.c.dist	Sun Apr 15 11:12:39 2007
+++ vm_page.c	Sun Apr 15 11:11:24 2007
@@ -212,7 +212,7 @@
 	/* the biggest memory array is the second group of pages */
 	vm_paddr_t end;
 	vm_paddr_t biggestsize;
-	vm_paddr_t low_water, high_water;
+	vm_paddr_t low_water;
 	int biggestone;
 
 	vm_paddr_t total;
@@ -221,6 +221,7 @@
 	biggestsize = 0;
 	biggestone = 0;
 	nblocks = 0;
+	page_range = 0;
 	vaddr = round_page(vaddr);
 
 	for (i = 0; phys_avail[i + 1]; i += 2) {
@@ -229,20 +230,19 @@
 	}
 
 	low_water = phys_avail[0];
-	high_water = phys_avail[1];
 
 	for (i = 0; phys_avail[i + 1]; i += 2) {
 		vm_paddr_t size = phys_avail[i + 1] - phys_avail[i];
 
-		if (size > biggestsize) {
+		if ((size > biggestsize) && (size > (boot_pages * UMA_SLAB_SIZE))) {
 			biggestone = i;
 			biggestsize = size;
 		}
+
 		if (phys_avail[i] < low_water)
 			low_water = phys_avail[i];
-		if (phys_avail[i + 1] > high_water)
-			high_water = phys_avail[i + 1];
 		++nblocks;
+		page_range += size/PAGE_SIZE;
 		total += size;
 	}
 
@@ -297,8 +297,9 @@
 	 * use (taking into account the overhead of a page structure per
 	 * page).
 	 */
+	
 	first_page = low_water / PAGE_SIZE;
-	page_range = high_water / PAGE_SIZE - first_page;
+	total = page_range * PAGE_SIZE;
 	npages = (total - (page_range * sizeof(struct vm_page)) -
 	    (end - new_end)) / PAGE_SIZE;
 	end = new_end;
--- vm_page.h.dist	Sun Apr 15 11:12:29 2007
+++ vm_page.h	Sun Apr 15 11:12:17 2007
@@ -272,11 +272,21 @@
 extern vm_page_t vm_page_array;		/* First resident page in table */
 extern int vm_page_array_size;		/* number of vm_page_t's */
 extern long first_page;			/* first physical page number */
+static __inline vm_page_t vm_phys_to_vm_page(vm_paddr_t pa);
+
+static __inline vm_page_t vm_phys_to_vm_page(vm_paddr_t pa) {
+	int i,j;
+	
+	for (i = 0, j = 0; (phys_avail[i+1] <= pa) || (phys_avail[i] > pa); i+= 2)
+		j += atop(phys_avail[i+1] - phys_avail[i]);
+
+	return &vm_page_array[j + atop(pa - phys_avail[i])];
+}
 
 #define VM_PAGE_TO_PHYS(entry)	((entry)->phys_addr)
 
 #define PHYS_TO_VM_PAGE(pa) \
-		(&vm_page_array[atop(pa) - first_page ])
+		vm_phys_to_vm_page(pa)
 
 extern struct mtx vm_page_queue_mtx;
 #define vm_page_lock_queues()   mtx_lock(&vm_page_queue_mtx)

>Release-Note:
>Audit-Trail:
>Unformatted:

