svn commit: r218966 - head/sys/vm

Bruce Evans brde at
Wed Feb 23 21:23:55 UTC 2011

On Wed, 23 Feb 2011, Bruce Cran wrote:

> Log:
>  Calculate and return the count in vmspace_swap_count as a vm_offset_t
>  instead of an int to avoid overflow.
>  While here, clean up some style(9) issues.

vm_offset_t should not be abused to hold a count.

>  PR:		kern/152200
>  Reviewed by:	kib
>  MFC after:	2 weeks
> Modified:
>  head/sys/vm/swap_pager.c
>  head/sys/vm/vm_map.h
> Modified: head/sys/vm/swap_pager.c
> ==============================================================================
> --- head/sys/vm/swap_pager.c	Wed Feb 23 09:22:33 2011	(r218965)
> +++ head/sys/vm/swap_pager.c	Wed Feb 23 10:28:37 2011	(r218966)
> @@ -2420,23 +2420,24 @@ SYSCTL_NODE(_vm, OID_AUTO, swap_info, CT
>  *	if the VM object has any swap use at all the associated map entries
>  *	count for at least 1 swap page.
>  */
> -int
> +vm_offset_t
> vmspace_swap_count(struct vmspace *vmspace)

This is a count in pages according to its comment.  Thus vm_offset_t
is neither necessary not sufficient for it.  I'm not sure if all the
pages are mapped into vm at once, in which case vm_offset_t could count
them PAGE_SIZE times over and even int can count them PAGE_SIZE/2 times
over on i386, or if there can be as many pages as swap can hold, in
which case, in theory, vm_offset_t is accidentally large enough on 64
bit arches and still too small on 32-bit arches.  But int is large
enough in practice on all arches, since 16-bit ints are not supported
and 32-bit ints can count 2**31 pages = 8 TB with a minimal PAGE_SIZE
of 2**12.  I don't know if there is any multiple or sparse mapping
involved, but it would take a lot of it to use 8 TB.

vm still uses plain u_int for most of its most critical (physical) page
counters, starting with cnt.v_page_count for the total number of pages
in the system.  Thus it doesn't already have a page counter type.

> {
> -	vm_map_t map = &vmspace->vm_map;
> +	vm_map_t map;
> 	vm_map_entry_t cur;
> -	int count = 0;
> +	vm_object_t object;
> +	vm_offset_t count, n;
> -	for (cur = map->; cur != &map->header; cur = cur->next) {
> -		vm_object_t object;
> +	map = &vmspace->vm_map;
> +	count = 0;
> +	for (cur = map->; cur != &map->header; cur = cur->next) {
> 		if ((cur->eflags & MAP_ENTRY_IS_SUB_MAP) == 0 &&
> 		    (object = cur->object.vm_object) != NULL) {
> 			VM_OBJECT_LOCK(object);
> 			if (object->type == OBJT_SWAP &&
> 			    object->un_pager.swp.swp_bcount != 0) {
> -				int n = (cur->end - cur->start) / PAGE_SIZE;
> -
> +				n = (cur->end - cur->start) / PAGE_SIZE;
> 				count += object->un_pager.swp.swp_bcount *
> 				    SWAP_META_PAGES * n / object->size + 1;

The bug seems to have been overflow in this calculation.  `start' and
`end' have type vm_offset_t and large style bugs (missing prefixes in
their names) so they are hard to grep for.  When n is 32 bits int and
PAGE_SIZE is 2**12, the assignment to n overflows at a difference of 8TB,
but this probably can't happen (see above).  swap_bcnt still has type
int; SWAP_META_PAGES is 1, 2, 4, 8 or 16; thus swp_bcount * SWAP_META_PAGES
may overflow at 2**31/16 = 128 M.  If this doesn't overflow, but has its
maximal value of about 128 M, then multiplying it by "int n" may overflow
when n is just 32.  Then, if nothing has overflowed, division by
object->size reduces to a relatively small count in pages.  object->size
seems to have type vm_pindex_t which is 64 bits even on i386 (since it
is associated with vm_ooffset_t and not vm_offset_t, and vm_ooffset_t
must be 64 bits to support file of sizes >= 2GB although vm_pindex_t only
needs to be more than 32 bits to support files of sizes >= 8 TB (with
PAGE_SIZE = 2**12).  object->size has even larger bugs than `start' and
`end', since it is more global.

Summary: all of (object->un_pager.swp.swp_bcount * SWAP_META_PAGES * n /
object->size) was done in possibly-overflowing arithmetic using the
inadequately large type int, except for the final division which is done
using excessively large type vm_pindex_t.  Changing one of the ints to
vm_offset_t reduces the overflow possibilities a little.  But the reduction
is very little on i386, where the change is just from a 32 bit int to a
32 bit unsigned int.

> Modified: head/sys/vm/vm_map.h
> ==============================================================================
> --- head/sys/vm/vm_map.h	Wed Feb 23 09:22:33 2011	(r218965)
> +++ head/sys/vm/vm_map.h	Wed Feb 23 10:28:37 2011	(r218966)
> @@ -380,6 +380,6 @@ int vm_map_unwire(vm_map_t map, vm_offse
>     int flags);
> int vm_map_wire(vm_map_t map, vm_offset_t start, vm_offset_t end,
>     int flags);
> -int vmspace_swap_count (struct vmspace *vmspace);
> +vm_offset_t vmspace_swap_count(struct vmspace *vmspace);
> #endif				/* _KERNEL */

Maybe the critical overflows are actually in callers doing similarly
buggy multiplications.  The only caller seems to be vm_pageout_oom().
It abuses vm_offset_t to hold the result of the call ("vm_offset_t size,
bigsize") and to accumulate counts returned by vmspace_resident_count(),
which has another bogus type (long).  vm has a type vm_size_t for holding
sizes, but AFAIR this is in bytes and is only meant to be used for sizes
of single objects, so it is unsuitable for use here.

The file containing vm_pageout_oom() (vm_pageout.c) has a couple of
other `size' variables, and uses the reasonable (but much larger than
necessary, until 8 TB is a small amount of memory) type vm_pindex_t
for them.  The file vm_map.c contains dozens of `size' variables and
almost as many choices of types for them :-(: (warning: the following
is from "grep size vm_map.c" reduced a bit; there is not always quite
enough context):

% static int vmspace_zinit(void *mem, int size, int flags);
% static void vmspace_zfini(void *mem, int size);
% static void vm_map_zfini(void *mem, int size);
% static void vm_map_zdtor(void *mem, int size, void *arg);
% static void vmspace_zdtor(void *mem, int size, void *arg);

Plain int may work, but is sloppy.

% 	vm->vm_tsize = 0;
% 	vm->vm_dsize = 0;
% 	vm->vm_ssize = 0;

Hmm, there may actually be standard type for sizes in pages.  The above
3 have type segsz_t and are documented as being sizes in pages, with XXX's
for the first 2.

% 		       (vm_size_t)(prev_entry->end - prev_entry->start),
% 		       (vm_size_t)(end - prev_entry->end), charge_prev_obj)) {

Correct types, but bogus casts which were only needed for K&R compilers
since we have prototypes in scope.

% 			map->size += (end - prev_entry->end);

Yet another `size' struct member with no prefix in its name.  I think this
is the one in struct vm_map.  It has size vm_size_t, which seems to be

% 	new_entry->avail_ssize = 0;

avail_size in struct vm_map_entry has the bogus type vm_offset_t, and
the usual bad style.

% 	map->size += new_entry->end - new_entry->start;
% vm_map_findspace(vm_map_t map, vm_offset_t start, vm_size_t length,
%     vm_offset_t start, vm_size_t length, vm_prot_t prot,
% 	    vm_size_t length, int find_space, vm_prot_t prot,

Apparently correct.

% 	vm_size_t prevsize, esize;
% 		prevsize = prev->end - prev->start;
% 			(prev->offset + prevsize == entry->offset)) &&
% 				vm_map_entry_resize_free(map, entry->prev);
% 		esize = entry->end - entry->start;
% 			(entry->offset + esize == next->offset)) &&
% 			vm_map_entry_resize_free(map, entry);

Apparently correct except for formatting.

%     vm_object_t object, vm_pindex_t pindex, vm_size_t size, int flags)
% 	vm_pindex_t psize, tmpidx;

Apparently correct.  This is part of vm_map_pmap_enter()'s declarations.
Anything that has to deal with pindex's has to be careful with types to
work at all.

% 	psize = atop(size);

Another issue is whether macros like atop() work with arbitrary integer
types for arguments.  I added some upcasts in some of them, but it is
not easy to find the right upcast, short of [u]intmax_t which may be
excessive.  jake@ preferred explicit cast in the macro invovations and
used this a bit for PAE and sparc64.  Here atop() MD and a right shift
so there is no problem.  But for the related ptoa(), on at least i386
it is just a left shift, so it won't work starting with a count in pages,
of type anything reasonable except vm_pindex_t, iff the count in bytes
exceeds 4GB (or 2GB starting with an uint).

% 	vm_size_t size;
% 		size = (end <= current->end ? end : current->end) - start;
% 			vm_size_t tsize;
% 			tsize = tentry->end - offset;

As usual, vm_size_t is correct for internal sizes.

% 	vm_pindex_t offidxstart, offidxend, count, size1;

The count and size1 variables are poorly named at best.

% 	vm_ooffset_t size;

Might be correct.  vm is generally careful to distinguish sizes from
offset using vm_offset_t instead of vm_size_t, but there is nothing
corresponding to vm_size_t for files, so off_t and vm_offset_t are
often abused for sizes.  This abuse becomes larger for sizes that
are accumulations.

% 				size1 = object->size;

object->size has type vm_pindex_t, so we see that size_1 has the correct

% 				object->size = offidxstart;
% 					size1 -= object->size;
% 					KASSERT(object->charge >= ptoa(size1),

and ptoa(size1) will work, size size1 has a large enough type but small
enough value to not overflow the left shift.

% 					swap_release_by_cred(ptoa(size1), object->cred);
% 					object->charge -= ptoa(size1);

Correct enough.  The sizes become vm_ooffset_t's after ptoa().  ptoa()
strictly doesn't change the type from vm_pindex_t, but both are int64_t.
But if C's type checking were complete enough to complain about int
vs vm_size_t mismatches, then it would also complain about this mismatch.
BTW, I recently noticed that gdb is stricter about types than gcc -- I
wanted to know what a type was, but "whatis" in gdb said that it was a
foo_t when I wanted to know its basic type.

% 	vm_offset_t size;
% 		size = src_entry->end - src_entry->start;

Seems incorrect.  `size' is actually a size, not an offset, but its
type is spelled with "offset".

% 				src_object->charge = size;
% 				*fork_charge += size;
% 					*fork_charge += size;
% 				*fork_charge += size;

This accumulates multiple sizes of the bogus type vm_offset_t into the
logically larger type vm_ooffset_t.

% [...]

Got bored here.


More information about the svn-src-all mailing list