[PATCH] Add ktrace records for user page faults
John Baldwin
jhb at freebsd.org
Mon May 2 21:09:29 UTC 2011
On Monday, May 02, 2011 4:16:02 pm Kostik Belousov wrote:
> On Mon, May 02, 2011 at 04:02:02PM -0400, John Baldwin wrote:
> > On Monday, May 02, 2011 3:55:55 pm Kostik Belousov wrote:
> > > On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote:
> > > > One thing I have found useful is knowing when processes are in the kernel
> > > > instead of in userland. ktrace already provides records for syscall
> > > > entry/exit. The other major source of time spent in the kernel that I've seen
> > > > is page fault handling. To that end, I have a patch that adds ktrace records
> > > > to the beginning and end of VM faults. This gives a pair of records so a user
> > > > can see how long a fault took (similar to how one can see how long a syscall
> > > > takes now). Sample output from kdump is below:
> > > >
> > > > 47565 echo CALL mmap(0x800a87000,0x179000,PROT_READ|
> > > > PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0)
> > > > 47565 echo RET mmap 34370777088/0x800a87000
> > > > 47565 echo PFLT 0x800723000 VM_PROT_EXECUTE
> > > > 47565 echo RET KERN_SUCCESS
> > > > 47565 echo CALL munmap(0x800887000,0x179000)
> > > > 47565 echo RET munmap 0
> > > > 47565 echo PFLT 0x800a00000 VM_PROT_WRITE
> > > > 47565 echo RET KERN_SUCCESS
> > > >
> > > > The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.patch and
> > > > included below.
> > >
> > > One immediate detail is that trap() truncates the fault address to the
> > > page address, that arguably looses useful information.
> >
> > It is true that it would be nice to have the exact faulting address, though
> > having page granularity has been sufficient for the few times I've actually
> > used the address itself (e.g. I could figure out which page of libstdc++ a
> > fault occurred on and narrow down from there as to which of the routines most
> > likely was executed given what the app was doing at the time). In my case
> > knowing how much time was spent handling a page fault has been useful.
> >
> > Would we have to push this logic out of vm_fault and into every trap() routine
> > to get the original address? That would make the patch quite a bit bigger
> > (touching N MD files vs 1 MI file).
>
> Or do the reverse, making vm_fault() do trunc_page() [if doing this
> change at all].
Ok. That sounds sensible.
> Also, I want to note another small detail, that is relevant if you plan
> to MFC the change. In HEAD, vm_fault() is only called from trap()s, and
> in-kernel page reads where substituted by vm_fault_hold() calls. This is
> not true for stable.
>
> Hm, was it indended to report faults from uiomove etc ?
> I had this change (that relates to OOM handling) for long time, and
> realized that it might be useful there.
I don't mind if it reports those faults as they will be bracketed by
a system call entry/exit. However, I primarily care about user-initiated
faults.
> diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
> index 4e5f8b8..0d1e68f 100644
> --- a/sys/amd64/amd64/trap.c
> +++ b/sys/amd64/amd64/trap.c
> @@ -697,7 +697,8 @@ trap_pfault(frame, usermode)
> PROC_UNLOCK(p);
>
> /* Fault in the user page: */
> - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL);
> + rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL |
> + (usermode ? VM_FAULT_USERMODE : 0));
>
> PROC_LOCK(p);
> --p->p_lock;
> diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
> index 5a8016c..236f295 100644
> --- a/sys/i386/i386/trap.c
> +++ b/sys/i386/i386/trap.c
> @@ -856,7 +856,8 @@ trap_pfault(frame, usermode, eva)
> PROC_UNLOCK(p);
>
> /* Fault in the user page: */
> - rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL);
> + rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL |
> + (usermode ? VM_FAULT_USERMODE : 0));
>
> PROC_LOCK(p);
> --p->p_lock;
> diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
> index d417a84..c1e87ae 100644
> --- a/sys/vm/vm_fault.c
> +++ b/sys/vm/vm_fault.c
> @@ -408,6 +408,11 @@ RetryFault:;
> */
> fs.m = NULL;
> if (!vm_page_count_severe() || P_KILLED(curproc)) {
> + if (P_KILLED(curproc) && (fault_flags &
> + VM_FAULT_USERMODE) != 0) {
> + unlock_and_deallocate(&fs);
> + return (KERN_RESOURCE_SHORTAGE);
> + }
> #if VM_NRESERVLEVEL > 0
> if ((fs.object->flags & OBJ_COLORED) == 0) {
> fs.object->flags |= OBJ_COLORED;
> diff --git a/sys/vm/vm_map.h b/sys/vm/vm_map.h
> index 5311e02..7444549 100644
> --- a/sys/vm/vm_map.h
> +++ b/sys/vm/vm_map.h
> @@ -326,6 +326,7 @@ long vmspace_wired_count(struct vmspace *vmspace);
> #define VM_FAULT_NORMAL 0 /* Nothing special */
> #define VM_FAULT_CHANGE_WIRING 1 /* Change the wiring as appropriate */
> #define VM_FAULT_DIRTY 2 /* Dirty the page; use w/VM_PROT_COPY */
> +#define VM_FAULT_USERMODE 4 /* Fault initiated by usermode */
>
> /*
> * The following "find_space" options are supported by vm_map_find()
>
--
John Baldwin
More information about the freebsd-arch
mailing list