[PATCH] Add ktrace records for user page faults
Kostik Belousov
kostikbel at gmail.com
Mon May 2 20:16:06 UTC 2011
On Mon, May 02, 2011 at 04:02:02PM -0400, John Baldwin wrote:
> On Monday, May 02, 2011 3:55:55 pm Kostik Belousov wrote:
> > On Mon, May 02, 2011 at 03:37:19PM -0400, John Baldwin wrote:
> > > One thing I have found useful is knowing when processes are in the kernel
> > > instead of in userland. ktrace already provides records for syscall
> > > entry/exit. The other major source of time spent in the kernel that I've seen
> > > is page fault handling. To that end, I have a patch that adds ktrace records
> > > to the beginning and end of VM faults. This gives a pair of records so a user
> > > can see how long a fault took (similar to how one can see how long a syscall
> > > takes now). Sample output from kdump is below:
> > >
> > > 47565 echo CALL mmap(0x800a87000,0x179000,PROT_READ|
> > > PROT_WRITE,MAP_PRIVATE|MAP_ANON,0xffffffff,0)
> > > 47565 echo RET mmap 34370777088/0x800a87000
> > > 47565 echo PFLT 0x800723000 VM_PROT_EXECUTE
> > > 47565 echo RET KERN_SUCCESS
> > > 47565 echo CALL munmap(0x800887000,0x179000)
> > > 47565 echo RET munmap 0
> > > 47565 echo PFLT 0x800a00000 VM_PROT_WRITE
> > > 47565 echo RET KERN_SUCCESS
> > >
> > > The patch is available at www.freebsd.org/~jhb/patches/ktrace_fault.patch and
> > > included below.
> >
> > One immediate detail is that trap() truncates the fault address to the
> > page address, that arguably looses useful information.
>
> It is true that it would be nice to have the exact faulting address, though
> having page granularity has been sufficient for the few times I've actually
> used the address itself (e.g. I could figure out which page of libstdc++ a
> fault occurred on and narrow down from there as to which of the routines most
> likely was executed given what the app was doing at the time). In my case
> knowing how much time was spent handling a page fault has been useful.
>
> Would we have to push this logic out of vm_fault and into every trap() routine
> to get the original address? That would make the patch quite a bit bigger
> (touching N MD files vs 1 MI file).
Or do the reverse, making vm_fault() do trunc_page() [if doing this
change at all].
Also, I want to note another small detail, that is relevant if you plan
to MFC the change. In HEAD, vm_fault() is only called from trap()s, and
in-kernel page reads where substituted by vm_fault_hold() calls. This is
not true for stable.
Hm, was it indended to report faults from uiomove etc ?
I had this change (that relates to OOM handling) for long time, and
realized that it might be useful there.
diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..0d1e68f 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -697,7 +697,8 @@ trap_pfault(frame, usermode)
PROC_UNLOCK(p);
/* Fault in the user page: */
- rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL);
+ rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL |
+ (usermode ? VM_FAULT_USERMODE : 0));
PROC_LOCK(p);
--p->p_lock;
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..236f295 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -856,7 +856,8 @@ trap_pfault(frame, usermode, eva)
PROC_UNLOCK(p);
/* Fault in the user page: */
- rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL);
+ rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL |
+ (usermode ? VM_FAULT_USERMODE : 0));
PROC_LOCK(p);
--p->p_lock;
diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index d417a84..c1e87ae 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -408,6 +408,11 @@ RetryFault:;
*/
fs.m = NULL;
if (!vm_page_count_severe() || P_KILLED(curproc)) {
+ if (P_KILLED(curproc) && (fault_flags &
+ VM_FAULT_USERMODE) != 0) {
+ unlock_and_deallocate(&fs);
+ return (KERN_RESOURCE_SHORTAGE);
+ }
#if VM_NRESERVLEVEL > 0
if ((fs.object->flags & OBJ_COLORED) == 0) {
fs.object->flags |= OBJ_COLORED;
diff --git a/sys/vm/vm_map.h b/sys/vm/vm_map.h
index 5311e02..7444549 100644
--- a/sys/vm/vm_map.h
+++ b/sys/vm/vm_map.h
@@ -326,6 +326,7 @@ long vmspace_wired_count(struct vmspace *vmspace);
#define VM_FAULT_NORMAL 0 /* Nothing special */
#define VM_FAULT_CHANGE_WIRING 1 /* Change the wiring as appropriate */
#define VM_FAULT_DIRTY 2 /* Dirty the page; use w/VM_PROT_COPY */
+#define VM_FAULT_USERMODE 4 /* Fault initiated by usermode */
/*
* The following "find_space" options are supported by vm_map_find()
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20110502/fa8bf18f/attachment.pgp
More information about the freebsd-arch
mailing list