[PATCH] Rework how we store process times in the kernel and defer calcru()
John Baldwin
jhb at FreeBSD.org
Fri Oct 1 08:02:04 PDT 2004
I'll commit this soonish unless there are any objections. The basic idea is
to store process time resource usage as raw data (i.e., as bintimes and tick
counts) for both process usage and child usage and only calculate the timeval
style times if they are explicitly asked for. This lets us avoid always
calling calcru() to calculate the timeval values in exit1(), for example. A
more detailed listing of the changes follows:
- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the corresponding inline
fields (i.e., p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. This also includes an additional fix so that calcru()
now correctly handles threads from the process that are executing on other
CPUs. Also, calcru() now only locks sched_lock internally while doing
the rux_runtime fixup. It now only requires the caller to hold the proc
lock and calcru1() only requires the proc lock internally. calcru() also no
longer allows callers to ask for an interrupt timeval since none of them
actually did.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
As a side effect of storing the raw values, the accuracy of the process timing
has been improved. This makes benchmarking somewhat tricky, as it appears
that with this patch user times go way up while system times go way down.
Thus, the only benchmarks I did were to compare real times and to compare
the sum of the user and system times to the real times. Here are the results
on a kernel w/o debugging (when WITNESS + INVARIANTS were on, the extra
overhead resulted in no statistical difference before and after). For real
times (100 runs of 10000 fork/wait loops):
x smpng.fast.real
+ proc.fast.real
+--------------------------------------------------------------------------+
| + |
| + |
| + + |
| + + |
| + + |
| + + |
| + + |
| + + x x |
| + + x x |
| + + x x |
| + + x x |
| + + x x x |
| + + x x x |
| + + x x x |
| + + x x x |
| + + x x x x |
| + + + x x x x |
| + + + x x x x |
| + + + x x x x |
| + + + x x x x |
| + + + x x x x x |
| + + + x x x x x |
| + + + + x x x x x |
| + + + + x x x x x |
| + + + + x x x x x x |
| + + + + + * x x x x x |
| + + + + + + * x x x x x |
| + + + + + + * x x x x x |
| + + + + + + + * * x x x x x |
|+ + + + + + + + * * * x x x x x x|
| |___M__A_____| |____M_A______| |
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 100 2.97 3.08 2.99 2.9959 0.018968075
+ 100 2.88 2.99 2.93 2.9362 0.017568337
Difference at 95.0% confidence
-0.0597 +/- 0.0050674
-1.99272% +/- 0.169145%
(Student's t, pooled s = 0.0182816)
So, close to a 2% improvement. As far as accuracy "improvements" go, the
numbers comparing the sum of user + sys time to the "real" time are:
x smpng.fast.real
+ smpng.fast.total
N Min Max Median Avg Stddev
x 100 2.97 3.08 2.99 2.9959 0.018968075
+ 100 2.83 2.93 2.86 2.8601 0.016111668
Difference at 95.0% confidence
-0.1358 +/- 0.0048779
-4.53286% +/- 0.162819%
(Student's t, pooled s = 0.0175979)
And for the kernel with the patch:
x proc.fast.real
+ proc.fast.total
N Min Max Median Avg Stddev
x 100 2.88 2.99 2.93 2.9362 0.017568337
+ 100 2.85 2.96 2.92 2.9201 0.017551943
Difference at 95.0% confidence
-0.0161 +/- 0.00486742
-0.548328% +/- 0.165773%
(Student's t, pooled s = 0.0175601)
Thus, the total counts are closer to the real times with the patch than
without the patch. Given that these results were repeated numerous times
with different benchmarks on an idle box in the same state, I feel that these
differences indicate an improvement in the accuracy of the accounting.
The patch is at http://www.FreeBSD.org/~jhb/patches/rusage_ext.patch and is
largely based on a patch originally submitted by bde at .
--
John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
More information about the freebsd-arch mailing list