Urk, I take it back (was Re: Bug in p_estcpu handling on process
exit in FBsd-4.x)
Matthew Dillon
dillon at apollo.backplane.com
Sat Mar 20 16:31:40 PST 2004
All right, I figured out a solution. Basically the solution for the
4.x scheduler (and the 4BSD scheduler in 5.x for people still using
it) is to bump the child's estcpu in fork and recover any delta changes
back to the parent in exit. The DFly patch set is rather DFly specific,
so I will just explain it in case someone in FreeBSD land wants to fix
the problem in FreeBSD-4.
In sys/proc.h, in the proc structure:
u_int p_estcpu; /* Time averaged value of p_cpticks. */
ADDME u_int p_estcpu_fork;
In kern/kern_fork.c, in fork1(), search for 'p_estcpu'. You will
find the line:
REMOVEME p2->p_estcpu = p1->p_estcpu;
Replace it with:
ADDME p2->p_estcpu_fork = p2->p_estcpu =
ADDME     ESTCPULIM(p1->p_estcpu + ESTCPURAMP);
This will initialize a new fork()'d child with an estcpu that gives it
a slightly more 'batch' priority than its parent. If the fork()'d
child is an interactive process, the normal scheduling mechanisms will
float estcpu back down. This prevents new batch children from jerking
around interactive processes in the first few ticks of their operation,
and should have no significant effect on interactive children because
other interactive processes will not be eating all the cpu (so there is
cpu available), and any pre-existing batch processes will already
likely have far higher p_estcpu values.
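The fork-side change can be tried out in user space. What follows is only a minimal sketch: the struct, the function names, and the two constants are hypothetical stand-ins, since this message does not give the kernel's actual ESTCPULIM cap or ESTCPURAMP value.

```c
/*
 * User-space sketch of the fork-side rule. The two constants are
 * placeholders chosen for illustration, NOT the kernel's real values.
 */
#define FORK_SKETCH_ESTCPU_MAX  295u    /* assumed cap applied by ESTCPULIM */
#define FORK_SKETCH_ESTCPURAMP  4u      /* assumed ramp added at fork */

/* Only the two fields the patch touches; not the real struct proc. */
struct fork_sketch_proc {
    unsigned int p_estcpu;       /* time averaged cpu usage estimate */
    unsigned int p_estcpu_fork;  /* estcpu value assigned at fork time */
};

/* Stand-in for ESTCPULIM(): clamp an estimate to the assumed cap. */
static unsigned int
fork_sketch_estcpulim(unsigned int e)
{
    return (e < FORK_SKETCH_ESTCPU_MAX) ? e : FORK_SKETCH_ESTCPU_MAX;
}

/*
 * Start the child (p2) slightly more 'batch' than the parent (p1) and
 * remember the starting point so exit can later compute the delta.
 */
static void
fork_sketch_init_child(const struct fork_sketch_proc *p1,
    struct fork_sketch_proc *p2)
{
    p2->p_estcpu_fork = p2->p_estcpu =
        fork_sketch_estcpulim(p1->p_estcpu + FORK_SKETCH_ESTCPURAMP);
}
```

With these placeholder values, a parent sitting at estcpu 10 hands its child an estcpu of 14, and p_estcpu_fork records that same 14 as the baseline for the exit-side accounting.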
On the exit side, instead of trying to average the child's estcpu into
the parent or trying to slap it in (to deal with batch scripts such as
make, which do a lot of recursive fork/execs), just aggregate
the difference relative to the saved p_estcpu_fork into the parent,
though only if the child was found to be batch above and beyond the
p_estcpu[_fork] that was originally assigned to it. Otherwise the
parent's estcpu is allowed to stand on its own. The old FreeBSD code
would do terrible things to forking servers like sendmail. The
new code should work much better.
In kern/kern_exit.c, in wait1(), replace this:
REMOVEME /* charge childs scheduling cpu usage to parent */
REMOVEME if (curproc->p_pid != 1) {
REMOVEME     curproc->p_estcpu =
REMOVEME         ESTCPULIM(curproc->p_estcpu + p->p_estcpu);
REMOVEME }
With this (note that 'q' is the same as 'curproc', so there is no
reason to reference 'curproc' when we can just use 'q'):
ADDME /*
ADDME  * Charge the parent for the child's change in
ADDME  * estimated cpu as of when the child exits to
ADDME  * account for batch scripts, large make's, etc.
ADDME  */
ADDME if (q->p_pid != 1) {
ADDME     if (p->p_estcpu > p->p_estcpu_fork) {
ADDME         q->p_estcpu = ESTCPULIM(q->p_estcpu +
ADDME             p->p_estcpu - p->p_estcpu_fork);
ADDME     }
ADDME }
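To see how the old and new exit rules differ, the arithmetic can be run in user space. This is a rough sketch under stated assumptions: the struct, the function names, and the cap value are hypothetical, and the real ESTCPULIM limit is not given in this message.

```c
/* Placeholder cap for illustration; NOT the kernel's real ESTCPULIM limit. */
#define EXIT_SKETCH_ESTCPU_MAX  295u

/* Only the fields the exit-side code reads; not the real struct proc. */
struct exit_sketch_proc {
    int          p_pid;
    unsigned int p_estcpu;       /* current cpu usage estimate */
    unsigned int p_estcpu_fork;  /* estimate assigned at fork time */
};

/* Stand-in for ESTCPULIM(): clamp an estimate to the assumed cap. */
static unsigned int
exit_sketch_estcpulim(unsigned int e)
{
    return (e < EXIT_SKETCH_ESTCPU_MAX) ? e : EXIT_SKETCH_ESTCPU_MAX;
}

/* Old rule: add the child's whole estcpu to the parent (q). */
static void
exit_sketch_charge_old(struct exit_sketch_proc *q,
    const struct exit_sketch_proc *p)
{
    if (q->p_pid != 1)
        q->p_estcpu = exit_sketch_estcpulim(q->p_estcpu + p->p_estcpu);
}

/*
 * New rule: charge the parent (q) only for what the child (p)
 * accumulated above the value it was given at fork time.
 */
static void
exit_sketch_charge_new(struct exit_sketch_proc *q,
    const struct exit_sketch_proc *p)
{
    if (q->p_pid != 1) {
        if (p->p_estcpu > p->p_estcpu_fork) {
            q->p_estcpu = exit_sketch_estcpulim(q->p_estcpu +
                p->p_estcpu - p->p_estcpu_fork);
        }
    }
}
```

Take a parent at estcpu 20 whose child was forked at 24. If the child exits still at 24, the old rule pushes the parent to 44 while the new rule leaves it at 20. If the child ran as a batch job and exits at 60, the new rule charges only the delta of 36 and the parent ends up at 56.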
That should do it. It seems to do a very good job in DragonFly. If
anyone wants to do the work in FreeBSD I of course recommend that you
test it, YMMV.
-Matt
Matthew Dillon
<dillon at backplane.com>