fork speed vs /bin/sh
Matthew Dillon
dillon at apollo.backplane.com
Thu Nov 27 11:48:51 PST 2003
:What this shows is that vfork() is 3 times faster than fork() on static
:binaries, and 9 times faster on dynamic binaries. If people are
:worried about a 40% slowdown, then perhaps they'd like to investigate
:a speedup that works no matter whether it's static or dynamic? There is
:a reason that popen(3) uses vfork(). /bin/sh should too, regardless of
:whether it's dynamic or static. csh/tcsh already uses vfork() for the
:same reason.
:
:NetBSD have already taken advantage of this speedup and their /bin/sh uses
:vfork(). Some enterprising individual who cares about /bin/sh speed should
:check that out. Start looking near #ifdef DO_SHAREDVFORK.
That isn't really a fair comparison because your vfork is hitting a
degenerate case and isn't actually doing anything significant. You
really need to exec() something. I've included a program below
that [v]fork/exec's "./sh -c exit 0" 5000 times.
Dell2550, 2xCPU (MP build), DFly
0.000u 4.095s 0:02.53 161.6% 154+107k 0+0io 0pf+0w VFORK/EXEC STATIC SH
0.000u 6.681s 0:04.04 165.3% 94+97k 0+0io 0pf+0w FORK/EXEC STATIC SH
0.500u 16.844s 0:16.34 106.1% 53+84k 0+0io 0pf+0w VFORK/EXEC DYNAMIC SH
0.093u 18.303s 0:23.86 77.0% 42+79k 0+0io 0pf+0w FORK/EXEC DYNAMIC SH
Athlon64, 2xCPU (UP), DFly
0.078u 0.687s 0:00.74 101.3% 399+226k 0+0io 0pf+0w VFORK/EXEC STATIC SH
0.117u 0.968s 0:01.07 100.0% 273+208k 0+0io 0pf+0w FORK/EXEC STATIC SH
2.218u 2.484s 0:04.71 99.5% 121+180k 0+0io 1pf+0w VFORK/EXEC DYNAMIC SH
2.281u 2.773s 0:04.98 101.4% 113+179k 0+0io 0pf+0w FORK/EXEC DYNAMIC SH
1.304u 2.289s 0:03.60 99.4% 121+180k 0+0io 0pf+0w VFORK/EXEC DYNAMIC SH WITH PREBINDING
1.296u 2.648s 0:03.90 100.7% 112+180k 0+0io 1pf+0w FORK/EXEC DYNAMIC SH WITH PREBINDING
These results were rather unexpected, actually. I'm not sure why the
numbers on the Dell box are so bad with a dynamic 'sh' but I suspect that
the dynamic linking is blowing out the L1 cache.
In any case, taking the Athlon64 system, the difference between static and
dynamic is around 4 seconds while the difference between vfork and fork
is only around 0.25 seconds, so while moving to vfork() helps, it doesn't
help all that much.
Unless you happen to be hitting a boundary condition on the L1 cache,
that is. If that is the case on the Dell box, as I suspect (it has only
a 16K L1 cache, whereas the Athlon64 has a 64K L1 cache), then the
difference is around 14 seconds between vfork static and vfork dynamic,
versus an additional 8 seconds going from vfork to fork. Vfork would
probably be a significant improvement on the Dell box.
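The comparison in the two paragraphs above is plain subtraction on the elapsed-time column of the results. A small sketch that reproduces it from the table; the struct and function names are made up for illustration, and the numbers are the "0:SS.ss" elapsed values copied from the runs above:

```c
/*
 * Elapsed (wall-clock) seconds for 5000 iterations, copied from the
 * elapsed column of the results above.  Names are illustrative only.
 */
struct timings {
	double vfork_static, fork_static;
	double vfork_dynamic, fork_dynamic;
};

static const struct timings athlon64 = { 0.74, 1.07, 4.71, 4.98 };
static const struct timings dell2550 = { 2.53, 4.04, 16.34, 23.86 };

/* Cost of the dynamic sh over the static one, both run via vfork(). */
double
link_cost(const struct timings *t)
{
	return (t->vfork_dynamic - t->vfork_static);
}

/* Extra cost of fork() over vfork(), both running the dynamic sh. */
double
fork_cost(const struct timings *t)
{
	return (t->fork_dynamic - t->vfork_dynamic);
}
```

For the Athlon64 this gives about 3.97 s of dynamic-linking cost against about 0.27 s for fork over vfork; for the Dell2550 it gives about 13.81 s against about 7.52 s, which is where the "around 14 seconds" and "additional 8 seconds" figures come from.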
Prebinding cuts the elapsed time of the dynamic 'sh' runs by around 20%
on the Athlon64, but on the Dell2550 prebinding actually made things
go slower (not shown above), from 23.8 seconds to 26 seconds. I
think there is an edge case due to prebinding having a greater L1 cache
impact. For larger, more complex programs prebinding shows definite,
if small, improvements.
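The prebinding figures are relative changes in elapsed time. A one-function sketch for checking them against the table; pct_change() is a hypothetical name used only here:

```c
/*
 * Percentage change in elapsed time from `before` to `after`.
 * Negative means the second run was faster.
 */
double
pct_change(double before, double after)
{
	return (100.0 * (after - before) / before);
}
```

pct_change(4.71, 3.60) is about -23.6% and pct_change(4.98, 3.90) about -21.7% for the Athlon64 vfork and fork runs, consistent with "around 20%"; the Dell2550 regression from 23.8 s to 26 s, pct_change(23.8, 26.0), is about +9.2%.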
-Matt
/*
 * CD into the directory containing the ./sh executable before running
 */
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int i;
	pid_t pid;

	for (i = 0; i < 5000; ++i) {
		if ((pid = vfork()) == 0) {	/* <<<<< CHANGE THIS FORK/VFORK */
			/* child: run "./sh -c exit 0", then leave */
			execl("./sh", "./sh", "-c", "exit", "0", (char *)NULL);
			write(2, "problem\n", 8);	/* only reached if exec fails */
			_exit(1);
		}
		if (pid > 0) {
			waitpid(pid, NULL, 0);
		} else {
			perror("vfork");
			return (1);
		}
	}
	return (0);
}
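The loop above throws away the child's exit status, so a failed exec only shows up as "problem" on stderr. A minimal sketch of the same [v]fork/exec/wait step as a reusable helper that returns the status instead, assuming a system where vfork() is still declared in <unistd.h> (the BSDs and glibc both provide it); spawn_sh() and its parameters are hypothetical names, not anything from /bin/sh or libc:

```c
#define _DEFAULT_SOURCE		/* glibc: declare vfork() even in strict ISO C modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "<sh> -c <cmd>" in a child created with
 * vfork() (use_vfork != 0) or fork(), wait for it, and return its exit
 * status.  Returns -1 if the child could not be created or did not
 * exit normally; 127 means the exec itself failed.
 */
int
spawn_sh(int use_vfork, const char *sh, const char *cmd)
{
	pid_t pid;
	int status;

	if (use_vfork)
		pid = vfork();
	else
		pid = fork();
	if (pid == 0) {
		/*
		 * Child.  After vfork() it borrows the parent's address
		 * space, so it must do nothing but exec or _exit.
		 */
		execl(sh, sh, "-c", cmd, (char *)NULL);
		_exit(127);
	}
	if (pid < 0)
		return (-1);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}
```

For example, spawn_sh(1, "/bin/sh", "exit 3") returns 3. After vfork() the parent stays suspended and the child runs in the parent's address space until execl() or _exit(), which is why the child does nothing else in between.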