40% slowdown with dynamic /bin/sh
Matthew Dillon
dillon at apollo.backplane.com
Wed Nov 26 11:50:41 PST 2003
:At 00:23 26/11/2003 -0500, Michael Edenfield wrote:
:>Static /bin/sh:
:> real 385m29.977s
:> user 111m58.508s
:> sys 93m14.450s
:>
:>Dynamic /bin/sh:
:> real 455m44.852s
:> user 113m17.807s
:> sys 103m16.509s
:
: Given that user+sys << real in both cases, it looks like you're running
:out of memory; it's not surprising that dynamic linking has an increased
:cost in such circumstances, since reading the diverse files into memory
:will take longer than reading a single static binary.
: I doubt many systems will experience this sort of performance delta.
:
:Colin Percival

It definitely looks memory related, but the system isn't necessarily
'running out' of memory. It could simply be that having less memory
available for caching files is causing more disk I/O to occur. It
should be possible to quantify this by doing a full timing of the
build ( /usr/bin/time -l ), which theoretically includes I/O ops.
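
As a rough illustration of where those I/O figures come from, the sketch
below reads the same block I/O counters that time -l reports, using
getrusage(2). The helper name measure_block_io is hypothetical and not
part of the original post, and running the command through system(3) is
an assumption made for brevity.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <stdlib.h>

/*
 * Run a shell command and report the block input/output operations
 * charged to it and its waited-for descendants.  Returns 0 on success,
 * -1 on failure.  Hypothetical helper, for illustration only.
 */
int
measure_block_io(const char *cmd, long *inblock, long *oublock)
{
    struct rusage before, after;

    if (getrusage(RUSAGE_CHILDREN, &before) != 0)
        return (-1);
    if (system(cmd) == -1)          /* system() waits for the child */
        return (-1);
    if (getrusage(RUSAGE_CHILDREN, &after) != 0)
        return (-1);

    /* Counters are cumulative, so the difference belongs to cmd. */
    *inblock = after.ru_inblock - before.ru_inblock;
    *oublock = after.ru_oublock - before.ru_oublock;
    return (0);
}
```

Comparing the two counters for a static and a dynamic build run would
show whether the extra wall-clock time lines up with extra disk reads.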

Dynamically linked code definitely dirties more anonymous memory than
static, and definitely accesses more shared file pages. The difference
is going to depend on the complexity of the program. How much this
affects system performance depends on the situation. If the system has
significant idle cycles available the impact should not be too serious,
but if it doesn't then the overhead will drain the pool of pre-zero'd
pages (even if the program is exec'd, does something real quick, and exits).

I have included a program below that prints the delta free page count
and the delta zero-fill count once a second. This can be used to
estimate anonymous memory use. Run the program and let it stabilize.
Be sure that the system is idle. Then run the target program (it needs
to stick around; it can't just exec and exit), then exit the target
program and repeat. Leave several seconds between invocation, exit,
and repeat to allow the system to stabilize. Note that it may take
several runs to get reliable information since the program is measuring
anonymous memory use for the whole system. Also note that shared pages
will not be measured by this program, only the number of dirtied
anonymous pages. If on an idle system the program is not reporting
'0 0' then your system isn't idle :-).

The main indicator is the 'freepg' negative jump when the target program
is invoked. The zfod count will be a subset of that, indicating the
number of zero-fill pages requested (versus program text/data COW pages,
which do not need zero'd pages but still eat anonymous memory for the
duration of the target program).
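
The relationship between the two numbers can be sketched as a simple
subtraction. The helper name non_zfod_pages is hypothetical, and treating
the whole remainder as COW pages is an approximation: on a real system
other allocations may be mixed into the freepg drop.

```c
/*
 * Split a 'freepg' drop into zero-fill pages and the remainder.
 * freepg_delta is the (negative) jump printed when the target starts;
 * zfod_delta is the zero-fill count seen over the same interval.
 * The result is roughly the number of non-zero-fill anonymous pages,
 * e.g. text/data COW pages.  Hypothetical helper, illustration only.
 */
int
non_zfod_pages(int freepg_delta, int zfod_delta)
{
    return (-freepg_delta - zfod_delta);
}
```

For a sample reading of 'freepg -50 zfod 34', this leaves 16 pages that
were taken from the free list without going through zero-fill.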

When I tested it with a static and dynamic /bin/sh on 4.8 I got
(looking at 'freepg') about 20 pages for the static binary and 50 pages
for the dynamic binary. So a dynamic /bin/sh eats 30 * 4KB = 120KB more
anonymous memory than a static /bin/sh. In the same test I got
12 ZFOD faults for the static binary and 34 ZFOD faults for the
dynamic binary, which means that 22 additional pre-zero'd pages are
being allocated in the dynamic case (88KB).

If /bin/sh is exec'd a lot in a situation where the system is otherwise
not idle, this will impact the number of pre-zero'd pages available on
the system. Each exec of a dynamic /bin/sh eats 22 additional pages
(88KB) worth of zero-fill. Each resident copy of (exec'd) /bin/sh eats
120KB more dirty anonymous memory. make buildworld -j 1 may have as
many as a dozen /bin/sh's exec'd at any given moment (impact 120KB each)
depending on where in the build it is. -j 2 and so forth will have
even more. This will impact your system relative to the amount of total
system memory you have. The more system memory you have, the less the
percentage impact.
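
To put the "percentage impact" in concrete terms, here is a small sketch
of the arithmetic. The function names and the memory sizes used in the
note after the code are hypothetical; the 4KB page size, the 30-page
delta, and the dozen concurrent shells come from the figures above.

```c
#define PAGE_KB 4   /* page size assumed by the "30 * 4KB" arithmetic */

/* Extra dirty anonymous memory, in KB, for nshells resident dynamic shells. */
unsigned long
extra_resident_kb(unsigned long nshells, unsigned long extra_pages)
{
    return (nshells * extra_pages * PAGE_KB);
}

/* The same overhead as a percentage of total physical memory (in MB). */
double
extra_resident_pct(unsigned long nshells, unsigned long extra_pages,
    unsigned long total_mb)
{
    return (100.0 * extra_resident_kb(nshells, extra_pages) /
        (total_mb * 1024.0));
}
```

With a dozen shells at 30 extra pages each this comes to 1440KB, which
is about 1.1% of a hypothetical 128MB machine and about 0.14% of a
hypothetical 1GB machine.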

            /bin/sh                    /bin/csh
            -----------------------    -----------------------
static      freepg -19   zfod 12       freepg -140  zfod 129
dynamic     freepg -50   zfod 34       freepg -167  zfod 149

            /usr/bin/make (note that make is static by default)
            -----------------------
static      freepg -33   zfod 27
dynamic     freepg -51   zfod 44

As you can see, the issue becomes less significant on a percentage
basis with larger programs that already allocate more incidental memory.

Also to my surprise I found that 'make' was already static. It would
seem that this issue was recognized long ago. bzip2, chflags, make,
and objformat are compiled statically even though they reside in /usr/bin.

-Matt

/*
 * Print delta free pages and zfod requests once a second.  Leave running
 * while testing other programs.  Note: ozfod is not displayed.  ozfod is
 * a subset of zfod, just as zfod deltas are a subset of v_free_count
 * allocations.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int ac, char **av)
{
    int fc1;        /* previous free page count */
    int zfod1;      /* previous zero-fill-on-demand count */
    int fc2;        /* current free page count */
    int zfod2;      /* current zero-fill-on-demand count */
    size_t fclen;

    /* Take the initial sample that the first delta is measured against. */
    fclen = sizeof(fc1);
    sysctlbyname("vm.stats.vm.v_free_count", &fc1, &fclen, NULL, 0);
    fclen = sizeof(zfod1);
    sysctlbyname("vm.stats.vm.v_zfod", &zfod1, &fclen, NULL, 0);

    for (;;) {
        fclen = sizeof(fc2);
        sysctlbyname("vm.stats.vm.v_free_count", &fc2, &fclen, NULL, 0);
        fclen = sizeof(zfod2);
        sysctlbyname("vm.stats.vm.v_zfod", &zfod2, &fclen, NULL, 0);

        /* Print the change since the previous one-second sample. */
        printf("freepg %-4d zfod %-4d\n",
            fc2 - fc1,
            zfod2 - zfod1);
        sleep(1);
        fc1 = fc2;
        zfod1 = zfod2;
    }
    return(0);
}