40% slowdown with dynamic /bin/sh

Matthew Dillon dillon at apollo.backplane.com
Wed Nov 26 11:50:41 PST 2003

:At 00:23 26/11/2003 -0500, Michael Edenfield wrote:
:>Static /bin/sh:
:>   real    385m29.977s
:>   user    111m58.508s
:>   sys     93m14.450s
:>Dynamic /bin/sh:
:>   real    455m44.852s
:>   user    113m17.807s
:>   sys     103m16.509s
:   Given that user+sys << real in both cases, it looks like you're running 
:out of memory; it's not surprising that dynamic linking has an increased 
:cost in such circumstances, since reading the diverse files into memory 
:will take longer than reading a single static binary.
:   I doubt many systems will experience this sort of performance delta.
:Colin Percival

    It definitely looks memory related but the system isn't necessarily
    'running out' of memory.  It could simply be that the less memory
    available for caching files is causing more disk I/O to occur.  It
    should be possible to quantify this by doing a full timing of the
    build ( /usr/bin/time -l ), which theoretically includes I/O ops.

    Dynamically linked code definitely dirties more anonymous memory than
    static code, and definitely accesses more shared file pages.  The
    difference is going to depend on the complexity of the program.  How
    much this affects system performance depends on the situation.  If the
    system has significant idle cycles available the impact should not be
    too serious, but if it doesn't then the overhead will drain the pool
    of pre-zeroed pages (even if the program is exec'd, does something
    real quick, and exits).

    I have included a program below that prints the delta free page count
    and the delta zero-fill count once a second.  This can be used to
    estimate anonymous memory use.  Run the program and let it stabilize.
    Be sure that the system is idle. Then run the target program (it needs
    to stick around, it can't just exec and exit), then exit the target
    program and repeat.  Leave several seconds in between invocation, exit,
    and repeat to allow the system to stabilize.  Note that it may take 
    several runs to get reliable information since the program is measuring
    anonymous memory use for the whole system.  Also note that shared pages
    will not be measured by this program, only the number of dirtied
    anonymous pages.  If on an idle system the program is not reporting
    '0 0' then your system isn't idle :-).

    The main indicator is the 'freepg' negative jump when the target program
    is invoked.  The zfod count will be a subset of that, indicating the
    number of zero-fill pages requested (versus program text/data COW pages
    which do not need zeroed pages but still eat anonymous memory for the
    duration of the target program).

    When I tested it with a static and dynamic /bin/sh on 4.8 I got
    (looking at 'freepg') 20 pages for the static binary and 50 pages for
    the dynamic binary.  So a dynamic /bin/sh eats 30 * 4K = 120K more
    anonymous memory than a static /bin/sh.  In the same test I got
    12 ZFOD faults for the static binary and 34 ZFOD faults for the
    dynamic binary, which means that 22 additional pre-zeroed pages are
    being allocated in the dynamic case (88KB).

    If /bin/sh is exec'd a lot in a situation where the system is otherwise
    not idle, this will impact the number of pre-zeroed pages available on
    the system.  Each exec of a dynamic /bin/sh eats 22 additional pages
    (88K) worth of zero-fill.  Each resident copy of (exec'd) /bin/sh eats
    120KB more dirty anonymous memory.  make buildworld -j 1 may have as 
    many as a dozen /bin/sh's exec'd at any given moment (impact 120K each)
    depending on where in the build it is.  -j 2 and so forth will have
    even more.  This will impact your system relative to the amount of total
    system memory you have.  The more system memory you have, the less the
    percentage impact.

		/bin/sh			/bin/csh
		--------------		-----------------------
    static	freepg -19 zfod 12	freepg -140 zfod 129
    dynamic	freepg -50 zfod 34	freepg -167 zfod 149

		/usr/bin/make  (note that make is static by default)
    static	freepg -33 zfod 27
    dynamic	freepg -51 zfod 44

    As you can see, the issue becomes less significant on a percentage
    basis with larger programs that already allocate more incidental memory.
    Also to my surprise I found that 'make' was already static.  It would
    seem that this issue was recognized long ago.  bzip2, chflags, make,
    and objformat are compiled statically even though they reside in /usr/bin.


/*
 * print delta free pages and zfod requests once a second.  Leave running
 * while testing other programs.  Note: ozfod is not displayed.  ozfod is
 * a subset of zfod, just as zfod deltas are a subset of v_free_count
 * allocations.
 */

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int ac, char **av)
{
    int fc1;
    int zfod1;
    int fc2;
    int zfod2;
    size_t fclen;

    fclen = sizeof(fc1);
    sysctlbyname("vm.stats.vm.v_free_count", &fc1, &fclen, NULL, 0);
    fclen = sizeof(zfod1);
    sysctlbyname("vm.stats.vm.v_zfod", &zfod1, &fclen, NULL, 0);

    for (;;) {
	sleep(1);
	fclen = sizeof(fc2);
	sysctlbyname("vm.stats.vm.v_free_count", &fc2, &fclen, NULL, 0);
	fclen = sizeof(zfod2);
	sysctlbyname("vm.stats.vm.v_zfod", &zfod2, &fclen, NULL, 0);
	printf("freepg %-4d zfod %-4d\n",
	    fc2 - fc1,
	    zfod2 - zfod1);
	fc1 = fc2;
	zfod1 = zfod2;
    }
    return (0);
}