PID Allocator Performance Results (was: Re: [UPDATE] new pid alloc...)
David Schultz
das at FreeBSD.ORG
Sun Feb 8 01:45:48 PST 2004
On Sun, Feb 08, 2004, Poul-Henning Kamp wrote:
> In message <20040208080630.GA14364 at VARK.homeunix.com>, David Schultz writes:
> >I spent some time today benchmarking the various proposed pid
> >allocators. The full results, along with pretty pictures and a
> >more complete analysis, are at:
> >
> > http://people.freebsd.org/~das/pbench/pbench.html
>
> You _do_ realize that the difference between "tjr" and "net" in the
> bottom plot is not statistically significant ?
>
> Stratification is visibly present from approx 1500 pids and up, and
> ends up being responsible for 1/3rd of the difference by the time
> you get to 5000 pids.
>
> (The tell-tale sign here is that the two data sets both fall on two
> mostly straight lines in a random looking pattern, with practically
> no measurements hitting the interval between the two lines.)
>
> If we assume the stratification has linearity with number of pids,
> which I think looks reasonable, and we read the right hand edge as
> half a second and the left hand edge as zero, we find:
>
> (.5 - 0) [second]
> -------------------------------------- = 10 [nsec] / [iteration*pid]
> 10000 [iterations] * (5000 - 0) [pids]
>
> 10nsec per operation is getting you into the territory of effective
> TSC-timecounter resolution, RAM access time, cache miss delays
> and all sorts of other hardware effects.
To avoid jitter and timestamping overhead, I read the time only at
the start and end of the entire sequence of 10000 operations.
I obtained the sample variance by running the entire test three
times, i.e.
for (pass = 0; pass < 3; pass++) {
	for (nprocs = 100; nprocs < 5000; nprocs++) {
		set up the test, fork the sleepers;
		take starting timestamp;
		for (iter = 0; iter < 10000; iter++)
			run test;
		take ending timestamp;
	}
}
Nevertheless, you're definitely right about the stratification.
I'm not sure how to explain it. My best theory is that there's
some confounding factor with a constant overhead, such as the
pageout daemon waking up. If you look at the original samples,
there are always two that are normal and one outlier. More
samples would probably correct for this; if I were running the
benchmark again, I would do more iterations of the outer loop in
the pseudocode above and fewer of the inner loop.
> So all in all, I would say that you have proven that "tjr" and "net"
> are better than "old", but not that there is any statistically
> significant performance difference between them.
Yes, I realize that. I took 10 more samples of 10000 forks each
with 5000 sleeping processes in the background and got the
following:
tjr:
1.130558492
1.125901197
1.144079485
1.118981882
1.131435699
1.123052511
1.133321135
1.121301171
1.133015788
1.124377539
net:
1.116848091
1.119333603
1.117941526
1.121989527 (got an outlier (2.547301682) here, so I reran this test)
1.118023912
1.110658198
1.126021045
1.106436712
1.116406694
1.100889638
These data show a difference at the 95% confidence level, namely,
that the NetBSD algorithm is about 1% faster on a system with 5000
processes (and only 0.1% faster if you're looking at the total
overhead of fork() rather than vfork().) I think that pretty much
rules out performance as the deciding factor between the two.