SCHED_ULE should not be the default

Jeremy Chadwick freebsd at jdc.parodius.com
Thu Dec 15 17:49:00 UTC 2011


On Thu, Dec 15, 2011 at 05:26:27PM +0100, Attilio Rao wrote:
> 2011/12/13 Jeremy Chadwick <freebsd at jdc.parodius.com>:
> > On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
> >> > Not fully right, boinc defaults to run on idprio 31 so this isn't an
> >> > issue. And yes, there are cases where SCHED_ULE shows much better
> >> > performance than SCHED_4BSD. [...]
> >>
> >> Do we have any proof at hand for such cases where SCHED_ULE performs
> >> much better than SCHED_4BSD? Whenever the subject comes up, it is
> >> mentioned, that SCHED_ULE has better performance on boxes with a ncpu >
> >> 2. But in the end I see contradictory statements here. People
> >> complain about poor performance (especially in scientific environments),
> >> and others counter that this is not the case.
> >>
> >> Within our department, we developed a highly scalable code for planetary
> >> science purposes on imagery. It utilizes present GPUs via OpenCL if
> >> present. Otherwise it grabs as many cores as it can.
> >> By the end of this year I'll get a new desktop box based on Intel's new
> >> Sandy Bridge-E architecture with plenty of memory. If the colleague who
> >> developed the code is willing to perform some benchmarks on the same
> >> hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most
> >> recent Suse. For FreeBSD I also intend to compare performance with both
> >> of the available schedulers.
> >
> > This is in no way shape or form the same kind of benchmark as what
> > you're planning to do, but I thought I'd throw it out there for folks to
> > take in as they see fit.
> >
> > I know folks were focused mainly on buildworld.
> >
> > I personally would find it interesting if someone with a higher-end
> > system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the
> > same test (changing -jX to -j{numofcores} of course).
> >
> > --
> > | Jeremy Chadwick                                jdc at parodius.com |
> > | Parodius Networking                       http://www.parodius.com/ |
> > | UNIX Systems Administrator                   Mountain View, CA, US |
> > | Making life hard for others since 1977.               PGP 4BD6C0CB |
> >
> >
> > sched_ule
> > ===========
> > - time make -j2 buildworld
> >   1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w
> > - time make -j2 buildkernel
> >   640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w
> >
> >
> > sched_4bsd
> > ============
> > - time make -j2 buildworld
> >   1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w
> > - time make -j2 buildkernel
> >   638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w
> >
> >
> > software
> > ==========
> > * sched_ule test:  FreeBSD 8.2-STABLE, Thu Dec  1 04:37:29 PST 2011
> > * sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011
> 
> Hi Jeremy,
> thanks for the time you spent on this.
> 
> However, I wanted to ask/let you note 3 things:
> 1) Did you use 2 different code bases for the test? (one updated on
> December 1 and another one on December 12)

No; src-all (/usr/src on this system) was not updated between December
1st and December 12th PST.  I do believe I updated it today (15th PST).
I can/will obviously hold off so that we have a consistent code base for
comparing numbers between schedulers during buildworld and/or
buildkernel.

> 2) Please note that you should have repeated this test several times
> (basically until you get a standard deviation which is acceptable
> with ministat) and reported the ministat output

This is the first time I have heard of ministat(1).  I'm pretty sure I
see what it's for and how it applies to this situation, but boy that man
page could use some clarification (I have 3 people looking at this thing
right now trying to figure out what means what in the graph :-) ).
Anyway, graph or not, I see the point.
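
For anyone wanting to feed numbers like the ones quoted above into
ministat(1), here is a minimal sketch in Python.  It assumes the csh/tcsh
built-in "time" format shown in this thread (user, system, then wall
clock as [H:]M:S) and that ministat is given one value per line.  The
function names are made up for illustration.

```python
import re

# csh/tcsh built-in `time` output as seen in this thread, e.g.
#   1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w
TIME_RE = re.compile(r"(\d+\.\d+)u\s+(\d+\.\d+)s\s+((?:\d+:)?\d+:\d+(?:\.\d+)?)")


def clock_to_seconds(clock):
    """Convert [H:]M:S[.frac] wall-clock text to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line):
    """Return (user, system, elapsed) in seconds from one `time` line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError("not a csh-style time line: %r" % line)
    return (float(match.group(1)),
            float(match.group(2)),
            clock_to_seconds(match.group(3)))


def ministat_column(time_lines):
    """Elapsed seconds, one value per line, for use as a ministat input file."""
    return "".join("%.2f\n" % parse_time_line(line)[2] for line in time_lines)
```

Writing ministat_column() output to one file per scheduler (say ule.txt
and 4bsd.txt, names picked arbitrarily) and running
"ministat ule.txt 4bsd.txt" would then compare the two data sets.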

Regarding multiple tests: yup, you're absolutely right, the only way to
do it would be to run a sequence of tests repeatedly (probably 10 per
scheduler).  Reboots and rm -fr /usr/obj/* would be required after each
test too, to guarantee empty kernel caches (of all types) consistently
every time.

What I posted was supposed to give people just a "general idea" if there
was any gigantic difference between the two, and there really isn't.
But, as others have stated (and you below), buildworld may not be an
effective way to "benchmark" what we're trying to test.

Hence me wondering exactly what would make for a good test.  Example:

1. Run + background some program that "beats on things" (I really don't
know what; creation/deletion of threads?  CPU benchmark?  bonnie++?),
with output going to /dev/null.
2. Run + background "time make -j2 buildworld" with output going to /dev/null
3. Record/save output from "time".
4. rm -fr /usr/obj && shutdown -r now
5. Repeat all steps ~10 times
6. Adjust kernel configuration file to use other scheduler
7. Repeat steps 1-5.

What I'm trying to figure out is what #1 and #2 should be in the above
example.
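
Steps 1-3 of one iteration could be sketched roughly as below (Python,
purely illustrative).  The background load command is a placeholder
since that is exactly the open question, and timed_run/format_result
are hypothetical names.  Steps 4-7 (cleaning /usr/obj, rebooting,
switching the scheduler) are deliberately left out.

```python
import resource
import subprocess
import time


def timed_run(argv, load_argv=None):
    """Run argv to completion, optionally next to a background load.

    Returns (elapsed, user, system) in seconds for argv and its children.
    The load process is reaped only after the measurement, so its CPU
    time is not counted.
    """
    load = None
    if load_argv is not None:
        # Step 1: start the competing workload, output discarded.
        load = subprocess.Popen(load_argv, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    try:
        # Step 2: the build itself, output discarded.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        elapsed = time.monotonic() - start
        after = resource.getrusage(resource.RUSAGE_CHILDREN)
    finally:
        if load is not None:
            load.terminate()
            load.wait()
    return (elapsed,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


def format_result(scheduler, elapsed, user, system):
    """Step 3: one log line per run, easy to split into columns later."""
    return "%s %.2f %.2f %.2f\n" % (scheduler, elapsed, user, system)
```

A call would look something like
timed_run(["make", "-j2", "buildworld"], load_argv=[...]) run from
/usr/src, with the result appended to a log file that survives the
reboot.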

> 3) The difference is less than 2% which I suspect is really
> statistically insignificant/the same

Understood.
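
Which field of the "time" output a percentage is taken from matters, so
here is a small worked sketch in Python that computes the relative
difference per field for the two buildworld lines quoted above.  The
numbers are copied from those lines; the helper name is made up.

```python
def rel_diff_percent(a, b):
    """Difference of a relative to b, in percent."""
    return (a - b) / b * 100.0


# (user s, system s, elapsed s) from the quoted buildworld runs;
# elapsed is written out as minutes * 60 + seconds.
ule = (1689.831, 229.328, 18 * 60 + 46.20)
bsd = (1662.793, 206.908, 17 * 60 + 12.02)

for name, u, b in zip(("user", "system", "elapsed"), ule, bsd):
    print("%-8s ULE vs 4BSD: %+.1f%%" % (name, rel_diff_percent(u, b)))
```

With a single sample per scheduler none of these figures comes with a
standard deviation, which is the reason for repeating the runs.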

> I'm not really even surprised ULE is not faster than 4BSD in this case
> because usually buildworld/buildkernel tests are driven for the vast
> majority by I/O overhead rather than scheduler capacity. It would be
> more interesting to analyze how buildworld does while another type of
> workload is going on.

Yup, agreed/understood, hence me trying to find out what would classify
as a good stress test for all of this.

I have a testbed system in my garage which I could set up to literally
do all of this in a loop, meaning automate the entire above process and
just let it go, writing stderr from time to a file (which wouldn't skew
the results at all).
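
Since every run ends in a reboot, an automated loop has to remember
where it left off.  One way to do that, sketched in Python under my own
assumptions (a small JSON state file read at boot; the scheduler order,
the run count of 10 and all names are arbitrary):

```python
import json
import os

SCHEDULERS = ("ULE", "4BSD")   # order is an arbitrary choice
RUNS_PER_SCHEDULER = 10        # the "~10" from the procedure


def load_state(path):
    """Read the run counter, starting from zero if no file exists yet."""
    if not os.path.exists(path):
        return {"completed": 0}
    with open(path) as fh:
        return json.load(fh)


def save_state(path, state):
    """Write via a temporary file so a crash cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def next_action(completed):
    """Return (scheduler, kernel_switch_needed) for the next run, or None when done."""
    if completed >= len(SCHEDULERS) * RUNS_PER_SCHEDULER:
        return None
    index, run = divmod(completed, RUNS_PER_SCHEDULER)
    # A kernel rebuild with the other scheduler is due before the first
    # run of every scheduler except the one the box started with.
    return SCHEDULERS[index], (run == 0 and index > 0)
```

A boot-time script would call load_state(), act on next_action(), bump
"completed", call save_state() and reboot.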

Let me know what #1 and #2 above, re: "the workloads", should be and
I'll be happy to set it up.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


