Improved multiprocessor usage on amd64
Stephen Montgomery-Smith
stephen at math.missouri.edu
Tue Sep 16 03:48:56 UTC 2008
Stephen Montgomery-Smith wrote:
> Steve Kargl wrote:
>> On Mon, Sep 15, 2008 at 07:36:04PM -0500, Stephen Montgomery-Smith wrote:
>>> ... and each thread is a loop of the form
>>>
>>> while (1) {
>>> wait until told to start;
>>> do massive amounts of floating point arithmetic (only additions and
>>> multiplications) on large arrays;
>>> tell the master process that you are done;
>>> }
>>>
>>>> Do you have about as many threads as processors, or more?
>>> Both ways. The time difference between the two approaches is
>>> negligible.
>>>
>>
>> Are you using ULE? With my MPI applications, if the number of
>> launched processes exceeds the number of cpus by 1, ULE falls
>> through the floor. I have a nagging feeling that there is a problem
>> with cpu affinity.
>>
>> http://lists.freebsd.org/pipermail/freebsd-current/2008-July/086917.html
>>
Let me say a little bit more.
I have this gut feeling that the problem has a lot to do with cache
management. My program has each thread doing, in effect, huge matrix
multiplications, each one working on its own little bit. If a CPU
core switches from one thread to another, it then has to write the old
thread's data out of the cache to RAM, and read a whole bunch of other
RAM into the cache.
I have this sense that both Linux and FreeBSD have something in their
internals that figures this out, and after a while adjusts how long a
process runs before the scheduler switches to another one. But Linux
seems to adapt faster than FreeBSD.
But this is all pure speculation on my part, because I have very little
idea of how these internals work.
Stephen