HPC: Using Message Passing to distribute threads

Aluminium Oxide orac000 at internet-mail.org
Mon Feb 13 21:28:54 PST 2006


Forgive me if I am suggesting that we reinvent the wheel, but I have a
problem with a potentially simple solution.

It concerns the difficulty of adapting an application to use a parallel
computing system, such as with MPI or PVM.

Would it be possible to write a simple (heh heh) compiler
directive, or header, or a wrapper function which allows one to add a
tag to, or wrap, a call to a function which will be called
iteratively, spawning not just a new thread, but a new thread ***which
can be passed to another node*** in a parallel computer system?

This seems like a very simple and elegant method by which
non-parallelised code can be adapted to a parallel architecture.

My C and my understanding of threading are very limited, and I've never
written any kernel code. However, I will try to give an example:

The adaptation process would simply become:

o   #include <pvmwrap.h> /* Add support for a parallel computing
    thread call */
o   locate higher-level functions which are computationally intensive
    and will be called iteratively;
o   replace the raw function call with a pvmwrapped call.



Eg.,
/*A module to calculate n! for the first 1000 numbers*/

 int i, number;
 long double number_factorial;
 long double factorial (int number) {.....}
.....
 scanf("%d", &number);
 for (i = 0; i <= number; i++) {
   number_factorial = factorial(i);
   printf("%d factorial = %Lf\n", i, number_factorial);
 }
....


would become 
 
#include <pvmwrap.h>

 int i, number;
 long double number_factorial;
 long double factorial (int number) {.....}
.....
 scanf("%d", &number);
 for (i = 0; i <= number; i++) {
   number_factorial = pvmwrap(factorial(i));
   printf("%d factorial = %Lf\n", i, number_factorial);
 }
....

pvmwrap would make the necessary calls via the message-passing protocol
to create the thread on the next available node, rather than on the
local system, and return the result to the caller. pvmwrap would need to
perform type identification of the variables, or of the targets of
pointers, and then declare these on the executing node first, so that
execution can proceed without having to code these declarations
specifically as parallelised (which would greatly complicate the
adaptation to parallelism). The few cycles used to perform these type
identifications on each iteration are negligible compared with those of
the wrapped function itself.

What say ye?

Damien Miller
===================================
                 Sub POSIX lumen
           orac000 at internet-mail.org
                 +61 422 921 498
   au.geocities.com/orac000000/bsd.html
===================================



