remote operation or admin

Wed Mar 19 21:36:52 UTC 2008

On Wed, 19 Mar 2008 15:10:12 -0400
Chuck Robey <chuckr at chuckr.org> wrote:

> Not completely yet (I tend to be stubborn, if I carry this too far, tell me
> in private mail and I will politely drop it).  Your use cases show me the
> differences in size, and *because* of the size, the differences in how
> you'd use them, and that part I did already know.  I'm perfectly well aware
> of the current differences in size, but what I'm after is what are the real
> differences, ignoring size, in what they actually accomplish, and how they
> go about doing it.  I'm thinking of the possibility of perhaps finding it
> it might be possible to find some way to extend the work domain of an smp
> system to stretch across machine lines, to jump across motherboards.  Maybe
> not to be global (huge latencies scare me away), but what about just going
> 3 feet, on a very high speed bus, like maybe a private pci bus?  Not what
> is, what could be?

What you're describing is a classic multi-cpu system. From the
software point of view, it's just another variant on the whole
multiprocessor thing. I believe most modern multi-cpu systems share
all the resources, and are all SMP. You could build one with memory
that wasn't equally accessible to all CPUs in the system, but in
general you don't want to do that unless the CPUs in question are
doing different things, such as a GPU, an I/O processor of some kind,
etc.

So the answer to your two central questions "what are the real
differences between what they actually accomplish and how they go
about doing it" - at least for tightly coupled, everything shared
multi-cpu boxes - is "there aren't any." There's been lots of work
done in these areas, dating back to the 60s. Multics, Tandem and V are
the ones that come to my mind, but there are others.

There are schools of concurrent software design that hold that this
should be true across the entire scale: that you design your system to
work the same way on a single box with a quad-core CPU as you would on
single-cpu boxes one each in Sao Paulo, Anchorage, Melbourne and
Trondheim, as the differences are "implementation details". In real
life, those differences are huge.

In particular, if you're staying inside the same room, the critical
one is going from "all memory shared" to "some memory shared", as the
former means you can have a single memory location representing an
external resource, and once you lock it you're set. With the latter,
you can't do that - you have to bounce messages back and forth between
the systems, and get them all to agree to let you have it before you work
on it. This can easily push timings out to be as bad as at the next
level of sharing. For instance, I build ETL systems that shove
thousands of multi-megabyte files/hour between boxes. It's actually
faster to transfer the data over the network and let each box use an
unshared file system for them than to rename the files on a shared
disk from one directory to another, because doing the file rename
requires locking both directories on every system that has access to
them - which in this case means dozens of them.
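
To make the "all memory shared" versus "some memory shared" point
concrete, here is a rough Python sketch. It is a toy model, not a real
distributed lock protocol: there is no failure handling, no fairness,
and no actual network. The names (SharedMemoryResource, Node,
MessagePassingResource, messages_sent) are all made up for
illustration. It only counts how many messages each approach would
need.

```python
import threading


class SharedMemoryResource:
    """All CPUs see the same memory, so one lock guards the resource."""

    def __init__(self):
        self._lock = threading.Lock()
        self.messages_sent = 0  # stays 0: locking needs no messages

    def acquire(self):
        self._lock.acquire()

    def release(self):
        self._lock.release()


class Node:
    """One box in the partially shared case, with its own view of the lock."""

    def __init__(self, name):
        self.name = name
        self.granted_to = None

    def handle_request(self, requester):
        # Grant only if this node has not promised the resource elsewhere.
        if self.granted_to in (None, requester):
            self.granted_to = requester
            return True
        return False

    def handle_release(self, requester):
        if self.granted_to == requester:
            self.granted_to = None


class MessagePassingResource:
    """Every node must agree before the requester may touch the resource."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.messages_sent = 0

    def acquire(self, requester):
        granted = []
        for node in self.nodes:
            self.messages_sent += 2  # one request plus one reply
            if node.handle_request(requester):
                granted.append(node)
            else:
                # Someone else holds it: undo the grants collected so far.
                for holder in granted:
                    self.messages_sent += 1
                    holder.handle_release(requester)
                return False
        return True

    def release(self, requester):
        for node in self.nodes:
            self.messages_sent += 1  # one release notice per node
            node.handle_release(requester)
```

In this model, with 24 boxes sharing the resource, one acquire costs
48 messages and one release another 24, where the shared-memory
version costs none. Each of those messages is a network round trip
in real life, which is where the time goes.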

    <mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.