Distributed memory systems come in many different shapes and sizes, and in enough color schemes to make paint dealers envious. They also differ in ways that interior decorators might not appreciate, but that designers and implementors of parallel applications certainly would. At the same time, they have similarities that none of their differences can affect, and they share potential that makes them extremely attractive computational alternatives, both economically and from the standpoint of software availability.
- Interconnection network topology
The exact manner in which the components of a distributed memory architecture are linked, i.e., the communications or interconnection network, can take many different forms. At one extreme it is essentially random (e.g., a PVM cluster composed of many different machines scattered across the Internet, communicating via TCP/IP sockets); at the other it is very tightly constrained (e.g., the KSR-1 ring-of-rings). Some networks are highly reconfigurable: the Edinburgh supercomputing platform, for example, has a software-controlled, switchable routing network that can reconfigure the interconnect among the nodes, so that one researcher can be running on a 2-D mesh, another on a hypercube, and a third on a 1-D pipeline, either all at the same time within different partitions of the entire resource or consecutively over the same set of compute nodes.
This variety, far from being a liability, allows researchers to investigate the relationship between network topology and algorithmic efficiency, and to adapt one or the other to make the best of whatever restrictions are imposed. For example, if the application is known to run on a 2-D mesh network, a set of collective-communication algorithms appropriate for that topology can be linked in, allowing the application to use the most efficient communication strategy available.
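To make the link between topology and communication cost concrete, here is a minimal sketch that compares the three layouts mentioned above (1-D pipeline, 2-D mesh, hypercube) by their diameter, i.e., the largest number of hops between any two nodes. The function names and the choice of 16 nodes are illustrative assumptions, not part of any particular system; the diameter is only a rough lower bound on the number of steps a broadcast needs, and real collective algorithms take more than this into account.

```python
from itertools import product

def pipeline_hops(a, b):
    """Hops between two nodes on a 1-D pipeline (no wraparound)."""
    return abs(a - b)

def mesh_hops(a, b):
    """Hops between two (row, col) nodes on a 2-D mesh (no wraparound)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def hypercube_hops(a, b):
    """Hops between two node ids on a hypercube: the number of differing bits."""
    return bin(a ^ b).count("1")

def diameter(nodes, hops):
    """Largest hop count between any pair of nodes."""
    return max(hops(a, b) for a in nodes for b in nodes)

def topology_diameters(side=4):
    """Compare diameters for side*side nodes.

    Assumption: side is a power of two, so that side*side nodes
    also form a complete hypercube.
    """
    n = side * side
    mesh_nodes = list(product(range(side), repeat=2))
    return {
        "pipeline": diameter(range(n), pipeline_hops),
        "mesh": diameter(mesh_nodes, mesh_hops),
        "hypercube": diameter(range(n), hypercube_hops),
    }

if __name__ == "__main__":
    for name, d in topology_diameters(4).items():
        print(f"{name:10s} diameter = {d} hops")
```

With 16 nodes the pipeline has a diameter of 15 hops, the 4 x 4 mesh 6, and the 4-dimensional hypercube 4, which is one reason an algorithm tuned for one layout may be a poor fit for another.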
- Non-uniform memory access times (NUMA)
Distributed memory always implies non-uniform memory access, regardless of the type of processor or the characteristics of the communications network: a processor reaches its own local memory faster than memory attached to another node. The most that application designers can do is keep data locality in mind and try to arrange for data to be as "close" as possible to the processor that is going to use it, ideally already in local storage by the time it is called for. This ideal certainly cannot be achieved in general, but by keeping it in mind designers can take advantage of foreknowledge whenever possible, and thereby achieve very satisfying gains in runtime efficiency.
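The effect of data locality can be illustrated with a toy cost model. The sketch below assumes a simple block distribution of an array across processors and two made-up access times, `t_local` and `t_remote`, in arbitrary units; both the names and the 1:100 ratio are hypothetical and chosen only to show the shape of the trade-off.

```python
def owner(index, n_items, n_procs):
    """Processor that owns a global index under a block distribution."""
    block = -(-n_items // n_procs)  # ceiling division: items per processor
    return index // block

def access_cost(proc, indices, n_items, n_procs, t_local=1.0, t_remote=100.0):
    """Total cost for `proc` to read `indices`.

    t_local and t_remote are hypothetical access times (arbitrary units).
    """
    return sum(
        t_local if owner(i, n_items, n_procs) == proc else t_remote
        for i in indices
    )

if __name__ == "__main__":
    n_items, n_procs = 16, 4
    # Processor 1 owns indices 4..7 under the block distribution.
    local_work = [4, 5, 6, 7]        # all data already local
    strided_work = [1, 5, 9, 13]     # only index 5 is local
    print(access_cost(1, local_work, n_items, n_procs))
    print(access_cost(1, strided_work, n_items, n_procs))
```

Under these assumed numbers the same four reads cost 4.0 units when the data is local and 301.0 units when three of them are remote, so arranging the work to match the data layout matters far more than the raw processor speed.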
- Opportunities for exploiting commodity processor technologies
"Off-the-shelf" processors are achieving very respectable performance. What's powering your PC or workstation right now rivals what was considered almost "world-class" mainframe capability no more than 5 years ago. Linking 500 of them together far exceeds the fastest single "supercomputer" of that time, and it wouldn't take too many more to better, in peak-performance numbers, the best in the world today.
Even more exciting is the ability this provides for playing mix-and-match: some current processors are dynamite database engines, others excel at number-crunching, and still others at graphical processing. The distributed memory architectural model allows one to construct, essentially from the ground up, the type of combined computational resource best suited to a particular application, assigning software modules to the processors most capable of performing the required operations.
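As a rough illustration of the mix-and-match idea, the sketch below assigns each software module to the processor with the highest rating for the kind of work that module does. The node names, module names, and ratings are entirely hypothetical, and the greedy rule ignores load balancing and communication cost, both of which a real placement would have to weigh.

```python
def assign_modules(modules, processors):
    """Map each module to the processor rated best for its kind of work.

    modules:    module name -> kind of work (e.g. "numeric")
    processors: processor name -> {kind of work: rating}
    Ratings are hypothetical scores; a missing rating counts as 0.
    """
    assignment = {}
    for module, kind in modules.items():
        assignment[module] = max(
            processors, key=lambda p: processors[p].get(kind, 0)
        )
    return assignment

if __name__ == "__main__":
    # Hypothetical heterogeneous cluster and application.
    processors = {
        "db_node":  {"database": 9, "numeric": 3, "graphics": 2},
        "fp_node":  {"database": 2, "numeric": 9, "graphics": 4},
        "gfx_node": {"database": 1, "numeric": 5, "graphics": 9},
    }
    modules = {
        "query_engine": "database",
        "solver": "numeric",
        "renderer": "graphics",
    }
    for module, proc in assign_modules(modules, processors).items():
        print(f"{module} -> {proc}")
```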
Being able to exploit commodity processors has an implied benefit in terms of the single most costly element of the computational environment: software. Systems built from commodity processors are much more likely to be able to run commodity software than systems built from proprietary components.