|
Parallel processing has its own lexicon of terms and phrases, emphasizing those concepts that are considered to be most important to its goals and the ways in which those goals may be achieved. The following are some of the more commonly encountered ones. They are listed in an order for you to learn them, assuming you do not know any, so you can start with the first and then build up to the rest.
Task
A logically discrete section of computational work.
This is a somewhat loose definition, but adequate for this introduction. For now, think of a task as computational work you can describe simply, such as, "calculate the mean and standard deviation of 100,000 numbers," or "calculate a Fast Fourier transform."
Parallel Tasks
Tasks whose computations are independent of each other, so that all such tasks can be performed simultaneously with correct results.
Serial Execution
Execution of a program sequentially, one statement at a time.
Parallelizable Problem
A problem that can be divided into parallel tasks. This may require changes in the code and/or the underlying algorithm.
Example of Parallelizable Problem:
Calculate the potential energy for each of several thousand independent conformations of a molecule; when done, find the minimum energy conformation
The above is an example of a problem that is amenable to parallelization; each of the conformations is independently determinable, while the calculation of the minimum such conformation is itself a parallelizable problem. Example of a Non-parallelizable Problem:
A non-parallelizable problem, such as the calculation of the Fibonacci sequence above, would entail dependent calculations rather than independent ones -- notice how calculation of the k + 2 value uses those of both k + 1 and k, hence those three terms cannot be calculated independently, nor, therefore, in parallel.
Types of Parallelism
There are two basic ways to partition computational work among parallel tasks:
- Data parallelism: each task performs the same series of calculations, but applies them to different data. For example, four processors can search census data looking for people above a certain income; each processor does the exact same operations, but works on different parts of the database.
You can read more on data parallel computing here.
- Functional parallelism: each task performs different calculations, i.e., carries out different functions of the overall problem. This can be on the same data or different data. For example, 5 processors can model an ecosystem, with each processor simulating a different level of the food chain (plants, herbivores, carnivores, scavengers, and decomposers).
Observed Speedup
Observed speedup of a code which has been parallelized = wall-clock time of serial execution
---------------------------------------
wall-clock time of parallel execution
One of the most widely used indicators of parallelizability, the calculation of observed speedup is both intuitively satisfying, and potentially misleading; the former because a well-parallelized code can be shown to run in a fraction of the time that it takes the serial version, the latter because, in many respects, this is a comparison of apples and oranges: the codes are different, they perform different tasks, the algorithms may be entirely distinct. Still, there is no discounting the fact that a good job of parallelization will be evident in the amount of wallclock time it has saved the user; what is debatable is the converse: if it is not evident that a lot of time has been saved, is it because the problem itself is not parallelizable, or because the parallelization simply wasn't done well? This, by the way, is where parallel profiling tools, covered later in this presentation, can help tremendously.
Synchronization
The temporal coordination of parallel tasks. It involves waiting until two or more tasks reach a specified point (a sync point) before continuing any of the tasks.
- Synchronization is needed to coordinate information exchange among tasks; e.g., the previous example finding minimum energy conformation: all of the conformations had to be completed before the minimum could be found, so any task that was dependent upon finding that minimum would have had to wait until it was found before continuing.
- Synchronization can consume wall-clock time because processor(s) sit idle waiting for tasks on other processors to complete.
- Synchronization can be a major factor in decreasing parallel speedup, because, as the previous point illustrates, the time spent waiting could have been spent in useful calculation, were synchronization not necessary.
Parallel Overhead
The amount of time required to coordinate parallel tasks, as opposed to doing useful work.
Parallelization doesn't come free, and one of the most insidious costs is the time and cycles put into making sure that all of those separate tasks are doing what they're supposed to be doing. Things that are simply taken for granted in serial execution, or that don't apply, take on special significance when there are many tasks instead of just one; the three most commonly encountered coordination tasks are:
- Time to start a task
This involves, among other things:
- identifying the task
- locating a processor to run it
- loading the task onto the processor
- putting whatever data the task needs onto the processor
- actually starting the task
- Time to terminate a task
Termination isn't a simple chore, either: at the very least, results have to be combined or transferred, and operating system resources have to be freed before the processor can be used for other tasks.
- Synchronization time, as previously explained.
Granularity
A measure of the ratio of the amount of computation done in a parallel task to the amount of communication.

- Scale of granularity ranges from fine-grained (very little computation per communication-byte) to coarse-grained (extensive computation per communication-byte).
- The finer the granularity, the greater the limitation on speedup, due to the amount of synchronization needed. Remember from the very beginning of this module about considering how hard it would be to coordinate the activities of 52 people, all trying to help sort a single deck of cards?
Massively parallel system
A parallel system with many processors. "Many" is usually defined as 1000 or more processors.
Scalable parallel system
A parallel system to which the addition of more processors will yield a proportionate increase in parallel speedup. Whether or not this increase occurs typically depends on some combination of:
- Hardware
One of the most significant bottlenecks to scalability lies in the capability of the communications network -- adding more processors to a network that is already filled to capacity will not yield the expected increase in speedup, and could in fact do just the reverse.
- Parallel Algorithm
Some algorithms are better suited to parallelism (i.e., more scalable) than others, and there are even economies of scale within parallel ones, i.e., algorithms that work well at certain scales of parallelism work poorly at higher scales, and vice versa. Generally speaking, scalable implies O(n) growth ("on the order of n"; i.e., linear) with data-size n, and preferably the growth should be O(log n).
It should be emphasized here that great care should be taken to match the algorithm with the actual problem, and both of these with the actual size of the machine on which the problem will be attacked using that particular algorithm. A particular algorithm may appear to work quite nicely on the given problem when run on a small number of nodes, or on a part of the problem over the entire target machine, but when all three production versions are brought together, you may find that your initial tests were in fact misleading, and what worked well at a small scale works not at all at the desired scale.
- Your actual code
Even if you use an algorithm well suited to your purposes, the way you implement it can determine how much parallelism is actually expressed in your application. Easy ways to negate the usefulness of a good algorithm include using unsuitable data objects, or failing to take into consideration how your programming language structures things like multi-dimensional arrays in order to maximize the efficiency of addressing natural blocks of data.
|