|
Parallel programming is at least an order of magnitude more complex than its serial sibling (conceptually, if not always in practice; you should start out expecting it to be and budget your time and energy appropriately). The complexity is due not simply to the multiple processes that have to be considered but, more importantly, to the joint and interlocking control that has to be woven among them. Not only do you have the same kinds of problems plaguing you as in the serial situation (i.e., simple coding bugs and design semantics), you also have to contend with a whole new dimension of problems based solely on the interdependence that characterizes distributed processing.
- Deadlock
Each of the processes in the figure holds one of the required objects, identified by the solid border, and needs the other, identified by the dashed border. Deadlock is said to occur when the conditions for releasing the held object include acquisition of the desired one: if both processes must get the object they don't have before releasing the one that they do, then both will be left with what they started with, and will never obtain that which they need.
- Use of blocking communication calls and/or buffering can lead to an impasse where all processors are simultaneously waiting for completion of infeasible communication requests.
Exactly the same kind of thing can happen in distributed processing, most commonly when all processes engage in the same blocking form of communication:
- Synchronous sends called on all processors without preceding asynchronous receives
- Blocking (standard or synchronous) receives precede sends on all processors
In both of these cases, every process has posted the same end of a communication link. What makes this a case of deadlock is that all of them are in blocking mode, so none of them has the opportunity to turn around and perform the opposite side of the link, which would free up the other end to do the same.
When dealing with sets of resources that must be shared, such as the sending and receiving roles in communications, make sure to work out in advance how they will be allocated in such a way as to avoid the possibility of deadlock. For example, when assigning initial roles in a communications process, consider using some characteristic of the node number (a usually small integer used to provide a unique, neutral identification for every participant in a distributed environment), say even/odd-ness: all even nodes do blocking sends first to their odd counterparts while the odd nodes do the matching blocking receives, and then they switch roles. Other schemes can be constructed to fit the context of your own particular application and computing environment.
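The even/odd scheme can be sketched without any message-passing library at all. The following is a minimal illustration, not an MPI program: Python threads stand in for nodes, and `SyncChannel`, `exchange`, and `run_pairwise_exchange` are hypothetical names invented for this sketch. The pairing of node 2k with node 2k+1 is also an assumption. The send blocks until the partner has received, which is the behavior that causes deadlock if every node sends first.

```python
import queue
import threading


class SyncChannel:
    """One-directional channel whose send blocks until the message is received.

    Hypothetical helper that mimics a synchronous (rendezvous) send.
    """

    def __init__(self):
        self._q = queue.Queue()

    def send(self, item):
        self._q.put(item)
        self._q.join()  # block until the receiver calls task_done()

    def recv(self):
        item = self._q.get()  # block until something has been sent
        self._q.task_done()   # release the blocked sender
        return item


def exchange(rank, value, to_partner, from_partner):
    """Swap one value with a partner, ordering the calls by even/odd rank."""
    if rank % 2 == 0:
        to_partner.send(value)      # even nodes send first ...
        return from_partner.recv()  # ... then receive
    received = from_partner.recv()  # odd nodes receive first ...
    to_partner.send(value)          # ... then send
    return received


def run_pairwise_exchange(n_nodes=4):
    """Run the exchange on n_nodes threads (n_nodes assumed even)."""
    channels = {}
    for rank in range(n_nodes):
        channels[(rank, rank ^ 1)] = SyncChannel()  # rank ^ 1 is the partner
    results = [None] * n_nodes

    def worker(rank):
        partner = rank ^ 1
        results[rank] = exchange(
            rank,
            f"data-from-{rank}",
            channels[(rank, partner)],
            channels[(partner, rank)],
        )

    threads = [
        threading.Thread(target=worker, args=(r,), daemon=True)
        for r in range(n_nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    if any(t.is_alive() for t in threads):
        raise RuntimeError("exchange did not complete (possible deadlock)")
    return results
```

If both branches of `exchange` sent first, each thread would block inside `send` waiting for a receive that never gets posted, which is the impasse described above.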
- Race
A race is any situation where multiple processes are vying for access to a shared resource in an uncontrolled fashion, very much like a number of diners all reaching for the last dinner roll at the same time: which one actually gets the roll is a race. This is not necessarily a bad situation in all circumstances, and may, in fact, be a desirable component of the design of the application (for example, where acquisition of the resource is a measure of the process's survivability, in an artificial-life context). In most cases, however, races are indicative of a potentially dangerous lack of control over execution. They are also not uncommonly found in association with deadlocks: the designer simply assumed that a particular sequence of processes would queue for the resource, tailored the release of other resources to that assumed sequence, and then found that the assumed sequence was violated in actual practice.
- Absence of explicit synchronization calls or inappropriate use of standard or non-blocking operations may lead to incorrect or undesirable nondeterministic results
As a race is characterized by uncontrolled access to shared resources, it follows that imposition of control, in the form of some type of synchronization structure, will reduce the race to an orderly march. Just because the processes engage in a race doesn't automatically mean that the application will freeze, or the computations will be in error, but the mere fact that any of the processes could obtain the resource does mean that there's an aspect of unpredictability in the execution, and that can have some rather nasty side-effects. For instance, one mainstay in scientific investigation is the concept of reproducibility: you'd like to be able to reproduce the same effects every time you run an experiment with the same initial conditions, inputs, and procedures. In parallel programming, every race situation adds just that much randomness to the operation, takes away just that much more reproducibility from the whole process.
Again, non-determinism is not necessarily "bad", in and of itself, but can contribute to the overall unwieldy-ness of the entire application, and certainly makes it harder to track down runtime problems associated with infrequent execution paths based on particular processes "winning" particular races.
The bottom line, then, is to be very careful about where and how you let the pack off their leashes: make sure you do it by design, and that sufficient care has been taken to avoid deadlocks and to ensure reproducibility where needed.
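One place where a race quietly erodes reproducibility is the order in which partial results are combined, because floating-point addition is sensitive to ordering. The sketch below is an assumption-laden illustration rather than a recipe: threads stand in for processes, and `partial_sum` and `reproducible_total` are hypothetical names. The workers still race to deliver their results, but the combination step imposes a fixed order (by rank), so repeated runs give the same answer.

```python
import queue
import threading


def partial_sum(rank, n_workers, n_terms):
    """Sum 1/k over this worker's strided share of k = 1 .. n_terms."""
    return sum(1.0 / k for k in range(rank + 1, n_terms + 1, n_workers))


def reproducible_total(n_workers=4, n_terms=10000):
    """Combine partial sums in rank order instead of arrival order."""
    arrivals = queue.Queue()

    def worker(rank):
        # Which worker gets its result into the queue first is a race.
        arrivals.put((rank, partial_sum(rank, n_workers, n_terms)))

    threads = [
        threading.Thread(target=worker, args=(r,)) for r in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    received = [arrivals.get() for _ in range(n_workers)]

    # Impose control: sort by rank so the additions always happen in the
    # same sequence, regardless of who "won" the race to the queue.
    total = 0.0
    for _rank, value in sorted(received):
        total += value
    return total
```

Adding the values in the order they come off the queue would also be a valid program, but its low-order bits could vary from run to run.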
- Example: absence of synchronization at the beginning or end of a timestepping loop
A timestepping loop is a loop that simulates a timestep of a given duration during the execution of a model. When multiple processes engage in collaborative execution of a simulation and encounter a timestepping phase, it is very important for purposes of both efficiency and correctness that all processes start the loop at "roughly" the same time, and that all processes leave the loop at "roughly" the same time, where "roughly" is understood to take into consideration the distributed nature of the communication implementing the synchronization. Making sure that all processes have had ample opportunity to initialize themselves before beginning their cooperative execution, as well as finish their final iteration before beginning a post-loop phase, helps ensure that message streams will remain manageable.
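A barrier at each end of the loop is one way to get this behavior. The following is a minimal sketch under stated assumptions: threads stand in for processes, `threading.Barrier` stands in for whatever synchronization call your message-passing system provides, and `run_timestepping` with its event log is a hypothetical construction for illustration only.

```python
import threading
import time


def run_timestepping(n_workers=3, n_steps=2):
    """Run a toy timestepping loop with a barrier before and after it."""
    barrier = threading.Barrier(n_workers)  # reusable for both sync points
    events = []                             # (phase, rank) in recorded order
    events_lock = threading.Lock()

    def record(phase, rank):
        with events_lock:
            events.append((phase, rank))

    def worker(rank):
        time.sleep(0.01 * rank)  # simulate uneven initialization times
        record("init", rank)
        barrier.wait()           # nobody starts stepping until all are ready

        for _step in range(n_steps):
            record("step", rank)  # stand-in for one timestep of real work

        barrier.wait()           # nobody moves on until all have finished
        record("post", rank)

    threads = [
        threading.Thread(target=worker, args=(r,)) for r in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

In the returned log, every `"init"` entry precedes every `"step"` entry, and every `"step"` entry precedes every `"post"` entry; the order within each phase is still left to chance.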
- Insertion of message-passing calls for debugging purposes can dramatically change the behavior of incorrect programs
Have you ever had the interesting experience of having a buggy serial code suddenly start working when you add debugging print statements, and then stop or go back to giving bad results when those print statements were removed? In very analogous fashion, messages added to parallel code in order to track down execution-time problems can cause those codes to suddenly begin exhibiting strangely altered behavior. Not only have you added additional serial considerations (i.e., you've changed the code-size of the per-processor executable, which may have implications for storage that had previously been overwritten by poorly-controlled memory operations), you've also added new timing issues (construction, transmission, acceptance, unpacking, and receipt of message packets) as well as data compilation (i.e., whatever you're going to report) and storage modifications.
Bottom line: if your code suddenly begins producing results after you've added debugging support (whether it be serial or parallel in nature), you've still got buggy code down there, and your debugging support may actually be hindering your attempts to find it. For message-passing systems, try to do as little additional communication as possible, because communication is such an expensive operation. Instead, use the local memory to save state during the run, and try to match up the individual images across all the processes afterwards.
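The "save locally, match up afterwards" approach might look something like the sketch below. It is only an illustration: threads stand in for processes, the state update is a made-up placeholder for real computation, and `run_with_local_traces` is a hypothetical name. In a real distributed run each process would write its buffer to its own file at the end, and the merge would happen offline.

```python
import threading


def run_with_local_traces(n_workers=3, n_steps=4):
    """Record per-worker state locally, then merge the traces after the run."""
    # One private buffer per worker: no communication while the run is live.
    traces = [[] for _ in range(n_workers)]

    def worker(rank):
        local = traces[rank]
        state = rank
        for step in range(n_steps):
            state = state * 2 + 1  # placeholder for the real computation
            local.append((step, rank, state))

    threads = [
        threading.Thread(target=worker, args=(r,)) for r in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # After the run: line up the individual images by (step, rank).
    return sorted(record for trace in traces for record in trace)
```

Tagging each record with the step and the rank is what makes the after-the-fact matching possible; wall-clock timestamps alone are usually not comparable across machines.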
|