Parallel Programming Concepts

6.1 Automatic v. Manual

Ever since the first parallel systems were built, parallelizing existing codes has been largely a matter of manual conversion. The ultimate goal of parallel-programming support is a tool that accepts the aforementioned dusty-deck serial code and returns a perfectly parallelized program suited to a particular state-of-the-art parallel system.

  • Automatic parallelization

    Surprise! We're not there yet ... but we're a lot closer now than we were just a few years ago, and the effort is gaining momentum. Several significant factors affecting parallelization and effective speedup have been identified, and newer compilers now handle some of them automatically:

    • Many compilers attempt to do some automatic parallelization, especially of DO-loops

      As will be shown in more detail later, DO-loops are natural candidates for parallelization. Finding one does not guarantee that you will be able to parallelize it, but loops are a clear marker of where to start looking. Parallelizing compilers examine DO-loops for the right characteristics, chiefly whether the iterations are independent (no iteration uses or overwrites a value produced by another), and do whatever they can to exploit the "natural parallelism" the code exhibits.

    • Significant speedups usually require going beyond the automatic level

      As just indicated, however, things are not yet as automatic as we would like. Too many interlocking factors remain that are too complex to capture in a tool, so even the best automatic tools must be used together with well-trained conversion experts. At the very least, the code usually has to be altered to expose its parallelism to the compiler.
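
The difference between independent and dependent loop iterations can be sketched in Python, used here only as compact pseudocode; the function names are hypothetical, and a `concurrent.futures` thread pool stands in for a parallelizing compiler handing iterations to workers. In `add_arrays` no iteration depends on another; in `running_sum` each iteration needs the previous result. The last pair shows one common way of altering code to expose parallelism: a counter carried from iteration to iteration (an induction variable) is replaced by a value computed directly from the loop index.

```python
from concurrent.futures import ThreadPoolExecutor


def add_arrays(b, c):
    """Independent iterations: a[i] depends only on b[i] and c[i]."""
    with ThreadPoolExecutor() as pool:
        # Any worker may run any iteration; map() returns results in index order.
        return list(pool.map(lambda i: b[i] + c[i], range(len(b))))


def running_sum(b):
    """Loop-carried dependency: a[i] needs a[i - 1] from the previous
    iteration, so the iterations cannot simply be run concurrently."""
    if not b:
        return []
    a = [0] * len(b)
    a[0] = b[0]
    for i in range(1, len(b)):
        a[i] = a[i - 1] + b[i]
    return a


def scatter_serial(src):
    """Hidden dependency: k is updated in every iteration, so iteration i
    needs the value of k left behind by iteration i - 1."""
    dst = [0] * (2 * len(src) + 1)
    k = 0
    for i in range(len(src)):
        k = k + 2
        dst[k] = src[i]
    return dst


def scatter_parallel(src):
    """Same result, with k computed directly from i: iterations are independent."""
    dst = [0] * (2 * len(src) + 1)

    def body(i):
        k = 2 * (i + 1)  # closed form of the counter in scatter_serial
        dst[k] = src[i]  # every iteration writes a different element

    with ThreadPoolExecutor() as pool:
        list(pool.map(body, range(len(src))))  # list() re-raises any worker exception
    return dst


print(add_arrays([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(running_sum([1, 2, 3, 4]))            # [1, 3, 6, 10]
print(scatter_parallel([10, 20, 30]))       # [0, 0, 10, 0, 20, 0, 30]
```

In CPython the threads will not speed up pure-Python arithmetic; the sketch only shows which loops can safely be split among workers and which cannot.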

  • Manual parallelization

    The bulk of parallelization work still falls to the human programmer, and the importance of this resource cannot be overemphasized:

    • The programmer must spend time parallelizing

      Expect this to be a time-consuming process, and budget for it. If you expect to change only a few lines, you likely have a nasty shock in store.

    • Some possible actions ("Rules of thumb")

      Over the years, a body of hard-won wisdom has accumulated about how to extract parallelism most efficiently from serial code; here are a few rules:

      • Remove inhibitors to parallelization

        Eliminate unnecessary serialization, for example, making all processes wait while one of them does work that could have been deferred to a later, unavoidable serial section.

        Rearrange loop indices, for example by interchanging nested loops, to minimize dependencies between loop iterations.

      • Insert constructs or calls to library routines

        Some packages of frequently used algorithms, such as linear algebra routines, have already been parallelized. If your application needs such tasks, use someone else's work whenever possible.

        Constructs are keywords embedded in comments, intended to be read by preprocessors and code generators; they let the programmer indicate what kinds of parallelism should be attempted in particular parts of the program. A construct may be a directive, which insists that something be done a certain way, or a suggestion, which points in a potentially useful direction. Because constructs live in comments, they often make it possible to leave a section of serial code completely alone and still have it parallelized by the tools that recognize them.

      • Run code through preprocessors

        When parallel constructs like those described above are used, it is typically possible to have the resulting parallel code displayed rather than simply compiled. Seeing what the automatic tools have done, as reflected in the modified source code, gives the programmer a chance both to learn how parallelism can be implemented and to check a particular case against human knowledge and intuition.

      • Restructure algorithm

        Always be willing to re-examine the design on which your application is based. Sometimes a simple change, such as moving a calculation outside of a loop, can have dramatic effects on parallelization.
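
The advice on re-arranging loop indices can be illustrated with loop interchange. In this hypothetical Python sketch (function names invented for illustration; a thread pool stands in for parallel execution), each element depends on the element above it in the same column. With rows as the outer loop, that loop carries the dependency, so only the inner loop over columns can run in parallel, with a synchronization point after every row. Interchanging the loops puts the independent column loop outside, giving a single parallel loop whose tasks each run one column's recurrence serially.

```python
from concurrent.futures import ThreadPoolExecutor


def recurrence_rows_outer(b):
    """Rows outer: row i needs row i - 1, so the outer loop carries the
    dependency and only the inner loop over columns runs in parallel."""
    n, m = len(b), len(b[0])
    a = [row[:] for row in b]  # row 0 of the result equals row 0 of b
    with ThreadPoolExecutor() as pool:
        for i in range(1, n):
            def body(j, i=i):
                a[i][j] = a[i - 1][j] + b[i][j]
            # map() must finish the whole row before the next row starts:
            # one synchronization point per row.
            list(pool.map(body, range(m)))
    return a


def recurrence_columns_outer(b):
    """Loops interchanged: columns outer. Columns are independent, so each
    task runs one column serially and there is only one synchronization."""
    n, m = len(b), len(b[0])
    a = [row[:] for row in b]

    def column(j):
        for i in range(1, n):
            a[i][j] = a[i - 1][j] + b[i][j]

    with ThreadPoolExecutor() as pool:
        list(pool.map(column, range(m)))
    return a


print(recurrence_columns_outer([[1, 2], [3, 4], [5, 6]]))  # [[1, 2], [4, 6], [9, 12]]
```

Coarser tasks mean fewer synchronization points. In a compiled language the interchange also changes the memory-access order (row-major in C, column-major in Fortran), so the net gain should be measured rather than assumed.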
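
Moving a calculation outside a loop can matter for parallelization, not only for speed. In the hypothetical Python sketch below (function names invented for illustration), the first version recomputes a loop-invariant value on every iteration. In a compiled language, `scale` would then be a single variable written by every iteration, which a parallelizing compiler may treat as a dependency between iterations unless it can privatize the variable. Hoisting the computation leaves a loop body that only reads `scale`, so its iterations are independent.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_before(x, factor):
    """Invariant computed inside the loop: the same value is assigned to
    `scale` on every iteration (in a compiled language, a shared variable
    written by all iterations)."""
    y = [0.0] * len(x)
    for i in range(len(x)):
        scale = factor * factor + 1.0  # loop-invariant, yet recomputed each time
        y[i] = x[i] * scale
    return y


def scale_after(x, factor):
    """Invariant hoisted out of the loop: computed once and only read inside,
    so the iterations are independent and can run concurrently."""
    scale = factor * factor + 1.0
    y = [0.0] * len(x)

    def body(i):
        y[i] = x[i] * scale  # reads shared data, writes only its own element

    with ThreadPoolExecutor() as pool:
        list(pool.map(body, range(len(x))))
    return y


print(scale_after([1.0, 2.0, 3.0], 2.0))  # [5.0, 10.0, 15.0]
```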

    • Software tools available at CTC to assist with parallelization
      • ClusterCoNTroller Batch System;
      • MPI/Pro software;
      • C/C++ and Fortran Compilers;
      • Cornell Multitask Toolbox for Matlab;
      • Other parallel libraries.

      As indicated earlier, there is a small but growing number of software tools focused on assisting the parallelization effort. These tools are language-dependent (mostly oriented toward Fortran) and still limited in scope, but they can be very useful when applied appropriately. Here, "Parallel Compiler" means a compiler that understands parallel constructs, automatically extracts parallelism, or both.

      More information on these tools can be found in later tutorials.