There are two popular parallel programming models:
- Master/Worker, in which one process (the master) assigns tasks to the remaining processes (the workers) and coordinates their results
- Single Program, Multiple Data (SPMD), in which every process executes the same program on a different section of the data
The Parallel Program Design talk discusses and compares SPMD with Master/Worker in far greater detail than we do here.
The master/worker model
- Pros
- This model can be an effective way to handle tasks whose completion times vary widely. This typically happens when data are unevenly distributed: one section of the data may be sparse and require few calculations, while another may be dense and require many more. The master assigns an initial task to each worker. When a worker completes its task, it reports back to the master and requests a new one. Workers with sparse data may finish two or more tasks in the time it takes workers with denser data to finish their first. All processors thus stay busy, idle time is reduced, and the wall-clock time needed to run the job to completion is kept to a minimum.
- This model can be helpful when the results of each task must be evaluated and a complicated decision must be made, such as whether to continue or halt a simulation. All such decisions can be made centrally by the master, so the code executed by each worker can be simpler and omit the decision-making sections.
- This model applies equally well to multithreaded programs running on a single SMP node.
- Cons
- As the number of workers increases, so does the time the master needs to evaluate and coordinate their results. A bottleneck may develop in which workers sit idle while waiting for new assignments or other responses from the master.
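The dynamic assignment described above (initial task per worker, then "report a result, receive the next task") can be sketched with Python threads standing in for worker processes. This is a minimal illustration under stated assumptions: `master`, `worker`, and `process_chunk` are hypothetical names, and per-worker queues stand in for the point-to-point messages an MPI program would exchange between the master rank and the worker ranks. Because CPython threads do not execute Python code in parallel, the sketch shows the control flow rather than a real speedup.

```python
import threading
import queue
from collections import deque

STOP = None  # sentinel the master sends to tell a worker to exit


def process_chunk(chunk):
    # Hypothetical task: sum of squares of the nonzero entries, so dense
    # chunks involve more arithmetic than sparse ones.
    return sum(x * x for x in chunk if x != 0)


def worker(wid, inbox, to_master):
    # Worker loop: wait for a task, compute it, report back, repeat.
    while True:
        task = inbox.get()
        if task is STOP:
            return
        task_id, chunk = task
        to_master.put((wid, task_id, process_chunk(chunk)))


def master(chunks, n_workers):
    to_master = queue.Queue()                       # worker -> master reports
    inboxes = [queue.Queue() for _ in range(n_workers)]  # master -> worker tasks
    threads = [threading.Thread(target=worker, args=(w, inboxes[w], to_master))
               for w in range(n_workers)]
    for t in threads:
        t.start()

    pending = deque(enumerate(chunks))  # (task_id, chunk) pairs not yet assigned
    results = [None] * len(chunks)
    tasks_done = [0] * n_workers
    outstanding = 0

    # Give every worker an initial task, or stop it if there is no work.
    for w in range(n_workers):
        if pending:
            inboxes[w].put(pending.popleft())
            outstanding += 1
        else:
            inboxes[w].put(STOP)

    # Each report from a worker doubles as its request for more work, so
    # workers that finish sparse chunks quickly are handed more tasks.
    while outstanding:
        wid, task_id, value = to_master.get()
        outstanding -= 1
        results[task_id] = value
        tasks_done[wid] += 1
        if pending:
            inboxes[wid].put(pending.popleft())
            outstanding += 1
        else:
            inboxes[wid].put(STOP)

    for t in threads:
        t.join()
    return results, tasks_done
```

For example, `master([[0, 0, 3], [1, 2, 3, 4], [0, 5], [6, 7, 8, 9, 10]], 2)` returns the results `[9, 30, 25, 330]` in task order, while the split of tasks between the two workers (`tasks_done`) depends on timing. The master loop is also the natural place for the centralized continue/halt decisions mentioned above, and it is the single point that can become a bottleneck as workers are added.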
The SPMD model
- Pros
- This model is a natural choice for data-parallel applications: those that can be parallelized by dividing the data into chunks and assigning each process one chunk to work on. Such applications are quite common. In the SPMD model, every process runs the same program on its own chunk of data. Results may then be collected by a single process for further serial analysis or output.
- Embarrassingly parallel or loosely coupled SPMD applications can benefit from processor affinity within an SMP node.
- Cons
- Some applications have a load-balancing problem. As noted in the "pros" section for the master/worker model, this typically arises when data are clumped, so that some processes finish their calculations quickly while others hold dense data and take longer. The faster processes can then spend a large amount of time idle while waiting for the slower ones to finish. The SPMD model by itself may not be able to solve this problem.
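Both the SPMD pattern and its load-balancing weakness can be seen in a small sketch. Every "rank" runs the same function, `spmd_program`, and differs only in its `rank` argument, which selects a contiguous block of the data; afterwards the partial results are collected for serial post-processing, roughly what an MPI program would do with its rank and size (from `MPI_Comm_rank`/`MPI_Comm_size`) followed by a gather. As an assumption for illustration, threads stand in for processes, and `block_range`, `spmd_program`, `run_spmd`, and the "work = number of nonzero entries" cost model are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def block_range(rank, size, n):
    # Contiguous block decomposition: the first n % size ranks get one extra item.
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def spmd_program(rank, size, data):
    # Every rank executes this same code; only `rank` differs.
    start, stop = block_range(rank, size, len(data))
    my_chunk = data[start:stop]
    # Hypothetical cost model: only nonzero entries require computation.
    partial = sum(x * x for x in my_chunk if x != 0)
    work = sum(1 for x in my_chunk if x != 0)
    return partial, work


def run_spmd(data, size):
    with ThreadPoolExecutor(max_workers=size) as pool:
        outputs = list(pool.map(lambda r: spmd_program(r, size, data), range(size)))
    # A single collector combines the partial results for serial output.
    total = sum(p for p, _ in outputs)
    work_per_rank = [w for _, w in outputs]
    return total, work_per_rank
```

With clumped data such as `[0] * 6 + [1, 2, 3, 4, 5, 6]`, `run_spmd(data, 2)` returns a total of `91` but a work split of `[0, 6]`: rank 0 receives only zeros and sits idle while rank 1 does all the computation. Equal-sized chunks therefore do not guarantee equal work, which is the imbalance described above.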