Why have communicators?
- If you are writing a library, can you choose safe tags?
- Example: variable (and possibly incorrect) behavior without a separate communicator
We now explain communicators in general terms. This topic is covered in much greater depth in the module MPI Groups and Communicator Management.
Whether a message can be picked up by a specific receive call depends on its tag, source, and communicator. The tag lets the program distinguish between types of messages. The source simplifies programming: instead of needing a unique tag for each message, every process sending the same kind of information can use the same tag, and the receiver tells the messages apart by their sender. But why is a communicator needed?
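The matching rule can be written down as a small predicate. The following is a minimal sketch in plain Python, not MPI code: the `Envelope` class, the `matches` function, the sentinel value `-1`, and the string communicator labels are all assumptions made for illustration. The wildcards stand in for `MPI_ANY_SOURCE` and `MPI_ANY_TAG`; note that MPI offers no wildcard for the communicator.

```python
from dataclasses import dataclass

# Illustrative sentinels standing in for MPI_ANY_SOURCE and MPI_ANY_TAG.
# The real constants have implementation-defined values.
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass(frozen=True)
class Envelope:
    """The parts of a message that a receive call is matched against."""
    source: int  # rank of the sending process
    tag: int     # message type chosen by the programmer
    comm: str    # label for the communicator (hypothetical representation)


def matches(msg, recv_source, recv_tag, recv_comm):
    """Return True if a receive posted with these arguments accepts msg."""
    if msg.comm != recv_comm:  # the communicator must always match exactly
        return False
    source_ok = recv_source in (ANY_SOURCE, msg.source)
    tag_ok = recv_tag in (ANY_TAG, msg.tag)
    return source_ok and tag_ok


user_msg = Envelope(source=2, tag=7, comm="user")
lib_msg = Envelope(source=1, tag=7, comm="library")

print(matches(user_msg, ANY_SOURCE, 7, "user"))  # True
print(matches(lib_msg, ANY_SOURCE, 7, "user"))   # False: other communicator
print(matches(user_msg, 1, 7, "user"))           # False: wrong source
```

Even though both messages carry the same tag and the receive accepts any source, the library's message is rejected because its communicator differs.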
An example
Suppose you are sending messages between your processes, but you are also calling a library you obtained elsewhere, which also runs on multiple nodes and communicates internally using MPI. In this case, you want to make sure that the messages you send go to your processes and are not confused with the messages sent internally between the processes that make up the library routine. This is where communicators become important: they let you distinguish your program's MPI calls from the library's MPI calls.
In this example, we have three processes communicating with each other. Each process also calls a library routine, and the three parallel parts of the library routine communicate with each other. We want to have two different message "spaces", one for our messages, and one for the library's messages. We do not want any intermingling of the messages.
The boxes represent parts of three parallel processes. Time progresses from the top to the bottom of each diagram. The numbers in parentheses are NOT parameters, but rather process numbers. For example, send(1) means send a message to process 1. Similarly, recv(any) means receive a message from any process. The user's (caller's) code is in the white (unshaded) boxes. The shaded boxes (callee) represent a (parallel) library package being called by the user. Finally, the arrows represent the movement of a message from sender to receiver.
The diagram below shows what we would like to happen. In this case, everything works as intended.

However, there is no guarantee that events will occur in this order, since the relative scheduling of processes on different nodes can vary from run to run. Suppose we change the third process (process 2) by adding some computation at the beginning. The sequence of events might then occur as follows:

In this case, communications do not occur as intended. The first "receive" in process 0 now receives the "send" from the library routine in process 1, not the intended (and now delayed) "send" from process 2. As a result, all three processes hang.
The library developer solves this problem by requesting a new, unique communicator and specifying it in every send and receive call made by the library. This creates a library ("callee") message space that is separate from the user's ("caller") message space.
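The difference between the failing run and the fix can be imitated with a toy model of the messages arriving at process 0. This is a sketch in plain Python rather than MPI code: the function names, the dictionary message format, and the communicator labels are hypothetical, and only the arrival order described above is modeled (the library's send from process 1 arrives before the delayed user send from process 2). In real MPI code, a library would typically obtain its private communicator with a call such as `MPI_Comm_dup`.

```python
def recv_any(queue, comm):
    """Model recv(any): take the earliest-arrived message on `comm`."""
    for i, msg in enumerate(queue):
        if msg["comm"] == comm:
            return queue.pop(i)
    return None  # nothing on this communicator has arrived yet


def arrivals_at_process_0(user_comm, lib_comm):
    """Messages waiting at process 0, in arrival order, for the delayed run."""
    return [
        {"source": 1, "comm": lib_comm, "body": "library-internal"},
        {"source": 2, "comm": user_comm, "body": "user data"},
    ]


# Case 1: user code and library share one communicator.
shared = arrivals_at_process_0("world", "world")
first = recv_any(shared, "world")
print(first["body"])  # library-internal: the user's receive got the wrong message

# Case 2: the library uses its own communicator.
separate = arrivals_at_process_0("user", "library")
user_first = recv_any(separate, "user")
lib_first = recv_any(separate, "library")
print(user_first["body"])  # user data
print(lib_first["body"])   # library-internal
```

With a shared communicator, the user's wildcard receive consumes whatever arrived first. With separate communicators, arrival order no longer matters, because each receive can only see messages in its own space.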
Can tags be used to create separate message spaces? The problem with tags is that their values are chosen by the programmer, who might happen to pick the same tag that a parallel library uses internally. With communicators, the system, not the programmer, assigns the identification: the system assigns one communicator to the user and a different communicator to the library, so there is no possibility of overlap.
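The contrast between programmer-chosen tags and system-assigned identification can be sketched as follows. This is a toy model in plain Python, not the MPI API: the `Communicator` class, its `context` attribute, and the counter are assumptions made for illustration, loosely modeled on the hidden context identifier that MPI implementations associate with each communicator. The `dup` method plays the role that `MPI_Comm_dup` plays in real MPI.

```python
import itertools

# Hypothetical system-side counter: each new communicator gets the next value.
_next_context = itertools.count(1)


class Communicator:
    """Toy communicator: a group of processes plus a system-assigned context."""

    def __init__(self, members):
        self.members = tuple(members)
        self.context = next(_next_context)  # assigned by the system, not the user

    def dup(self):
        """Return a communicator with the same processes but a fresh context."""
        return Communicator(self.members)


world = Communicator(range(3))  # used by the caller's code
lib_comm = world.dup()          # requested by the library for its own traffic

user_tag = 99  # chosen by the application programmer
lib_tag = 99   # chosen independently by the library author

print(user_tag == lib_tag)                # True: tags can collide by accident
print(world.context == lib_comm.context)  # False: contexts never collide
print(world.members == lib_comm.members)  # True: same processes in both
```

Because the context comes from the system rather than from either programmer, two independently written pieces of code cannot end up in the same message space by accident, even when their tags coincide.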