In the diagram below: S = sender, R = receiver, time increases left to right
In the diagram below, time increases from left to right. The heavy horizontal line marked S represents execution time of the sending task (on one node), and the heavy dashed line marked R represents execution time of the receiving task (on a second node). Breaks in these lines represent interruptions due to the message-passing event.

Message transfer must be preceded by a sender-receiver "handshake"
When the blocking synchronous send MPI_Ssend is executed, the sending task sends the receiving task a "ready to send" message. When the receiver executes the receive call, it replies with a "ready to receive" message. The data are then transferred.
Two sources of overhead:
- System overhead -- time spent transferring buffer contents
- Synchronization overhead -- time spent waiting for another task
There are two sources of overhead in message passing. System overhead is incurred in transferring the message data from the sender's message buffer onto the network, and from the network into the receiver's message buffer.
Synchronization overhead is the time spent waiting for an event to occur on another task. In the figure above, the sender must wait for the receive to be executed and for the handshake to arrive before the message can be transferred. The receiver also incurs some synchronization overhead in waiting for the handshake to complete. Not surprisingly, synchronization overhead can be significant in synchronous mode. As we shall see, the other modes try different strategies for reducing this overhead.
Either sender or receiver may have to wait; occasional delays due to system services throw off ideal synchronization
Only one relative timing for the MPI_Ssend and MPI_Recv calls is shown, but they can occur in either order. If the receive call precedes the send, most of the synchronization overhead will be incurred by the receiver.
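The effect of call order on who waits can be sketched with a small timing model. It is an illustration under simplifying assumptions, not a description of any particular MPI implementation: each handshake message takes the same fixed one-way latency, the receiver replies as soon as it has both executed its receive and seen "ready to send", and the transfer starts when "ready to receive" reaches the sender. The names model_sync_wait and struct sync_wait are hypothetical.

```c
/* Time each task is blocked before the data transfer starts,
 * in arbitrary time units, under the simplified model. */
struct sync_wait {
    long sender;
    long receiver;
};

static long max_long(long a, long b) { return a > b ? a : b; }

/* t_send:  time at which the sender calls the synchronous send
 * t_recv:  time at which the receiver calls the receive
 * latency: assumed one-way delay of each handshake message */
struct sync_wait model_sync_wait(long t_send, long t_recv, long latency)
{
    /* "Ready to send" arrives at t_send + latency; the receiver can
     * reply only after it has also executed its receive call. */
    long reply_sent = max_long(t_recv, t_send + latency);

    /* "Ready to receive" reaches the sender one latency later,
     * and the transfer starts at that point. */
    long transfer_start = reply_sent + latency;

    struct sync_wait w = { transfer_start - t_send, transfer_start - t_recv };
    return w;
}
```

With a latency of 10 units, model_sync_wait(0, 500, 10) gives a sender wait of 510 and a receiver wait of 10, while reversing the call times with model_sync_wait(500, 0, 10) gives a sender wait of 20 and a receiver wait of 520. In this model, the task that calls first absorbs nearly all of the waiting.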
One might hope that, if the workload is properly balanced, synchronization overhead would be minimal on both the sending and receiving tasks. This is not always realistic. Even if nothing else disturbs synchronization, system services that run at unpredictable times on the various nodes will cause unsynchronized delays. One might respond that it would be simpler to call MPI_Barrier frequently to keep the tasks in sync. (MPI_Barrier blocks the caller until all group members have called it.) But that call itself incurs synchronization overhead, and it does not ensure that the tasks will still be in sync a few seconds later. Thus, barrier calls made only to keep tasks in step are almost always a waste of time.