Collective Communication I

2.4 Global Computing
  • Communication routines that include a computation

  • Computation function included in routine call may be either
  • An MPI predefined routine or
  • A user-supplied function

  • Two types of global computation routines:
    • Reduce
    • Scan
  • Reduce

    Purpose:
    • Combine the elements of the input buffer of each process using a specified operation
    • Return result to root process
    Fortran Binding:
    MPI_REDUCE (sbuf, rbuf, count, stype, op, root, comm, ierr)
    C Binding:
    int MPI_Reduce ( void* sbuf, void* rbuf, int count, MPI_Datatype stype, MPI_Op op, int root, MPI_Comm comm)


    Scan

    Purpose:
    • MPI_SCAN performs a partial reduction in which process i receives data from processes 0 through i, inclusive.
    C
    int MPI_Scan (void* sbuf, void* rbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
    FORTRAN
    MPI_SCAN (sbuf, rbuf, count, datatype, op, comm, ierr)
    sbuf starting address of send buffer,
    rbuf starting address of receive buffer,
    count number of elements in input buffer,
    datatype data type of elements of input buffer,
    op operation,
    comm group communicator.
    Note that a segmented scan can be done by creating a subgroup for each segment.

    MPI Predefined Reduce Operations
    Name Meaning Datatypes
    MPI_MAX maximum value Each of these
    operations
    makes sense
    for only certain
    datatypes.

    The MPI Standard
    lists the types
    accepted for
    each operation.

    MPI_MIN minimum value
    MPI_SUM sum
    MPI_PROD product
    MPI_LAND logical and
    MPI_BAND bit-wise and
    MPI_LOR logical or
    MPI_BOR bit-wise or
    MPI_LXOR logical xor
    MPI_BXOR bit-wise xor
    MPI_MAXLOC max value and location
    MPI_MINLOC min value and location

    User-defined Operations

    • User can define his/her own reduce operation
    • Makes use of the MPI_OP_CREATE function
    Reduce Variations
    • MPI_ALLREDUCE allows the result to appear in the receive buffers of all processes in the group.
    • MPI_REDUCE_SCATTER scatters a vector, which results from a reduce operation, across all processes.

    There are two types of global computation routines: reduce and scan. The operation function passed to a global computation routine is either a predefined MPI function or a user supplied function. MPI provides four global computation routines. It is possible for users to supply their own functions. In this section, we first discuss reduction computation. Then we will discuss scan computation.

    MPI_REDUCE

    One of the most useful collective operations is a global reduction or combine operation. The partial result in each process in the group is combined in one specified process or all the processes using some desired function. If there are n processes in the process group, and D(i,j) is the jth data item in process i, then the Dj item of data in the root process resulting from a reduce routine evaluation is given by:

    Dj = D(0,j)*D(1,j)* ... *D(n-1,j)

    where * is the reduction function and it is always assumed associative. All MPI predefined functions are also assumed to be commutative. One may define functions that are assumed to be associative, but not commutative. Each process can provide either one element or a sequence of elements. In both cases the combine operation is executed element-wise on each element of the sequence. There are three versions of reduce. They are MPI_REDUCE, MPI_ALLREDUCE, and MPI_REDUCE_SCATTER. The form of these reduction primitives is listed below:

    C
    int MPI_Reduce (void* sbuf, void* rbuf, int count, MPI_Datatype stype, MPI_Op op, int root, MPI_Comm comm)
    int MPI_Allreduce (void* sbuf, void* rbuf, int count, MPI_Datatype stype, MPI_Op op, MPI_Comm comm)
    int MPI_Reduce_scatter (void* sbuf, void* rbuf, int* rcounts, MPI_Datatype stype, MPI_Op op, MPI_Comm comm)
    FORTRAN
    MPI_REDUCE (sbuf, rbuf, count, stype, op, root, comm, ierr)
    MPI_ALLREDUCE (sbuf, rbuf, count, stype, op, comm, ierr)
    MPI_REDUCE_SCATTER (sbuf, rbuf, rcounts, stype, op, comm, ierr)

    The differences among these three reduces:

    • MPI_REDUCE returns results to a single process;
    • MPI_ALLREDUCE returns results to all processes in the group;
    • MPI_REDUCE_SCATTER scatters a vector, which results from a reduce operation, across all processes.
    sbuf is the address of send buffer,
    rbuf is the address of receive buffer,
    count is the number of elements in send buffer,
    stype is the data type of elements of send buffer,
    op is the reduce operation (which may be MPI predefined, or your own),
    root is the rank of the root process,
    comm is the group communicator.
    Notes:
    * rbuf significant only at the root process,
    * the argument rcounts in MPI_REDUCE_SCATTER is an array.

    Scan

    A scan or prefix-reduction operation performs partial reductions on distributed data. Particularly, let n be the size of the process group, D(k,j) the jth data item in process k after returning from scan, and d(k,j) the jth data item in process k before the scan. Let k =0, 1, ..., n-1. A scan returns

    D(k,j) = d(0,j) * d(1,j) * ... *d(k,j)

    where * is the reduction function which may be either an MPI predefined function or a user defined function.

    The syntax of the MPI scan routine is

    C
    int MPI_Scan (void* sbuf, void* rbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
    FORTRAN
    MPI_SCAN (sbuf, rbuf, count, datatype, op, comm, ierr)
    sbuf starting address of send buffer,
    rbuf starting address of receive buffer,
    count number of elements in input buffer,
    datatype data type of elements of input buffer,
    op operation,
    comm group communicator.
    Note that a segmented scan can be done by creating a subgroup for each segment.

    Predefined Reduce Operations

    Examples of MPI predefined operations are summarized in Table 1. MPI also provides a mechanism for user-defined operations used in MPI_ALLREDUCE and MPI_REDUCE.

    MPI Predefined Reduce Operations
    Name Meaning C type FORTRAN type
    MPI_MAX maximum integer, float integer, real, complex
    MPI_MIN minimum integer, float integer, real, complex
    MPI_SUM sum integer, float integer, real, complex
    MPI_PROD product integer, float integer, real, complex
    MPI_LAND logical and integer logical
    MPI_BAND bit-wise and integer, MPI_BYTE integer, MPI_BYTE
    MPI_LOR logical or integer logical
    MPI_BOR bit-wise or integer, MPI_BYTE integer, MPI_BYTE
    MPI_LXOR logical xor integer logical
    MPI_BXOR bit-wise xor integer, MPI_BYTE integer, MPI_BYTE
    MPI_MAXLOC max value and location combination of int, float, double, and long double combination of integer, real, complex, double precision
    MPI_MINLOC min value and location combination of int, float, double, and long double combination of integer, real, complex, double precision

    Example

    • Each process in a forest dynamics simulation calculates the maximum tree height for its region.
    • Process 0, which is writing output, must know the global maximum height.
      INTEGER maxht, globmx
          .
          .
          .  (calculations which determine maximum height)
          .
          .
      call MPI_REDUCE (maxht, globmx, 1, MPI_INTEGER, MPI_MAX, 0, MPI_COMM_WORLD, ierr)
      IF (taskid.eq.0) then
          .
          .  (Write output)
          .
      END IF
    
    or
      int maxht, globmx;
          .
          .
          .  (calculations which determine maximum height)
          .
      MPI_REDUCE(maxht, globmx, 1, MPI_INTEGER, MPI_MAX,0,MPI_COMM_WORLD);
      if(taskid==0)
      {
          .
          .  (write output)
          .
      }
    

    User-defined Operations

    • User can define his/her own reduce operation
    • Makes use of the MPI_OP_CREATE function