Point to Point Communication I

Lab Exercise

Prerequisites

Do this lab after the talk MPI Point to Point Communication I. You should also complete the MPI Basics exercise before starting, which means you should already be familiar with the contents of Compiling and Porting Applications and Tutorial on Cluster CoNTroller System: Running a Parallel Job.


To learn or look up the syntax of MPI calls, consult the Message Passing Interface Standard at http://www-unix.mcs.anl.gov/mpi/ or see the MPI/Pro routines.


Overview

The first lab exercise will familiarize you with the use of blocking and non-blocking calls. In the second lab exercise, you will work on a simple code to improve data decomposition and communication patterns.


Exercise

Before You Begin

  1. Using either the Windows GUI or the command line, copy all lab files found in

    H:\VWlabs\MPI\Pt2pt\Lab\

    to a location in your home directory on H:, e.g.,

    mkdir H:\Users\your_userid\Pt2pt
    copy H:\VWlabs\MPI\Pt2pt\Lab\* H:\Users\your_userid\Pt2pt

  2. Edit and compile your programs on one of the login nodes using either Microsoft Visual Studio or a simple text editor (like Notepad) plus compiler commands.

  3. Create or modify a batch file using your favorite editor. Use the CCS requirements keyword to request development nodes for faster turnaround.

  4. Submit this .bat file with ccsubmit. Remember NOT to run your executables on the login nodes!


Instructions

C lab files: least-squares-pt2pt.c, xydata
Fortran lab files: least-squares-pt2pt.f, xydata

This exercise was designed to give you plenty of opportunities to program in parallel, especially in the areas of point-to-point communication and data decomposition. There are two possible ways to do this lab. You can treat the problems below independently by starting with a fresh copy of least-squares-pt2pt.c (or .f) each time. Alternatively, you can gradually produce one program that contains solutions to all the problems by starting each problem with the solution to the previous one.

  1. Read through the program and try to get a good understanding of the algorithm. Observe the use of blocking send and receive calls and how data are decomposed. A serial version of this program, least-squares.c or least-squares.f, is also available.

    • Compile the program and the sleep utility using Microsoft Visual Studio, or if you prefer, do this on the command line as follows:

      icl /c new_sleep.c

      followed by

      ifort /Qlowercase least-squares-pt2pt.f new_sleep.obj mpipro.lib mpipro_cdec.lib

      or

      icl least-squares-pt2pt.c new_sleep.obj mpipro.lib

    • Run the program with different numbers of processes (between 2 and 10). Notice how data points are distributed among the processes.

    Note: Compile all the following programs the same way as least-squares-pt2pt. The new_sleep.obj object file needs to be created only once.

  2. To practice using non-blocking send and receive, replace the blocking send and receive calls in least-squares-pt2pt.c (or .f) with non-blocking calls.

    C solution file: pt2pt-nblk-comm.c
    Fortran solution file: pt2pt-nblk-comm.f

  3. The program's load balancing needs improvement: when the number of data points is not evenly divisible by the number of processes, the last process is given the largest number of data points.

    Rewrite Steps 2 and 3 of least-squares-pt2pt.c (or .f) to provide the most equitable distribution of data points among the processes. When there are excess data, some processes should have n/numprocs and some have (n/numprocs) + 1 data points.
    (Note: Assume integer division for n/numprocs.)

    C solution file: pt2pt-load-bal.c
    Fortran solution file: pt2pt-load-bal.f

  4. In Step 4 of least-squares-pt2pt.c (or .f), process 0 receives all the partial sums from the other processes. Rewrite Step 4 using a binary tree as described below:

    • Use a number of processes that is a power of 2, for example, 2, 4, or 8.
    • Divide the processes into two groups. Each process in the second group then sends its partial sum to a process in the first group. The first group is in turn divided into two, and the step is repeated until process 0 has accumulated all the partial sums (assuming process 0 is in the first group).

      (Note: This can also be done with an MPI collective communication routine, MPI_Reduce.)

    C solution file: pt2pt-bi-reduc.c
    Fortran solution file: pt2pt-bi-reduc.f

  5. As an optional exercise, combine all the fixes into one program.

    C solution file: pt2pt-combo.c
    Fortran solution file: pt2pt-combo.f

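For Problem 3, the arithmetic behind an equitable distribution can be sketched without any MPI calls. The function names block_count and block_start are hypothetical, and giving the extra points to the lowest-numbered processes is one possible choice, not necessarily the one made in pt2pt-load-bal.c.

```c
#include <assert.h>

/* Number of data points assigned to `rank` when n points are split
 * among numprocs processes. Assumption: the first (n % numprocs)
 * ranks each take one extra point. */
int block_count(int n, int numprocs, int rank)
{
    assert(numprocs > 0 && rank >= 0 && rank < numprocs);
    int base = n / numprocs;   /* integer division */
    int extra = n % numprocs;  /* points left over */
    return base + (rank < extra ? 1 : 0);
}

/* Index of the first data point owned by `rank` under the same scheme. */
int block_start(int n, int numprocs, int rank)
{
    assert(numprocs > 0 && rank >= 0 && rank < numprocs);
    int base = n / numprocs;
    int extra = n % numprocs;
    /* Every earlier rank holds base points, and the earlier ranks
     * below `extra` hold one more each. */
    return rank * base + (rank < extra ? rank : extra);
}
```

With n = 10 and numprocs = 4, the counts are 3, 3, 2, 2 and the starting indices are 0, 3, 6, 8, so no process holds more than one point beyond any other.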

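For Problem 4, the schedule of the binary-tree reduction can be worked out serially before adding the MPI calls. The sketch below simulates it on an array in which sums[i] stands in for the partial sum held by process i. The names tree_round and simulate_tree_reduce are hypothetical, and pairing each sender with the rank exactly half the group size below it is an assumption; pt2pt-bi-reduc.c may pair the processes differently.

```c
#include <assert.h>
#include <stddef.h>

/* What a rank does in one round of the halving reduction. */
enum tree_role { TREE_IDLE, TREE_SEND, TREE_RECV };

/* Ranks 0..active-1 still hold data in this round (active is a power
 * of two, at least 2). Ranks in the upper half send to rank - half;
 * ranks in the lower half receive from rank + half. */
enum tree_role tree_round(int rank, int active, int *partner)
{
    assert(active >= 2 && partner != NULL);
    int half = active / 2;
    if (rank >= active)
        return TREE_IDLE;      /* already handed off its sum */
    if (rank >= half) {
        *partner = rank - half;
        return TREE_SEND;
    }
    *partner = rank + half;
    return TREE_RECV;
}

/* Serial stand-in for the parallel code: each "send" adds the
 * sender's value into its partner's slot. Returns the total that
 * ends up on process 0. */
double simulate_tree_reduce(double *sums, int numprocs)
{
    assert(numprocs >= 1 && (numprocs & (numprocs - 1)) == 0);
    for (int active = numprocs; active > 1; active /= 2) {
        for (int rank = 0; rank < active; rank++) {
            int partner;
            if (tree_round(rank, active, &partner) == TREE_SEND)
                sums[partner] += sums[rank];
        }
    }
    return sums[0];
}
```

With 8 processes this takes three rounds: ranks 4 to 7 send to ranks 0 to 3, then ranks 2 and 3 send to ranks 0 and 1, and finally rank 1 sends to rank 0. In the MPI version, a TREE_SEND role would correspond to a send to partner and a TREE_RECV role to a receive from partner followed by an addition.
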
Solution

See the source files named in the Instructions.


Cleanup

When you are done running programs, delete your subfolder on the T: drives. End your batch job with ccrm or ccrelease. You may also wish to delete any files you copied from the VWlabs folder into your space on H:.