MPI

Q: Error message MPI/Pro: [0] : Communicator MPI_COMM_WORLD: 'Invalid communication medium (possible invalid value of MPI_COMM)'. (Abort)
A: Make sure to "set MPI_COMM=TCP".  

Q: The machine the user is working on does not have vipl.dll. How can the user get it?
A: Copy it from H:\users\sverdlik\vipl.dll.

Q: Job gets a gethostbyname error, even though it is a TCP job.
A: Set MPI_COMM=TCP explicitly. Do not depend on a default value.

Q: MPIPro.NET.MPIRunClient: Rank 5 terminated on server: CTCDEV01 MPI/Pro Error: Exit code from rank 5 is 3221225477. More information: Access Violation.
A: This is a user error. It typically occurs in an MPI program when an MPI call reads or writes beyond the bounds of an array.
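
The exit code in this message is easier to recognize in hexadecimal: 3221225477 is 0xC0000005, the Windows status code for an access violation. A minimal Python sketch of the conversion follows. The function name and the small lookup table are illustrative only and are not part of MPI/Pro.

```python
# Illustrative helper (not part of MPI/Pro): show a Windows exit code in hex
# and name a few well-known NTSTATUS values.
KNOWN_STATUS = {
    0xC0000005: "access violation",
    0xC0000094: "integer divide by zero",
    0xC00000FD: "stack overflow",
}


def describe_exit_code(code: int) -> str:
    """Return the code as 8-digit hex plus a description when one is known."""
    unsigned = code & 0xFFFFFFFF  # some tools report the code as a negative number
    name = KNOWN_STATUS.get(unsigned, "unknown status")
    return f"0x{unsigned:08X} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(3221225477))  # prints: 0xC0000005 (access violation)
```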

Q: MPI/Pro Error: Failed to logon the user on server: ctc106 System Error: Logon failure: unknown user name or bad password More information: System error code: 1326 MPI/Pro Error Code: MPIPRO-104
A: The user needs to run the mpipasswd command.

Q: Do we have vipl.h on any login node?
A: No. vipl.h is in the source code distribution for MPICH 1.2.5.

Q: Code built with MPICH fails to run on CTC machines.
A: The CTC Windows machines run MPI/Pro. You need to build and run with the same MPI implementation, so rebuild the code with MPI/Pro.

Q: Error keeps recurring after a certain number of iterations on a parallel job.  What to do?
A: Save a restart file after fewer iterations and use it for the next mpirun in the same batch job.
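
The idea is that each mpirun in the batch job starts from the most recent restart file rather than from iteration zero, so a failure costs at most one checkpoint interval. Below is a minimal single-process sketch in Python, assuming the state is small enough to store as JSON. The file layout, the interval, and the `step` function are hypothetical placeholders for the application's own logic.

```python
import json
import os


def load_restart(path):
    """Return (iteration, state) from the restart file, or a fresh start."""
    if os.path.exists(path):
        with open(path) as f:
            saved = json.load(f)
        return saved["iteration"], saved["state"]
    return 0, 0.0


def save_restart(path, iteration, state):
    """Write to a temporary file first so a crash cannot leave a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"iteration": iteration, "state": state}, f)
    os.replace(tmp, path)


def run(path, total_iterations, checkpoint_every, step):
    """Advance from the last checkpoint, saving every `checkpoint_every` steps."""
    iteration, state = load_restart(path)
    while iteration < total_iterations:
        state = step(state)  # hypothetical: one iteration of the real computation
        iteration += 1
        if iteration % checkpoint_every == 0:
            save_restart(path, iteration, state)
    return state
```

A smaller `checkpoint_every` means less work is repeated after a failure, at the price of more frequent writes.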

Q: MPI settings within Visual Studio (VS) disappeared.
A: If a new system image has been installed on a login node, the previous settings for VS are not retained.

Q: .mpipass not found.
A: Check the settings for HOME and HOMEPATH.  

Q: MPI/Pro Error: Process creation failed on server: VII0078 System Error: The system cannot find the path specified More information: Rank 4: Process creation failed for …
A: The user's setup.bat file lacks the statement del /Q T:\%USERNAME%. As a consequence, if a file named T:\%USERNAME% exists, the user cannot create a directory named T:\%USERNAME%.
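
The failure comes from a name collision: a plain file called T:\%USERNAME% blocks creation of a directory with the same name. The check can be sketched in Python as follows. The function name is hypothetical, and the path is passed in rather than hard-coded so that the sketch can be tried anywhere. Unlike del /Q, it leaves the contents of an existing directory alone.

```python
import os


def prepare_scratch(path):
    """Make sure `path` is a directory, removing a plain file of that name first."""
    if os.path.isfile(path):
        # A file with this name would make the directory creation fail.
        os.remove(path)
    os.makedirs(path, exist_ok=True)
    return path
```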

Q: Parallel job stalled near end.
A: Try inserting a call to MPI_Barrier, which makes every rank wait until all ranks have reached it.

Q: Job ran on dev nodes but failed on cbsu with an invalid value for MPI_COMM.
A: Set MPI_COMM to TCP rather than VIA.

Q: Is there a way to debug skeleton which is invoked by mpirun?
A: You can obtain more diagnostics by adding the following flags just after the mpirun command and capturing stdout and stderr to files: -verbose -mpi_debug -mpi_verbose. A consultant can then look at the output.
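
As a rough illustration, the sketch below inserts the three flags directly after the mpirun command and sends stdout and stderr to separate files. It is written in Python for convenience. The helper names are hypothetical, and the rest of the command line is whatever you already use.

```python
import subprocess

# The three diagnostic flags named in the answer above.
DEBUG_FLAGS = ["-verbose", "-mpi_debug", "-mpi_verbose"]


def with_debug_flags(command):
    """Return a copy of `command` with the flags placed right after element 0."""
    return [command[0]] + DEBUG_FLAGS + list(command[1:])


def run_with_logs(command, stdout_path, stderr_path):
    """Run `command`, writing stdout and stderr to the two given files."""
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        return subprocess.run(command, stdout=out, stderr=err).returncode
```

Call with_debug_flags on your existing command list, whose first element is mpirun, and pass the result to run_with_logs along with two file names of your choice.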

Q: MPI/Pro Error: Process creation failed on server: CMI034 System Error: The system cannot find the path specified More information: Rank 0: Process creation failed for …
A: When this message appears for all processors, the cause may be a typo in the directory name.

Q: MPI/Pro Error: Failed to open the machines file System Error: Could not find file  C:\progra~1\mpipro\bin\machines. MPI/Pro Error Code: MPIPRO-34
A: Add del /Q T:\%USERNAME% to setup.bat. This removes any file named T:\%USERNAME% so that you can create a directory with that name.

Q: A bat file invoked by mpirun contains an xcopy command. The xcopy succeeds only for the first machine in the machines file. Files are not copied to other machines.
A: The only workaround is to copy without using xcopy.  

Q: Illegal value of MPI_COMM, trying to set TCP
A: There was a trailing space after TCP in the line "set MPI_COMM=TCP ". Remove the space.
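
Trailing whitespace like this is invisible in most editors, so it can help to check the value mechanically. Below is a small Python sketch, assuming that only the two values mentioned in this FAQ (TCP and VIA) are expected. The function name is hypothetical.

```python
import os

EXPECTED_VALUES = {"TCP", "VIA"}  # assumption: the only values mentioned in this FAQ


def check_mpi_comm(value):
    """Return a list of problems found in an MPI_COMM setting (empty if none)."""
    if value is None:
        return ["MPI_COMM is not set; do not depend on a default value"]
    problems = []
    if value != value.strip():
        problems.append(f"leading or trailing whitespace in {value!r}")
    # Assumption: the comparison is case-sensitive.
    if value.strip() not in EXPECTED_VALUES:
        problems.append(f"unexpected value {value.strip()!r}")
    return problems


if __name__ == "__main__":
    for problem in check_mpi_comm(os.environ.get("MPI_COMM")):
        print(problem)
```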

 

Q: Each process needs its own set of files. How can this be done? There is a problem with a system call in a C++ program.
A: As part of the setup file, use the commands
cd /D T:\
del /Q T:\%USERNAME%
mkdir T:\%USERNAME%\%MSTI_RANK%
copy files.* T:\%USERNAME%\%MSTI_RANK%

 

Q: mpirun command not found on login node.
A: This is expected. Do not run jobs on a login node.

 

Q: Errors on include "mpif.h" or include "mpif.f90" when compiling Fortran 90 with MPI on a Windows machine.
A: Replace the include statement with "use mpi". See "Compiling Parallel Applications".


MPI error messages
 

Quite a few MPI error messages are not caused by anything in a user's code. In such cases, the only action to take is to report the error to the CTC. The errors of this kind that we know about are listed below.

Q: The following is the message at the end of the standard output file. 18: MPI/Pro: [18] : TCP connections have all shut down. 18: MPI/Pro: [18] : TcpSendShort: WSASend failed, 10038
A: System problem. Send email to consult@tc.cornell.edu.

Q: MPI job gets message "fatal execution engine error"
A: System problem. Send email to consult@tc.cornell.edu.  
 

Q: MPI/Pro Error: Failed to logon the user on server: mscorlib System Error: Server encountered an internal error. For more information, turn on customErrors in the server's .config file. MPI/Pro Error Code: MPIPRO-104
A: System problem. Send email to consult@tc.cornell.edu.  

Q: MPI/Pro Error: Failed to contact server: cbsu148.tc.cornell.edu System Error: Server encountered an internal error. For more information, turn on customErrors in the server's .config file. More information: MPI/Pro Startup Service may not be running MPI/Pro Error Code: MPIPRO-101
A: System problem. Send email to consult@tc.cornell.edu.  

Q: MPI/Pro Error: Exception occurred at cleanup with server: vii0002.tc.cornell.edu System Error: Server encountered an internal error. For more information, turn on customErrors in the server's .config file. MPI/Pro Error Code: MPIPRO-1
A: System problem. Send email to consult@tc.cornell.edu.  

Q: MPI/Pro Error: Process creation failed on server: CMI023 System Error: The system cannot find the path specified More information: Rank 13: Process creation failed for
A: System problem. Send email to consult@tc.cornell.edu.  

Q: MPI/Pro Error: SocketException System Error: No such host is known More information: Ensure that the remote MPI Startup Server is installed and started. MPI/Pro Error Code: MPIPRO-101
A: System problem. Send email to consult@tc.cornell.edu.  

Q: MPI/Pro Error: SocketException System Error: No connection could be made because the target machine actively refused it More information: Ensure that the remote MPI Startup Server is installed and started. MPI/Pro Error Code: MPIPRO-101
A: System problem. Send email to consult@tc.cornell.edu.

Q: MPI/Pro Error: Failed to logon the user on server: mscorlib System Error: Server encountered an internal error. For more information, turn on customErrors in the server's .config file.
A: System problem. Send email to consult@tc.cornell.edu.   

Q: MPI/Pro Error: Failed to logon the user on server: vii0040.tc.cornell.edu System Error: Logon failure: the user has not been granted the requested logon type at this computer More information: System error code: 1385 
A: System problem. Send email to consult@tc.cornell.edu.

Q: Job hung. One of the nodes had a winsocket error.
A: System problem. Send email to consult@tc.cornell.edu.

Q: Failed to create COM object on host name "CBSU027.TC.CORNELL.EDU" Library not registered.
A: System problem. Send email to consult@tc.cornell.edu.

Q: Copy of executable and input files failed on vi0004.
A: System problem. Send email to consult@tc.cornell.edu.  

Q: MPI job failed with error TcpPostRecv: WSARecv failed
A: System problem. Send email to consult@tc.cornell.edu.

Q: MPI/Pro Error: SocketException System Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond More information: Ensure that the remote MPI Startup Server is installed and started. MPI/Pro Error Code: MPIPRO-101
A: System problem. Send email to consult@tc.cornell.edu.  

Q: MPI/Pro Error: Failed to contact server: vii0133.tc.cornell.edu System Error: Exception has been thrown by the target of an invocation. More information: Server encountered an internal error. For more information, turn on customErrors in the server's .config file.. Ensure that the remote MPI Startup Server is installed and started. Specific Error Code: -101
A: System problem. Send email to consult@tc.cornell.edu.  

Q: User logon failed on host name "CBSU007.TC.CORNELL.EDU"
A: System problem. Send email to consult@tc.cornell.edu.

Q: MPI/Pro Error: Failed to contact server: cmi032.tc.cornell.edu System Error: No receiver registered More information: MPI/Pro Startup Service may not be running MPI/Pro Error Code: MPIPRO-101
A: System problem. Send email to consult@tc.cornell.edu.  

Q: User has two nodes assigned to the job, cbsu048 and cbsu049. User logs on to cbsu048, has a machines file, and gets the following error message: User logon failed on host name "CBSU049.TC.CORNELL.EDU"
A: System problem. Send email to consult@tc.cornell.edu.  

Q: I got the following error on CTC071 and CTC072: MPI/Pro Error: Process creation failed on server: CTC071 System Error: The system cannot find the path specified More information: Rank 6: Process creation failed for …   Running H:\CTC Tools\login.bat Command window on other machines does not show the first two lines. It seems drive mapping has some problems.
A: System problem. Send email to consult@tc.cornell.edu.

Q: MPI/Pro Error: SocketException System Error: No connection could be made because the target machine actively refused it.
A: System problem. Send email to consult@tc.cornell.edu.