This document provides step-by-step instructions to run a simple serial batch
job on a CTC compute node. The instructions are followed by a sample
session demonstrating the same steps.
Instructions
- The scheduler is installed on all CTC login nodes (e.g.
  ctcloginb.tc.cornell.edu). Connect to one of the nodes using Remote Desktop
  Connection as documented in Accessing CTC Machines.
- After logging in, open a command prompt window:
  Start | All Programs | Accessories | Command Prompt
  - or -
  Start | Run | cmd
  All subsequent instructions in this document should be issued from this window
  unless otherwise specified.
- Before you can submit any batch jobs, you must register your password with the
  scheduler. Do this before you use the scheduler for the first time, and again
  each time you change your password.
  H:\Users\yourID> vsched -passwd
  You will be prompted to enter your password twice. Your credentials are passed
  securely using Web Services Enhancements encryption.
- Prepare a file named your_job.xml in the format shown here. All of the
  XML tags shown in the example are required. This file specifies the number of
  minutes, the number of nodes, and the type of job (batch or interactive), as well as:
  - what you want to run (e.g. your.bat script)
  - where you want to run it (e.g. the vplustest pool of nodes)
  The <run>...</run> tags specify what you want to run on the nodes. It
  can be any script or executable.
  The <type>...</type> tags must contain either batch or interactive.
  If the job type is interactive, the run tags may be left blank; anything
  specified in the run tags will be ignored.
<?xml version="1.0" ?>
<!-- Sample XML Job File -->
<job>
<nodes>1</nodes>
<minutes>5</minutes>
<type>batch</type>
<affiliation>development</affiliation>
<run>\\tc.cornell.edu\tc\users\your_userid\your.bat</run>
</job>
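If you generate job files from a script, the rules above can be checked before submission. The following Python sketch is only an illustration, not part of the scheduler: the function names build_job_xml and missing_tags are hypothetical, and the list of required tags is simply taken from the sample file above.

```python
import xml.etree.ElementTree as ET

# Tags that appear in the sample job file; the text says all are required.
REQUIRED_TAGS = ("nodes", "minutes", "type", "affiliation", "run")


def build_job_xml(nodes, minutes, job_type, affiliation, run):
    """Return the text of a job file in the format of the sample above."""
    if job_type not in ("batch", "interactive"):
        raise ValueError("type must be 'batch' or 'interactive'")
    job = ET.Element("job")
    values = (nodes, minutes, job_type, affiliation, run)
    for tag, value in zip(REQUIRED_TAGS, values):
        ET.SubElement(job, tag).text = str(value)
    return '<?xml version="1.0" ?>\n' + ET.tostring(job, encoding="unicode")


def missing_tags(xml_text):
    """Return the required tags that are absent from a job file."""
    root = ET.fromstring(xml_text)
    # Compare against None: an element with no children is falsy in ElementTree.
    return [tag for tag in REQUIRED_TAGS if root.find(tag) is None]
```

For example, missing_tags applied to a file that contains only a nodes tag reports the other four tags as missing.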
- The previous step stated that within the <run>...</run> tags you
  can specify any script or executable. We have used a .bat script for this
  example. In general, a batch script is written to:
  - set up a temporary directory on the compute node for your files
  - copy the files your job needs to the temporary directory
  - run your program(s)
  - copy the output files back to the CTC fileservers
  - delete your files in the temporary directory on the compute node
  - end the batch session
REM Create a clean local temp folder, T:\myuserid
call TDirCreate.bat
REM Change the current working directory
cd /d T:\%USERNAME%
REM Copy executable and data files to the current working directory
copy h:\users\%USERNAME%\quick.exe
REM Run the program
quick.exe 1>quick.out 2>quick.err
REM Copy results back to fileserver
copy /Y quick.* h:\users\%USERNAME%
REM Delete the local temp folder and everything in it
call TDirDelete.bat
REM Release the nodes and end the job
vsched -cancel
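The same stage-in, run, stage-out pattern can be sketched in Python for readers who prefer a scripting language over a .bat file. This is a rough illustration under assumptions: run_staged, job.out and job.err are hypothetical names, a generic temporary directory stands in for the T: drive, and the final step of releasing the nodes (vsched -cancel) is not included.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_staged(command, inputs, results_dir, pattern="*"):
    """Run `command` in a scratch directory and copy matching files back."""
    results_dir = Path(results_dir)
    scratch = Path(tempfile.mkdtemp())      # create a clean local temp folder
    try:
        for item in inputs:                 # copy input files to the temp folder
            shutil.copy(item, scratch)
        # Run the program with stdout and stderr redirected to files.
        with open(scratch / "job.out", "w") as out, \
                open(scratch / "job.err", "w") as err:
            code = subprocess.run(command, cwd=scratch,
                                  stdout=out, stderr=err).returncode
        for produced in scratch.glob(pattern):   # copy results back
            if produced.is_file():
                shutil.copy(produced, results_dir)
    finally:
        # Delete the temp folder and everything in it, even after an error.
        shutil.rmtree(scratch, ignore_errors=True)
    return code
```

The try/finally block plays the role of the TDirDelete.bat call: the scratch folder is removed whether or not the program succeeds.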
- Submit the XML file from the command prompt. The JobID will be returned. No
  files will be read or copied until your job actually starts.
  H:\Users\yourID> vsched -submit your_job.xml
  2043
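If you submit jobs from a script, the returned JobID can be captured for later use. The sketch below assumes that vsched is on the PATH and that the JobID is the only text printed on standard output, as in the example above; submit_job and its run parameter are hypothetical names, the latter added only so the function can be exercised without a scheduler.

```python
import subprocess


def submit_job(xml_path, run=subprocess.run):
    """Submit a job file with `vsched -submit` and return the JobID as an int."""
    result = run(["vsched", "-submit", xml_path],
                 capture_output=True, text=True, check=True)
    # Assumption: the scheduler prints just the JobID, e.g. "2043".
    return int(result.stdout.strip())
```

With check=True, a non-zero exit status from the command raises an exception instead of returning a misleading JobID.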
- Your job should now either be running or be in the queue waiting to start. At
  this point you can simply wait for it to finish, or you can view the queue:
  H:\Users\yourID> vsched -q
  or cancel your job:
  H:\Users\yourID> vsched -c <JobID>
  or use Remote Desktop Connection to log into the node where your job is running,
  either to check that the job is running properly or to issue commands.
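To check on a job from a script rather than by eye, the text printed by vsched -q can be parsed. The following sketch assumes the column layout shown in the sample session in this document (JobId first, Stat sixth); job_status is a hypothetical helper. This document does not define the status codes, so the function returns the code unchanged.

```python
def job_status(queue_text, job_id):
    """Return the Stat column for job_id, or None if the job is not listed."""
    for line in queue_text.splitlines():
        fields = line.split()
        # Data rows start with the JobId; the header and rule lines never match.
        if len(fields) >= 6 and fields[0] == str(job_id):
            return fields[5]   # columns: JobId User Nodes Time Type Stat ...
    return None
```

A return value of None means the job no longer appears in the queue, which matches the last vsched -q call in the sample session.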
Example
This sample session begins after you have logged into a CTC login node and have
opened a command prompt window.
H:\Users\yourID>vsched -passwd
Please enter your password : ***********
Please confirm your password : ***********
Your password for Velocity Scheduler has been set
H:\Users\yourID>cd vsched
H:\Users\yourID\vsched>dir
Volume in drive H has no label.
Volume Serial Number is A417-76E2
Directory of H:\Users\yourID\vsched
04/12/2006 01:40 PM <DIR> .
04/12/2006 01:40 PM <DIR> ..
04/12/2006 01:45 PM 466 MyJob.bat
04/12/2006 09:51 AM 247 MyJob.xml
08/23/2000 12:24 PM 36,864 quick.exe
3 File(s) 37,577 bytes
2 Dir(s) 278,468,329,472 bytes free
H:\Users\yourID\vsched>type MyJob.xml
<?xml version="1.0" ?>
<!-- Sample XML Job File -->
<job>
<nodes>1</nodes>
<minutes>5</minutes>
<type>batch</type>
<affiliation>development</affiliation>
<run>\\tc.cornell.edu\tc\users\yourID\vsched\MyJob.bat</run>
</job>
H:\Users\yourID\vsched>type MyJob.bat
REM Create a clean local temp folder, T:\myuserid
call TDirCreate.bat
REM Change the current working directory
cd /d T:\%USERNAME%
REM Copy executable and data files to the current working directory
copy h:\users\%USERNAME%\vsched\quick.exe
REM Run the program
quick.exe 1>quick.out 2>quick.err
REM Copy results back to fileserver
copy /Y quick.* h:\users\%USERNAME%\vsched
REM Delete the local temp folder and everything in it
call TDirDelete.bat
REM Release the nodes and end the job
vsched -cancel
H:\Users\yourID\vsched>vsched -s MyJob.xml
1696
H:\Users\yourID\vsched>vsched -q
JobId User Nodes Time Type Stat End Time Master Affiliation
-----------------------------------------------------------------------------
1696 yourID 2 0:05 B R 13:52 04/12 ctc065 vplustest
H:\Users\yourID\vsched>vsched -q
JobId User Nodes Time Type Stat End Time Master Affiliation
-----------------------------------------------------------------------------
1696 yourID 2 0:05 B C 13:52 04/12 ctc065 vplustest
H:\Users\yourID\vsched>vsched -q
JobId User Nodes Time Type Stat End Time Master Affiliation
-----------------------------------------------------------------------------
H:\Users\yourID\vsched>dir
Volume in drive H has no label.
Volume Serial Number is A417-76E2
Directory of H:\Users\yourID\vsched
04/12/2006 01:47 PM <DIR> .
04/12/2006 01:47 PM <DIR> ..
04/12/2006 01:45 PM 466 MyJob.bat
04/12/2006 09:51 AM 247 MyJob.xml
04/12/2006 01:47 PM 0 quick.err
08/23/2000 12:24 PM 36,864 quick.exe
04/12/2006 01:47 PM 51 quick.out
5 File(s) 37,628 bytes
2 Dir(s) 278,468,325,376 bytes free
H:\Users\yourID\vsched>type quick.out
342.320000 + 98.200000 = 440.520000
3 - 8 = -5
H:\Users\yourID\vsched>