Migrating from the Current Batch Scheduler (CCS) to the New Batch Scheduler (vsched)

The CTC is introducing a new batch queueing system, the Velocity Scheduler (vsched), that will replace the Cluster CoNTroller System (CCS). 

Advantages

  • Improved Scheduling: In deciding which job to start next, the Velocity Scheduler groups jobs by the resource they have requested (v2, v3, development, serial, etc.). It looks at the first job asking for a given resource and will not allow any other job asking for that resource to delay this first job.
  • Faster Cleanup: Jobs will clear much faster from the queue, regardless of whether the job ends normally or is cancelled by the system.
  • Command-line arguments: You can now pass command-line arguments to scripts, for example:
    \\tc.cornell.edu\tc\users\sverdlik\cmdline_args.bat arg1 arg2

Migration

The migration should be relatively straightforward.  There are 3 main differences you will encounter in terms of commands and files:

  1. Commands that used to be of the form ccxx (ccsubmit, ccq, etc.) have been replaced by vsched -<option>.
  2. REM CCS keywords in a .bat file are replaced by xml tags in a .xml file.
  3. Batch jobs now require 2 files. Actions to be performed during the batch job are now contained in a separate script. The script can be written in any scripting language (cmd, Perl, Python, etc.). The name of this script is identified by the <run> tag in the .xml file.

Porting

  1. Password
    As with the Cluster CoNTroller System, you need to register your password on a CTC login machine before using the Velocity Scheduler.
    Cluster CoNTroller System    Velocity Scheduler
    ccpasswd -r                  vsched -passwd
  2. Two files
    The first step is to break your original .bat file into 2 files: vtest.xml and v2test.bat in this example. The Velocity Scheduler equivalents for the REM CCS statements will be in vtest.xml. The executable statements will be in v2test.bat. Only 2 executable commands need to change: "vsched -machines", which replaces "call machinemaker", and "vsched -cancel", which replaces "ccrelease".

    Cluster CoNTroller System

    REM CCS statements------------|
    REM CCS account = sverdlik    |
    REM CCS nodes = 32            |
    REM CCS minutes = 60          |------------>
    REM CCS type = batch          |
    REM CCS requirements = v2     |
    REM --------------------------|
    REM
    REM
    REM executable commands-------|
    Set variables and cd T:       |
    call machinemaker             |
    Run setup script              |----------->
    Execute job                   |
    Run cleanup script            |
    ccrelease---------------------|

    Velocity Scheduler

    file vtest.xml
    <?xml version="1.0" ?>
    <!-- Sample XML Job File -->
    <job>
    <nodes>32</nodes>
    <minutes>60</minutes>
    <type>batch</type>
    <affiliation>v2</affiliation>
    <run>\\tc.cornell.edu\tc\users\sverdlik\v2test.bat</run>
    </job>

    file v2test.bat
    Set variables and cd T:
    vsched -machines
    Run setup script
    Execute job
    Run cleanup script
    vsched -cancel
  3. Submit the job
    vsched -s vtest.xml
  4. Examine the queue
    vsched -q
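
The split described in step 2 can be sketched in Python, one of the scripting languages mentioned above. This is only an illustration under stated assumptions: the function name ccs_bat_to_job_xml and the keyword table are hypothetical, the keyword-to-tag mapping is taken from the sample above, and the account keyword is skipped because the sample vtest.xml shows no counterpart for it.

```python
import re
from xml.sax.saxutils import escape

# REM CCS keyword -> vsched XML tag, as seen in the sample files above.
# "account" is deliberately absent: the sample vtest.xml has no tag for it.
KEYWORD_TO_TAG = {
    "nodes": "nodes",
    "minutes": "minutes",
    "type": "type",
    "requirements": "affiliation",
}

# Matches lines such as "REM CCS nodes = 32".
REM_CCS = re.compile(r"^\s*REM\s+CCS\s+(\w+)\s*=\s*(.*?)\s*$", re.IGNORECASE)


def ccs_bat_to_job_xml(bat_text, run_command):
    """Build the text of a job .xml file from the REM CCS lines of an old .bat file."""
    lines = ['<?xml version="1.0" ?>', "<job>"]
    for line in bat_text.splitlines():
        match = REM_CCS.match(line)
        if not match:
            continue  # executable commands belong in the script file, not the .xml
        keyword, value = match.group(1).lower(), match.group(2)
        tag = KEYWORD_TO_TAG.get(keyword)
        if tag is not None:
            lines.append(f"<{tag}>{escape(value)}</{tag}>")
    lines.append(f"<run>{escape(run_command)}</run>")
    lines.append("</job>")
    return "\n".join(lines) + "\n"
```

The returned text would be saved as vtest.xml and submitted with vsched -s vtest.xml. The executable lines of the old file still have to be moved to the script by hand.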

Comparison of Commands

The next table lists command equivalents for the Cluster CoNTroller System and the Velocity Scheduler.

Cluster CoNTroller System    Velocity Scheduler
ccsubmit                     vsched -submit or vsched -s
cctypes                      vsched -affiliations or vsched -a
ccq                          vsched -queue or vsched -q
ccusage                      vsched -usage or vsched -u
ccrm                         vsched -cancel or vsched -c
ccpasswd                     vsched -passwd or vsched -pa
ccrelease                    vsched -cancel or vsched -c
call machinemaker            vsched -machines or vsched -m
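
For anyone porting many scripts, the equivalents above can be kept in a small lookup table. The Python sketch below is an illustration only: the names CCS_TO_VSCHED and port_script_line are hypothetical, and the helper rewrites only the two commands that appear inside job scripts, and only when the line contains nothing else. Lines with extra arguments are left alone, since the table does not say how arguments carry over.

```python
# Old CCS command -> long-form vsched replacement, copied from the table above.
CCS_TO_VSCHED = {
    "ccsubmit": "vsched -submit",
    "cctypes": "vsched -affiliations",
    "ccq": "vsched -queue",
    "ccusage": "vsched -usage",
    "ccrm": "vsched -cancel",
    "ccpasswd": "vsched -passwd",
    "ccrelease": "vsched -cancel",
    "call machinemaker": "vsched -machines",
}

# The only two commands that need to change inside a job script.
SCRIPT_COMMANDS = ("call machinemaker", "ccrelease")


def port_script_line(line):
    """Return the vsched form of a script line, or the line unchanged."""
    # Normalize whitespace and case so "  CALL  machinemaker" still matches.
    key = " ".join(line.split()).lower()
    if key in SCRIPT_COMMANDS:
        return CCS_TO_VSCHED[key]
    return line
```

Applying port_script_line to each line of an old .bat file (without its trailing newline) leaves every other command untouched.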

There are some additional features for the Velocity Scheduler that were not available for the Cluster CoNTroller System.

Additional options for vsched:

-i or -info <hostname>          lists system information for a given compute node
-po or -policy <affiliation>    lists the policy for all affiliations, or for the given affiliation
-r or -restart <JobID>          restarts a job by JobID; you must be the job owner

.XML files

This section contains a discussion of issues related to xml that are relevant to using the Velocity Scheduler. The Cluster CoNTroller System uses one .bat file, with the required REM CCS keywords first and then the commands. With the Velocity Scheduler the xml tags are in a file of type .xml and the commands are in a second file, usually a script.

  • Sample .xml file, vtest.xml

  • You would submit it to the batch system with the command vsched -submit vtest.xml.

    <?xml version="1.0" ?>

    <!-- Sample XML Job File -->
    <job>
    <nodes>32</nodes>
    <minutes>60</minutes>
    <type>batch</type>
    <affiliation>v2</affiliation>
    <run>\\tc.cornell.edu\tc\users\sverdlik\v2test.bat</run>
    </job>

  • Velocity Scheduler .XML files
    • They should have the file extension .xml.  You will then issue the command
      "vsched -submit ...xml".
    • The items in <...> are referred to as xml tags.  They are case sensitive. You should not change them.
    • "<affiliation>" replaces "REM CCS requirements".
    • The <run> tag contains the command that you want to execute in the batch system.  Apart from the 2 required changes, it will typically be the file that you previously submitted to the Cluster CoNTroller System. If you use the old .bat file, the REM CCS lines will now be treated as comments or remarks.
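
Because the tags are case sensitive, a mistyped tag such as <Nodes> is easy to miss. The Python sketch below is a hypothetical pre-submission check, not part of vsched: it assumes the tag names in the sample vtest.xml are the expected spellings and reports any tag that matches one of them only when case is ignored.

```python
import xml.etree.ElementTree as ET

# Tag names exactly as they appear in the sample vtest.xml.
SAMPLE_TAGS = ("job", "nodes", "minutes", "type", "affiliation", "run")


def find_miscased_tags(job_xml_text):
    """Return (found, expected) pairs for tags that differ from the sample only in case."""
    by_lower = {tag.lower(): tag for tag in SAMPLE_TAGS}
    root = ET.fromstring(job_xml_text)
    problems = []
    for element in root.iter():  # document order, root element included
        expected = by_lower.get(element.tag.lower())
        if expected is not None and element.tag != expected:
            problems.append((element.tag, expected))
    return problems
```

An empty list means every recognized tag is spelled as in the sample. Tags that are not in the sample are ignored rather than rejected, since this document does not list the full set of tags the scheduler accepts.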

Changes in behavior

  • "vsched -machines" replaces "call machinemaker".
    "vsched -cancel" replaces "ccrelease".
  • The file in the <run> tag is not copied by the Velocity Scheduler.  It is executed one line at a time.  If you submit a job and then change the contents of the file, the changed commands will be the ones that are executed when your batch job runs.

New Features

  • You can use command-line arguments to the executable that is named in the <run> tag.  Previously you would ccsubmit a .bat file, but could not use arguments to the .bat file.
  • The <run> tag can contain any script or executable.  It need not be a .bat file.
  • The restart option will cause the command in the <run> tag to start again.
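
As a small illustration of the first feature, the Python sketch below builds the text of a <run> tag from a script path and its arguments. It assumes that the arguments follow the script path inside the tag, separated by spaces, as in the cmdline_args.bat example under Advantages. The helper name run_tag is hypothetical.

```python
from xml.sax.saxutils import escape


def run_tag(script_path, *args):
    """Return a <run> element whose text is the script path followed by its arguments."""
    # Assumption: arguments are appended after the path, separated by single spaces.
    command = " ".join((script_path,) + args)
    # escape() protects characters such as & and < that are special in XML.
    return f"<run>{escape(command)}</run>"
```

The result can replace the <run> line of a job file such as vtest.xml before it is submitted.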