To Get Periodic Output from your Batch Job, Copy Files from T: to H:
CTC Monthly Tips
March 2004
Revised for vsched June 2006
Audience: Anyone running batch jobs on Velocity.
Issue: During the course of a batch job, output should be written to the local T: drive and copied to H: at the end of the job. However, there are valid reasons to periodically copy job output to H: during the course of the batch job.
Solution: By running a .bat file in the background, you can periodically copy files from T: to H:. Periodically copying files does not have the same detrimental effects as writing directly to H:.
Discussion: During the course of a batch job, output should be written to the local T: drive and copied back to H: at the end of the job. If your program writes output directly to H:, your job will take longer to run, increasing the potential for network interruptions that can cause data loss or job failure. See "Promote Speed and Job Success: Write to Local Disk" for a detailed discussion and proper file handling examples.
There are many reasons to check or collect output periodically while a program runs. For example, you may simply want to confirm that your job is running properly. Another reason is protection against a job that times out and is killed by the system. If your files have not already been copied from T: to H:, you have no way of recovering them, because you no longer have access to the nodes on which the job ran. By copying files from T: to H: periodically, you lose at most the output written since the last copy.
Anyone can check their own job output during the course of the run by using telnet to access the compute node and view the files. However, this isn't very convenient or reliable. A better method is to periodically issue a copy command from your batch scripts.
Step-by-step Directions:
- Add the command below to your main batch file, before the line that runs your executable. Do not omit the "/b" option, which starts copyback.bat in the background without opening a new window, so your main batch script continues immediately.
start /b copyback.bat
- Copy copyback.bat to your batch job folder. Be sure to put the file where your batch script will find it, and check that the extension is .bat. The file contains just a few commands:
:sleeploop
REM Pause execution for the specified number of seconds
call sleep.bat 900
REM Copy a file from T: to H:
copy /Y T:\%USERNAME%\my_file.out H:\users\%USERNAME%\your_output_dir\
REM Copy a file from T: to H:
copy /Y T:\%USERNAME%\my_file.err H:\users\%USERNAME%\your_output_dir\
REM Loop back and wait again
goto sleeploop
Note: %USERNAME% is an environment variable that resolves to your username.
- Edit copyback.bat
Change the sleep interval (900 seconds, or 15 minutes, in the sample) to suit your job.
Change the copy commands to copy the files you are interested in. Be sure the paths are correct.
- Test
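Putting the steps together, a main batch file might look roughly like the sketch below. It is a hedged illustration, not one of the official sample files: my_program.exe is a hypothetical program that writes to standard output and standard error, and my_file.out, my_file.err and your_output_dir are the same placeholders used in copyback.bat. The sketch assumes the script, copyback.bat and my_program.exe all sit in the folder the job starts in.

```bat
@echo off
REM Hypothetical main batch file; my_program.exe and your_output_dir are
REM placeholders, so adjust them for your own job.

REM Create a private working folder on the local T: drive if needed.
if not exist T:\%USERNAME% mkdir T:\%USERNAME%

REM Start the periodic copy loop in the background, before the executable runs.
start /b copyback.bat

REM Run the program, writing its output to local T: rather than to H:.
my_program.exe > T:\%USERNAME%\my_file.out 2> T:\%USERNAME%\my_file.err

REM Final copy to H: at the end of the job.
copy /Y T:\%USERNAME%\my_file.out H:\users\%USERNAME%\your_output_dir\
copy /Y T:\%USERNAME%\my_file.err H:\users\%USERNAME%\your_output_dir\
```

The final copy commands are kept because copyback.bat only copies once per sleep interval; without them, any output written after the last periodic copy could be lost when the job ends.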
Note for Parallel Jobs: The same principles and basic steps apply to both serial and parallel jobs. When using this method for a parallel job, the periodic copying is done only from the master node. Since it is likely that standard output and standard error are collected from all nodes to the master node, you will be able to see all of this data. However, data files written to T: on worker nodes will not be copied with the procedures described here.
Sample Files:
Here are sample batch files that incorporate copyback. Bold italic indicates the files that are new or modified for this function. Inside each of those files, the commands associated with copyback are in bold.