Job monitoring/management commands

Several commands are available for monitoring and managing jobs on Artemis. The most useful ones are listed below. For the full set, see the PBS Professional user manual.

Command              Description
qstat -u abcd1234    show status of abcd1234’s jobs
qdel 1234567         delete job 1234567 from queue
qstat                show status of all jobs
qstat -f 1234567     show detailed stats for job 1234567
qstat -xf 1234567    show detailed stats for job 1234567, even after it has finished

When a job finishes, it produces three output files: one for standard output, one for standard error, and one recording resource usage. The files are named as follows:

<JobName>.o<JobID>       – Standard output file
<JobName>.e<JobID>       – Standard error file
<JobName>.o<JobID>_usage – Resource usage file

If you don’t redirect standard output or standard error to a file, they are written to the .o and .e files respectively, and these files only appear after your job finishes. They may contain useful information about why your job terminated early.

The resource usage file records how long your job ran and how much memory it used. You can use this information to optimise the walltime and memory requests of future jobs. An example resource usage file is shown below:

Job Id: 1050977.pbsserver for user abcd1234 in queue small
Job Name: TestJob
Project: RDS-ICT-PANDORA-RW
Exit Status: 0
Walltime requested:   00:03:00 :      Walltime used:   00:01:36
    Cpus requested:         48 :
          Cpu Time:   00:36:38 :        Cpu percent:       3102
     Mem requested:        8gb :           Mem used:  2342348kb
    VMem requested:       None :          VMem used:  2342348kb
    PMem requested:       None :          PMem used:       None
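
To compare what a job requested with what it actually used, the fields in a resource usage file can be read with a short script. The following is a minimal sketch in Python, assuming the file layout matches the example above. The function names (field, hms_to_seconds, size_to_kb, summarise) are hypothetical helpers written for this illustration and are not part of PBS Professional or Artemis.

```python
import re

# Sample text copied from the example resource usage file above.
SAMPLE = """\
Job Id: 1050977.pbsserver for user abcd1234 in queue small
Job Name: TestJob
Project: RDS-ICT-PANDORA-RW
Exit Status: 0
Walltime requested:   00:03:00 :      Walltime used:   00:01:36
    Cpus requested:         48 :
          Cpu Time:   00:36:38 :        Cpu percent:       3102
     Mem requested:        8gb :           Mem used:  2342348kb
    VMem requested:       None :          VMem used:  2342348kb
    PMem requested:       None :          PMem used:       None
"""

# Multipliers for converting sizes such as "8gb" to kilobytes.
_KB_PER_UNIT = {"kb": 1, "mb": 1024, "gb": 1024**2, "tb": 1024**3}


def field(text, label):
    """Return the first whitespace-delimited value that follows 'label:'."""
    # The lookbehind stops "Mem used" from matching inside "VMem used".
    pattern = r"(?<![A-Za-z])" + re.escape(label) + r":\s*(\S+)"
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"field not found: {label}")
    return match.group(1)


def hms_to_seconds(value):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def size_to_kb(value):
    """Convert a size such as '8gb' or '2342348kb' to kilobytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb|tb)", value.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value}")
    return int(match.group(1)) * _KB_PER_UNIT[match.group(2)]


def summarise(text):
    """Compare requested and used walltime and memory."""
    wall_req = hms_to_seconds(field(text, "Walltime requested"))
    wall_used = hms_to_seconds(field(text, "Walltime used"))
    mem_req = size_to_kb(field(text, "Mem requested"))
    mem_used = size_to_kb(field(text, "Mem used"))
    return {
        "walltime_requested_s": wall_req,
        "walltime_used_s": wall_used,
        "walltime_fraction": wall_used / wall_req,
        "mem_requested_kb": mem_req,
        "mem_used_kb": mem_used,
        "mem_fraction": mem_used / mem_req,
    }


if __name__ == "__main__":
    summary = summarise(SAMPLE)
    print(f"Walltime used: {summary['walltime_fraction']:.0%} of request")
    print(f"Memory used:   {summary['mem_fraction']:.0%} of request")
```

For the example file, the job used 96 of the 180 seconds it requested and well under a third of its 8gb memory request, which suggests both requests could be reduced for similar jobs while still leaving a safety margin. To use the sketch on a real job, read the contents of the <JobName>.o<JobID>_usage file into a string and pass it to summarise.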