Data Transfer Queue (dtq)

Artemis has a queue, called dtq, whose nodes have access to /rds (the RCOS version of the Research Data Store) and to the internet. This queue is dedicated to data-moving jobs and other input/output (I/O) intensive work, such as compressing and archiving files, copying data from Artemis to the Research Data Store, or copying data to and from remote locations. These nodes have dedicated Ethernet and InfiniBand bandwidth to maximise file transfer speeds, so they deliver significantly better data transfer and I/O performance than the login nodes. No other compute nodes on Artemis have access to /rds. The login nodes can still be used for small I/O tasks, but heavy copying or I/O work should be submitted to dtq instead.

An example dtq job that copies the mydata directory from Artemis to RCOS, using an example project called PANDORA, is shown below:

#!/bin/bash
#PBS -P PANDORA
#PBS -q dtq
#PBS -l select=1:ncpus=1:mem=4gb,walltime=1:00:00
# Copy recursively (-r), preserving timestamps (-t), with verbose output (-v),
# without crossing filesystem boundaries (-x), showing progress and keeping partial files (-P).
rsync -rtvxP /project/PANDORA/mydata /rds/PRJ-PANDORA/

To run this script, save it to a file (for example my-copy-job.pbs) and submit it to the scheduler:

qsub my-copy-job.pbs

Alternatively, the data mover nodes can be used interactively. To gain interactive access to a data mover node, type the following command:

qsub -I -P PANDORA -q dtq -l select=1:ncpus=2:mem=8gb

Remember to replace PANDORA with your short project name.
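
Once the interactive session starts, you can run data-handling commands directly on the data mover node. For example, you might archive results from scratch straight onto RDS (the paths below are illustrative):

# Create a gzipped archive of the results directory directly on /rds.
tar -czvf /rds/PRJ-PANDORA/results.tar.gz -C /scratch/PANDORA results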

The data transfer queue is designed for running file manipulation commands: for example, wget (for downloading data from the internet), scp, tar, cp, mv, rm, and more. This queue is not intended for compute jobs, and compute jobs running in this queue will be terminated without notice. Since compute jobs are not permitted in this queue, jobs run in dtq do not contribute to your project’s fair share. However, your project’s current fair share value will still affect the priority of jobs submitted to this queue.
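
As an illustration of the kind of work dtq is intended for, the following sketch downloads a dataset from the internet and unpacks it ready for processing (the URL and paths are placeholders; substitute your own):

#!/bin/bash
#PBS -P PANDORA
#PBS -q dtq
#PBS -l select=1:ncpus=1:mem=4gb,walltime=02:00:00

# Download the dataset (placeholder URL) into scratch.
wget -P /scratch/PANDORA/downloads https://example.org/dataset.tar.gz
# Unpack it ready for processing.
tar -xzf /scratch/PANDORA/downloads/dataset.tar.gz -C /scratch/PANDORA/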

Resource limits for dtq can be found in the queue resource limits table.

Automating data transfers

dtq can be used to schedule data transfers and compute jobs that run sequentially. For example, you could set up jobs that do the following:

  1. Submit a dtq job to copy data to Artemis for processing.
  2. Submit a processing job that automatically runs after the dtq job successfully completes.
  3. Upon successful completion of the processing job:
    1. Have the processing job copy the resultant data to another location on Artemis; or
    2. If the result of processing is to be copied to a destination that is not accessible from the compute nodes, then submit another dtq job to copy this data to its remote location.
  4. Optionally, delete the data on Artemis that was used for processing.

This entire process can be automated using three PBS scripts: copy-in.pbs, process-data.pbs and copy-out.pbs:

copy-in.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -q dtq
#PBS -l select=1:ncpus=1:mem=4gb,walltime=00:10:00

# Copy the input data from RDS to scratch for processing.
rsync -avxP /rds/PRJ-PANDORA/input_data /scratch/PANDORA/

process-data.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -q defaultQ
#PBS -l select=1:ncpus=4:mem=10gb,walltime=20:00:00

# Run the processing program on the copied data.
cd /scratch/PANDORA/input_data
my-program < input.inp > output.out

copy-out.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -q dtq
#PBS -l select=1:ncpus=1:mem=4gb,walltime=00:10:00

# Copy the results from scratch back to RDS.
rsync -avxP /scratch/PANDORA/input_data/ /rds/PRJ-PANDORA/output_data

Then, you can submit these three scripts (using the -W depend=afterok option) to the scheduler as follows:

[abcd1234@login1]$ qsub copy-in.pbs
1260945.pbsserver
[abcd1234@login1]$ qsub -W depend=afterok:1260945 process-data.pbs
1260946.pbsserver
[abcd1234@login1]$ qsub -W depend=afterok:1260945:1260946 copy-out.pbs

If successful, the output of qstat -u abcd1234 should look similar to this:

[abcd1234@login1]$ qstat -u abcd1234
pbsserver:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1260945.pbsserv abcd1234 dtq      copy-in.pb    --    1   1 4096mb 00:10 R   --
1260946.pbsserv abcd1234 small    process-da    --    1   4   10gb 20:00 H   --
1260947.pbsserv abcd1234 dtq      copy-out.p    --    1   1 4096mb 00:10 H   --

Note that process-data.pbs and copy-out.pbs are both in the H state, meaning they are “held” by the scheduler until the jobs they depend on have completed successfully. Note also that the processing job, although submitted to defaultQ, has been routed to the small execution queue by the scheduler.
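
If you prefer not to copy job IDs by hand, the same chain can be scripted by capturing the job ID that qsub prints on stdout; a minimal sketch, assuming the three scripts above are in the current directory:

#!/bin/bash
# Submit the copy-in job and capture its job ID (qsub prints it on stdout).
COPY_IN=$(qsub copy-in.pbs)
# Processing starts only if the copy-in job completes successfully.
PROCESS=$(qsub -W depend=afterok:${COPY_IN} process-data.pbs)
# Copy-out starts only if the processing job completes successfully.
qsub -W depend=afterok:${PROCESS} copy-out.pbs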

Data transfer script: dt-script

A data transfer script, called dt-script, has been created to simplify transferring data and submitting compute jobs that operate on the transferred data. This tool is especially useful if you need to transfer large amounts of data from RCOS to Artemis before processing it.

The syntax of dt-script is:

dt-script --from <from> --to <to> --project <project> [--job <job.pbs>] [options]

This script uses rsync to copy the source directory <from> to the destination directory <to>, then submits the PBS job script supplied via --job (job.pbs above), which will start once the copy completes successfully.

The arguments this script accepts are shown in the following table:

Short argument    Long argument             Description
-f                --from                    The source of the data.
-t                --to                      The destination of the data.
-P <project>      --project <project>       All PBS jobs require a project to run under; specify it here.
-notest           --skip                    Skip the test that the source is readable. Useful when calling from a node where /rds (the source) is not available.
-w <walltime>     --walltime <walltime>     Walltime required (default 24:00:00, i.e. one day).
-ncpus <ncpus>    --ncpus <ncpus>           CPU cores required (default 1).
-mem <mem>        --mem <mem>               RAM required (default 4GB).
-n                --name                    Set the copy job name (default "dt-script").
-rf <flags>       --rflags <flags>          Any extra rsync flags you may require.
-j <job script>   --job <job script>        The PBS job script to run after the copy. If no job script is specified, no subsequent job is run.
-jf <flags>       --jobflags <flags>        Any extra qsub flags needed to run the job script specified with --job.
-d <option>       --depend <option>         The dependency type for the subsequent job (default "afterok"; may be changed to "afterany" or "afternotok").
-l <logfile>      --log <logfile>           Write stdout and stderr from the rsync command to this log file, rather than waiting for the PBS output files.
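
For example, extra rsync flags and a log file can be supplied alongside the required arguments. The following sketch uses illustrative paths and an assumed exclude pattern; how dt-script forwards quoted flags to rsync may vary, so check the script source if in doubt:

dt-script -P PANDORA \
    --from /rds/PRJ-PANDORA/raw_data \
    --to /scratch/PANDORA/ \
    --rflags "--exclude=*.tmp" \
    --log /scratch/PANDORA/copy.log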

The script returns the PBS job ID of the last job it submits as follows:

  • If no PBS job script is specified, the PBS job ID of the dtq job is returned and may be used as a dependency of subsequent jobs, as sketched after this list.
  • If a PBS job script is specified, the PBS job ID of that job is returned.
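
Because the job ID is printed on stdout (as in the example workflow below), it can be captured in a shell variable and passed to qsub; a minimal sketch, where the analysis script name is illustrative:

# Capture the job ID of the dtq copy job (no --job given, so the copy job's ID is returned).
COPY=$(dt-script -P PANDORA --from /rds/PRJ-PANDORA/mydata --to /scratch/PANDORA/)
# Run a follow-up job only after the copy completes successfully.
qsub -W depend=afterok:${COPY} my-analysis.pbs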

The source code of dt-script is available to all Artemis users. The path to the script is /usr/local/bin/dt-script. Feel free to make a copy of this script if you would like to modify it for your needs.

Example dt-script workflow

An example dt-script workflow, using the example project PANDORA, is shown below:

[abcd1234@login2]$ dt-script -P PANDORA \
--from /rds/PRJ-PANDORA/mydata \
--to /scratch/PANDORA/ \
--job /scratch/PANDORA/run-processing.pbs
1261577.pbsserver
[abcd1234@login2]$ qstat -u abcd1234

pbsserver:

                                                             Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1261576.pbsserv abcd1234 dtq      dt-script     --    1   1    4gb 24:00 Q   --
1261577.pbsserv abcd1234 small    process-da    --    1   1    2gb 01:00 H   --

After verifying the processing job ran successfully, you can transfer data back to RCOS using another dt-script command:

[abcd1234@login2]$ dt-script -P PANDORA \
--from /scratch/PANDORA/mydata/ \
--to /rds/PRJ-PANDORA/mydata_output
1261588.pbsserver
[abcd1234@login2]$ qstat -u abcd1234

pbsserver:
                                                             Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1261588.pbsserv abcd1234 dtq      dt-script     --    1   1    4gb 24:00 Q   --

Finally, you can remove any temporary data from Artemis after checking all data was successfully transferred to RCOS:

[abcd1234@login2]$ rm /scratch/PANDORA/mydata/*
[abcd1234@login2]$ rmdir /scratch/PANDORA/mydata