Data Transfer Queue (dtq)¶
Artemis has a queue, called dtq, that has nodes with access to /rds
(which is the RCOS version of the Research Data Store)
and the internet. This queue is
dedicated to running data moving jobs and other input/output (I/O)
intensive work, such as compressing and archiving files, copying data
from Artemis to the Research Data Store, or copying data to and from
remote locations. These nodes have dedicated Ethernet and InfiniBand
bandwidth to maximise file transfer speeds, so they deliver
significantly better data transfer and I/O performance than the login
nodes. No other compute nodes on Artemis have access to /rds.
The login nodes can still be used for small I/O work, but
heavy copy or I/O work should be submitted to dtq instead. An example
dtq job, which moves the
mydata directory from Artemis to RCOS, using an example project called
PANDORA, is shown below:
#!/bin/bash
#PBS -P PANDORA
#PBS -q dtq
#PBS -l select=1:ncpus=1:mem=4gb,walltime=1:00:00

rsync -rtvxP /project/PANDORA/mydata /rds/PRJ-PANDORA/
To run this script, save it to a file (for example
my-copy-job.pbs) and submit it to the scheduler:

qsub my-copy-job.pbs
Alternatively, the data mover nodes can be used interactively. To gain interactive access to a data mover node, type the following command:
qsub -I -P PANDORA -q dtq -l select=1:ncpus=2:mem=8gb
remembering to replace PANDORA with your short project name.
The data transfer queue is designed for running file manipulation
commands. For example, you could use
wget (for downloading data from the internet) and standard file
commands such as cp, tar, and rm. This queue,
however, is not intended for compute jobs. Compute jobs running in this queue will be
terminated without notice. Since no real compute jobs are allowed to run
in this queue, jobs run in
dtq will not contribute to your project’s
fair share. However,
your project’s current fair share value will impact the priority of jobs
submitted to this queue.
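As a concrete illustration of the kind of file-manipulation work this queue is intended for, the sketch below compresses a directory with tar and lists the archive contents. The /tmp paths are hypothetical stand-ins for real project directories such as /project/PANDORA/mydata; on Artemis these commands would sit inside a dtq job script.

```shell
# Hypothetical sketch: compress a data directory, as a dtq job might.
# /tmp/dtq-demo stands in for a real path such as /project/PANDORA.
demo=/tmp/dtq-demo
rm -rf "$demo"
mkdir -p "$demo/mydata"
echo "sample" > "$demo/mydata/data.txt"

# Create a compressed archive of the directory.
tar -czf "$demo/mydata.tar.gz" -C "$demo" mydata

# List the archive contents to confirm what was stored.
tar -tzf "$demo/mydata.tar.gz"
```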
Resource limits for
dtq can be found in
the queue resource limits table.
Automating data transfers¶
dtq can be used to schedule data transfers and compute jobs that run
sequentially. For example, you could set up jobs that do the following:
- Submit a dtq job to copy data to Artemis for processing.
- Submit a processing job that automatically runs after the dtq job successfully completes.
- Upon successful completion of the processing job:
  - Have the processing job copy the resultant data to another location on Artemis; or
  - If the result of processing is to be copied to a destination that is not accessible from the compute nodes, then submit another dtq job to copy this data to its remote location.
- Optionally, delete the data on Artemis that was used for processing.
This entire process can be automated using three PBS scripts:
copy-in.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -q dtq
#PBS -l select=1:ncpus=1:mem=4gb,walltime=00:10:00

rsync -avxP /rds/PRJ-PANDORA/input_data /scratch/PANDORA/
process-data.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -q defaultQ
#PBS -l select=1:ncpus=4:mem=10gb,walltime=20:00:00

cd /scratch/PANDORA/input_data
my-program < input.inp > output.out
copy-out.pbs:

#!/bin/bash
#PBS -P PANDORA
#PBS -q dtq
#PBS -l select=1:ncpus=1:mem=4gb,walltime=00:10:00

rsync -avxP /scratch/PANDORA/input_data/ /rds/PRJ-PANDORA/output_data
Then, you can submit these three scripts to the scheduler (using the
-W depend option to make each job wait for the previous one) as follows:
[abcd1234@login1]$ qsub copy-in.pbs
1260945.pbsserver
[abcd1234@login1]$ qsub -W depend=afterok:1260945 process-data.pbs
1260946.pbsserver
[abcd1234@login1]$ qsub -W depend=afterok:1260945:1260946 copy-out.pbs
If successful, your jobs should look similar to this if you type
qstat -u abcd1234:
[abcd1234@login1]$ qstat -u abcd1234

pbsserver:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1260945.pbsserv abcd1234 dtq      copy-in.pb    --    1   1 4096mb 01:00 R   --
1260946.pbsserv abcd1234 small    process-da    --    1   1    2gb 01:00 H   --
1260947.pbsserv abcd1234 dtq      copy-out.p    --    1   1 4096mb 01:00 H   --
Note that process-data.pbs and
copy-out.pbs are both in the H state,
which means they're being "held" by the scheduler until the previous
jobs have successfully completed.
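The job IDs in the session above were copied by hand, but they can be captured in shell variables instead. A minimal sketch, assuming PBS job IDs of the form 1260945.pbsserver; the qsub submission is shown as an echoed command rather than run for real:

```shell
# On Artemis this would be captured from a real submission, e.g.:
#   jobid1=$(qsub copy-in.pbs)
jobid1="1260945.pbsserver"   # sample value for illustration

# The full ID can be used directly in a dependency option:
echo "qsub -W depend=afterok:${jobid1} process-data.pbs"

# If only the numeric part is needed, strip the server suffix:
echo "${jobid1%%.*}"
```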
Data transfer script: dt-script¶
A data transfer script, called
dt-script, has been created to help
simplify data transfer and submitting compute jobs on the transferred
data. This tool is especially useful if you need to transfer large
amounts of data from RCOS to Artemis before processing it.
The syntax of
dt-script is:

dt-script -P <project> --from <from> --to <to> [--job <job.pbs>]

This script uses
rsync to copy the source directory
<from> to the destination
<to>, and then submits the PBS job script
<job.pbs>, which will start once the copy successfully completes.
The arguments this script accepts are shown in the following table:
| Short argument | Long argument | Description |
| --- | --- | --- |
| -f | --from | The source of the data |
| -t | --to | The destination of the data |
| -P <project> | --project <project> | All PBS jobs require a project to run under. Specify it here. |
| -notest | --skip | Skip testing whether the source is readable. Useful if called from a node where /rds (the source) is not available. |
| -w <walltime> | --walltime <walltime> | Walltime required (default 24:00:00, i.e. 1 day) |
| -ncpus <ncpus> | --ncpus <ncpus> | CPU cores required (default 1) |
| -mem <mem> | --mem <mem> | RAM required (default 4GB) |
| -n | --name | Set the copy job name (default "dt-script") |
| -rf <rsync extra flags> | --rflags <rsync extra flags> | Any extra rsync flags you may require |
| -j <pbs job script> | --job <pbs job script> | The PBS job script to run after the copy. If no job script is specified, no subsequent job is run. |
| -jf <flags> | --jobflags <flags> | Any extra qsub flags that may be needed to run the PBS job script specified with --job |
| -d <depend option> | --depend <depend option> | The default depend option is "afterok". You may change this to "afterany" or "afternotok" with this option. |
| -l <logfile> | --log <logfile> | Rather than waiting for the PBS output files, you may specify a log of stdout and stderr from the rsync command here |
The script returns the PBS job ID of the last job it submits as follows:
- If no PBS job script is specified, the PBS job ID of the dtq job is returned and may be used as a dependency of subsequent jobs.
- If a PBS job script is specified, the PBS job ID of that job is returned.
The source code of
dt-script is available to all Artemis users. The path
to the script is
/usr/local/bin/dt-script. Feel free to make a copy of
this script if you would like to modify it for your needs.
Example dt-script workflow¶
An example dt-script workflow, using the example project PANDORA, is shown below:
[abcd1234@login2]$ dt-script -P PANDORA \
    --from /rds/PRJ-PANDORA/mydata \
    --to /scratch/PANDORA/ \
    --job /scratch/PANDORA/run-processing.pbs
1261577.pbsserver
[abcd1234@login2]$ qstat -u abcd1234

pbsserver:
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1261576.pbsserv abcd1234 dtq      dt-script     --    1   1    4gb 24:00 Q   --
1261577.pbsserv abcd1234 small    process-da    --    1   1    2gb 01:00 H   --
After verifying the processing job ran successfully, you can transfer
data back to RCOS using another
dt-script command:
[abcd1234@login2]$ dt-script -P PANDORA \
    --from /scratch/PANDORA/mydata/ \
    --to /rds/PRJ-PANDORA/mydata_output
1261588.pbsserver
[abcd1234@login2]$ qstat -u abcd1234

pbsserver:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1261588.pbsserv abcd1234 dtq      dt-script     --    1   1    4gb 24:00 Q   --
Finally, you can remove any temporary data from Artemis after checking all data was successfully transferred to RCOS:
[abcd1234@login2]$ rm /scratch/PANDORA/mydata/*
[abcd1234@login2]$ rmdir /scratch/PANDORA/mydata
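One way to do that check is to compare per-file checksums of the two directory trees before deleting anything. A minimal sketch using temporary /tmp directories as stand-ins for /scratch/PANDORA/mydata and /rds/PRJ-PANDORA/mydata_output (md5sum is assumed to be available, as on most Linux systems):

```shell
# Hypothetical stand-ins for /scratch/PANDORA/mydata and
# /rds/PRJ-PANDORA/mydata_output.
src=/tmp/dtq-verify-src
dst=/tmp/dtq-verify-dst
rm -rf "$src" "$dst"
mkdir -p "$src" "$dst"
echo "result" > "$src/output.out"
cp "$src/output.out" "$dst/output.out"

# Checksum every file (by relative path) in each tree and compare.
srcsum=$( (cd "$src" && find . -type f -exec md5sum {} +) | sort | md5sum )
dstsum=$( (cd "$dst" && find . -type f -exec md5sum {} +) | sort | md5sum )

if [ "$srcsum" = "$dstsum" ]; then
    echo "trees match; safe to delete the scratch copy"
    rm -r "$src"
fi
```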