Submitting Multiple Jobs Using HTCondor
Introduction
This guide shows you how to write a submit file to submit multiple jobs. This option is great for submitting high throughput workloads, such as iterating over multiple datasets or parameters.
Overview
HTCondor has tooling for submitting many jobs from a single submit file. Instead of writing a script to write 1000 .sub files, you can write just one .sub file and submit it once. Using this option also helps keep the login nodes operating reliably.
Considerations
The hardware of the submit server can be overwhelmed if there are a significant number of jobs submitted at once or rapidly starting and finishing. Plan ahead for the following scenarios:
- If you plan to submit 10,000+ jobs at a time, add max_idle = 10000 to your submit file.
- If you plan to submit 1000+ jobs, please make sure that each job has a minimum run time of 5 minutes (on average). If your calculations are shorter than 5 minutes, then modify your workflow to run multiple calculations per job.
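One way to run multiple short calculations per job is a small wrapper script that loops over several inputs. The following is only a minimal sketch: the function name run_batch and the echo placeholder are hypothetical and stand in for your real calculation.

```shell
# run_batch (hypothetical helper): run several short calculations inside one job.
# Each argument is treated as one input; the echo stands in for the real command.
run_batch() {
    for input in "$@"; do
        echo "processing $input"   # replace with the real calculation
    done
}
```

In a wrapper script, ending with run_batch "$@" would let a single job process every file name passed to it, so fewer and longer jobs are submitted.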
We recommend submitting multiple jobs by using one of these options:
- queue <N>. Submit N number of jobs. Useful for performing replications, looping through files named with numbers, and looping through a matrix where each job uses information from a specific row or column.
- queue <var> from <list>. Loops through a list of file names, parameters, etc. as defined in a separate text file. This is the most flexible option.
- Organize Jobs Into Individual Directories. Great for pipelines that use the same scripts but need their outputs separated into their own directories.
⚠️ Avoid using HTCondor's default variables
The examples below use $(variable_name) to specify details like input file names, file locations (aka paths), etc. When selecting a custom variable name, avoid default HTCondor submit file variables:
- Cluster or ClusterID
- Process or ProcID
- batch_name
- output
- input
- arguments
Option 1: Submit N number of jobs with queue <N>
Use queue N to submit N number of jobs. Each job will be assigned a unique Process number from 0 to N-1. Because the Process variable is unique for each job, it can be used in the submit file to create unique filenames or paths for each job.
Example: queue 10
batch_name = job_$(Cluster)
shell = echo $(Process)
log = $(batch_name).log
error = $(batch_name)_$(Process).err
output = $(batch_name)_$(Process).out
request_cpus = 1
request_memory = 10 MB
request_disk = 10 MB
queue 10
- This submit file will create 10 jobs, each numbered 0 through 9. This number replaces every instance of $(Process), including the command in shell, or the filenames of the standard output/standard error files.
- $(Cluster) is populated with the unique job ID generated upon submission.
- $(batch_name) is a default submit file option. Its value is displayed when running condor_q.
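To see which output file names the example would produce, the substitution can be imitated in plain shell. This is only an illustration: preview_outputs is a hypothetical helper, and the cluster ID is a made-up value, since HTCondor assigns the real one at submission.

```shell
# preview_outputs (hypothetical helper): list the .out names that
# output = $(batch_name)_$(Process).out would expand to for "queue N".
preview_outputs() {
    cluster=$1   # stand-in for $(Cluster); the real value is assigned by HTCondor
    n=$2         # the N in "queue N"
    p=0          # $(Process) starts at 0
    while [ "$p" -lt "$n" ]; do
        echo "job_${cluster}_${p}.out"
        p=$((p + 1))
    done
}
```

For example, preview_outputs 12345 10 lists ten names, from job_12345_0.out to job_12345_9.out.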
💡 Start $(Process) at 1
If you prefer starting $(Process) at 1 instead of 0, add this to your submit file:
plusone = $(Process) + 1
NewProcess = $INT(plusone,%d)
shell = echo $(NewProcess)
... remaining submit details ...
queue 10
Now, the custom variable $(NewProcess) can be used and will range from 1 to 10.
Option 2: Submit multiple jobs that iterate over variables with queue <variable> from <list>
This is the most flexible option and is useful especially for a list of parameters or files. Use the queue <variable> from <list> syntax to submit multiple jobs from a list (like a for loop).
Example: queue state from parameters.txt
Suppose you need to run an analysis (compare_states) on three different data files (illinois.data, nebraska.data, wisconsin.data). Each analysis needs to be submitted as a separate job.
First, we create a list of the .data files we want to iterate over, called parameters.txt:
illinois.data
nebraska.data
wisconsin.data
Next, in the submit file, following the pattern queue <var> from <list>, replace <var> with a custom variable name like state and replace <list> with parameters.txt, our list of files:
queue state from parameters.txt
For each line in parameters.txt, HTCondor will submit a job and the variable
$(state) can be used anywhere in the submit file to represent the name of the .data file
to be used by that job. For the first job, $(state) will be illinois.data, for the
second job $(state) will be nebraska.data, and so on. For example:
executable = compare_states
arguments = $(state)
transfer_input_files = $(state)
... remaining submit details ...
queue state from parameters.txt
💡 Create your list with bash scripting
You can quickly create lists using bash scripting. In the example above, run the following in the directory containing the .data files to create your list:
ls *.data > parameters.txt
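Since each line of the list becomes one job, it can help to count the lines before submitting. The sketch below wraps the same ls command in a hypothetical helper, make_list, and prints that count. It assumes the directory contains at least one .data file.

```shell
# make_list (hypothetical helper): write parameters.txt inside the given
# directory and report how many jobs "queue ... from parameters.txt" would create.
make_list() {
    dir=${1:-.}   # directory holding the .data files (default: current directory)
    # Run in a subshell so the caller's working directory is left unchanged.
    ( cd "$dir" && ls *.data > parameters.txt )
    # Each line of the list corresponds to one job.
    echo "jobs to be queued: $(grep -c '' "$dir/parameters.txt")"
}
```

Running make_list with no argument works on the current directory.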
Use multiple variables for each job
queue <var> from <list> works with multiple variables, delimited by commas.
Example: queue state, year from parameters.txt
Let's say we need to add a year as an additional input parameter for our jobs. We can modify our parameters.txt file:
illinois.data, 1995
illinois.data, 2005
nebraska.data, 1999
nebraska.data, 2005
wisconsin.data, 2000
wisconsin.data, 2015
Modify the queue statement to define two variables named state and year:
queue state, year from parameters.txt
The variables $(state) and $(year) can be used in the submit file:
executable = compare_states
arguments = $(state) $(year)
transfer_input_files = $(state)
... remaining submit details ...
queue state, year from parameters.txt
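When every file should be paired with the same set of values, a two-column list can be generated with nested loops instead of being typed by hand. This is a minimal sketch: state_year_lines is a hypothetical helper, and the years passed to it are up to you.

```shell
# state_year_lines (hypothetical helper): print one "file, year" line for
# every combination of the given years and data files.
state_year_lines() {
    years=$1      # space-separated list of years, e.g. "2000 2010"
    shift         # the remaining arguments are the data file names
    for state in "$@"; do
        for year in $years; do
            echo "$state, $year"
        done
    done
}
```

Redirecting the output, as in state_year_lines "2000 2010" *.data > parameters.txt, produces a list in the same comma-delimited format as the example above.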
Option 3: Organize jobs into individual directories
Example: Submit multiple jobs in different directories with queue <variable> from <list>
Suppose there's a directory for each state you want to analyze, and each of those directories has its own input file named input.data:
[netid@ap2001 state-analysis]$ tree
.
├── compare_states
├── compare_states.sub
├── illinois/
│   └── input.data
├── nebraska/
│   └── input.data
└── wisconsin/
    └── input.data
We will use the HTCondor submit file attribute initialdir to define the specific directory from which each job in the batch will start. By default, initialdir is set to the directory from which the condor_submit command is executed.
In this example, the default initialdir would be the state-analysis directory, but we want to change it to illinois, nebraska, and wisconsin for each job.
First, we create a text file called state-dirs.txt with a list of our initial directories:
illinois
nebraska
wisconsin
Then we modify our submit file and add the initialdir attribute:
initialdir = $(state_dir)
executable = compare_states
transfer_input_files = input.data
... remaining submit details ...
queue state_dir from state-dirs.txt
In this example, HTCondor creates a job for each directory in state-dirs.txt and uses that directory as the initialdir from which the job will be submitted. As a result, transfer_input_files = input.data can be used without specifying the path to this input.data file. Any output generated by the job will then be placed in the individual state directories.
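With many directories, the list file can be generated rather than written by hand. The following is a sketch only: list_dirs is a hypothetical helper that prints each immediate subdirectory name without a trailing slash.

```shell
# list_dirs (hypothetical helper): print the name of every immediate
# subdirectory of the given directory, one per line, without a trailing "/".
list_dirs() {
    local base=${1:-.} d
    for d in "$base"/*/; do
        [ -d "$d" ] || continue   # skip the unexpanded pattern when nothing matches
        d=${d%/}                  # drop the trailing slash
        echo "${d##*/}"           # keep only the directory name
    done
}
```

Running list_dirs > state-dirs.txt in the state-analysis directory would write illinois, nebraska, and wisconsin, one per line.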
⚠️ Differences in executable and shell when using initialdir
initialdir only changes the input and output file path (including the HTCondor log, error, and output files), not the executable, which is in the same working directory as the submit file (compare_states.sub).
- If you are using executable, your executable should be in the same directory as the submit file.
- If you are using shell, you will need to transfer in your executable, which means its path must be given relative to the initial working directory. Example submit file:
initialdir = $(state_dir)
shell = ./compare_states
transfer_input_files = input.data, ../compare_states
... remaining submit details ...
queue state_dir from state-dirs.txt
Example: Submit multiple jobs in different directories with queue <variable> matching <pattern>
Suppose there's a directory for each state you want to analyze, and each of those directories (all prefixed with state_) has its own input file named input.data:
[netid@ap2001 state-analysis]$ tree
.
├── compare_states
├── compare_states.sub
├── state_illinois/
│   └── input.data
├── state_nebraska/
│   └── input.data
└── state_wisconsin/
    └── input.data
We can use queue <variable> matching <pattern> and initialdir to submit multiple jobs in their individual directories. Read the previous section to learn more about initialdir.
initialdir = $(state_dir)
executable = compare_states
transfer_input_files = input.data
... remaining submit details ...
queue state_dir matching state_*
⚠️ When <variable> is a directory
When your custom <variable> is a directory, be careful when calling it in your submit file, because its value will also include the /.
In the example above, $(state_dir) will take the value state_wisconsin/. If you specify:
output = $(state_dir).out
Instead of getting the output file state_wisconsin.out, you'll create a hidden .out file in a state_wisconsin/ subdirectory:
[netid@ap2001 state-analysis]$ tree -a state_wisconsin
state_wisconsin
├── state_wisconsin
│   ├── .err
│   ├── .log
│   └── .out
└── input.data
When <variable> is a directory, we recommend using it only in initialdir.
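The effect of the trailing slash can be reproduced with ordinary shell string handling. Note that this is bash parameter expansion, not HTCondor submit file syntax, and strip_slash is a hypothetical helper used only for illustration.

```shell
# strip_slash (hypothetical helper): remove one trailing "/" from a name.
strip_slash() {
    echo "${1%/}"
}

state_dir="state_wisconsin/"
# Appending ".out" to a value that ends in "/" names a hidden file in a subdirectory.
echo "${state_dir}.out"                    # prints state_wisconsin/.out
# Without the trailing slash, the expected file name results.
echo "$(strip_slash "$state_dir").out"     # prints state_wisconsin.out
```

A list of directory names written without trailing slashes and used with queue <variable> from <list>, as in the previous example, does not carry the extra /.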