Submitting a Job

To submit a job to the Condor batch system you first need to write a "submit description file" to describe the job to the system. A very simple file would look like this:
####################
#
# Example 1
# Simple HTCondor submit description file
#
####################

Executable = myexe
Log = myexe.log
Input = inputfile
Output = outputfile

Queue

That runs myexe on the batch machine (after copying it and inputfile to a temporary directory on the machine) and copies the standard output of the job back to a file called outputfile.
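Assuming the description above is saved in a file called example1.sub (the filename here is just for illustration), the job is submitted with condor_submit and can then be watched with condor_q:

# Submit the job described in example1.sub to the batch system
condor_submit example1.sub

# Check on its progress in the queue
condor_q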
A more complex example submit description would look like:
####################
#
# Example 2
# More Complex HTCondor submit description file
#
####################

Universe = vanilla
Executable = my_analysis.sh
Arguments = input-$(process).txt result/output-$(process).txt
Log = log/my_analysis-$(Process).log
Input = input/input-$(process).txt
Output = output/my_analysis-$(Process).out
Error = output/my_analysis-$(Process).err
Request_memory = 2 GB
Transfer_output_files = result/output-$(process).txt
Transfer_output_remaps = "output-$(process).txt = results/output-$(process).txt"
Notification = complete
Notify_user = your.name@stfc.ac.uk
Getenv = True

Queue 20

This submit runs 20 copies (Queue 20) of my_analysis.sh input-$(process).txt result/output-$(process).txt, where $(process) is replaced by a number from 0 to 19. It will copy my_analysis.sh and input-$(process).txt to each of the worker nodes (taking the input file from the local input directory). The standard output and error from the job are copied back to the local output directory at the end of the job, and the file result/output-$(process).txt is copied back to the local results directory (renamed by Transfer_output_remaps). The job also picks up the environment from the submitting shell (Getenv = True) and requests 2 GB of memory to run in on the worker node. Finally it e-mails the user when each job completes.
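The executable itself is just an ordinary script that takes the input and output filenames as its two arguments. The contents below are purely illustrative (a minimal sketch, not the real my_analysis.sh), showing how the arguments set in the submit description arrive in the script:

#!/bin/bash
# Hypothetical my_analysis.sh: argument 1 is the input file copied into the
# job sandbox, argument 2 is the output file that will be transferred back.
INPUT=$1
OUTPUT=$2

# The result/ directory named in the submit file must exist in the sandbox
# before the script writes to it.
mkdir -p "$(dirname "$OUTPUT")"

# Stand-in for the real analysis step.
wc -l "$INPUT" > "$OUTPUT"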
Monitoring Your Jobs

The basic command for monitoring jobs is condor_q. By default this only shows jobs submitted to the "schedd" (essentially the submit node) you are using; to see all the jobs in the system run condor_q -global.
If jobs have been idle for a while, you can use condor_q -analyze <job_id> to look at the resources requested by the job and how they match up against the available resources on the cluster.
Failed jobs often go into a "Held" state rather than disappearing; condor_q -held <job_id> will often give some information on why the job failed.
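For example (1234.0 below is just a placeholder job ID), you can look at the hold reason and then either release the job once the underlying problem is fixed, or remove it:

# Show the held job together with the reason it was held
condor_q -held 1234.0

# Put it back in the queue once the problem is fixed...
condor_release 1234.0

# ...or give up on it and remove it from the queue
condor_rm 1234.0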
condor_userprio will give you an idea of the current usage and fair shares on the cluster.
Local Commands

The PPD interactive machines have some local commands for making the Condor batch system a bit more convenient to use and more like the LSF system used on lxplus at CERN.
You can use bqsub to submit jobs (similar to LSF's bsub command). The full command can be specified on the command line, so you don't need to create a "submit description file". To request more memory for a job use, e.g., bqsub -s 8GB (the default is 3GB). For example:
bqsub -s 8GB Sherpa PTCUT:=20 EVENTS=1000
You can also submit many jobs at once, e.g. bqsub -n 10 ./myscript output_%%.root will run 10 jobs with arguments output_0.root, output_1.root, and so on (%% is replaced by the job number in each sub-job).
qjobs (like LSF's bjobs) lists your running jobs with more helpful information.

qpeek (like LSF's bpeek) shows a running job's logfile.

Use bqsub -h, qjobs -h, or qpeek -h for help.
How to retrieve output files from a running job

Sometimes it is useful to retrieve files from a running job, either to check them or before killing a job that is no longer needed. Here is a short guide:
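One way to do this (a minimal sketch, assuming condor_ssh_to_job is enabled on the cluster; the local configuration may differ) is to attach to the running job's sandbox on the worker node:

# Attach an interactive shell to the running job's sandbox
# (1234.0 is a placeholder for the job ID shown by condor_q)
condor_ssh_to_job 1234.0
ls -l        # inspect the files the job has produced so far
exit

# Or open an sftp session to the sandbox and fetch a file directly
condor_ssh_to_job -ssh sftp 1234.0
# sftp> get partial-output.txt     (hypothetical file name, for illustration)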
-- ChrisBrew - 2014-03-26