Batch Jobs: Long-Running Computations
If a program has to run for a few hours or more, it should be prepared as a batch job and submitted to a cluster queue. This is the only feasible, efficient way for a relatively large number of users on campus to share a large computing resource like the HPC cluster.
Here is the gist of it:
The user needs to prepare the long-running program (say, a script written in R, Mplus, Stata, or SAS) and a "submission script". The submission script is the program that we use to ask the cluster scheduler to find some available compute nodes and send the job to those nodes. The simplest kind of chore is to launch a single, long-running job. Some jobs, however, are more interesting because they are parallel, meaning they divide up their work among several compute nodes and collect the results when they are finished. Parallel computing is the essence of high performance computing. We use it to run computer simulations or do massively parallel computations.
The big jobs we have been running fall into two groups.
- Lots of component jobs that run separately can be dispatched across many compute nodes by separate scripts. A simulation exercise may require thousands of repetitions, but they are separate from each other. We may write a shell script that creates hundreds or thousands of separate programs and program submission scripts. A job that can be split into many completely separate parts is said to be embarrassingly parallel. It is embarrassing because it is so easy.
- A job is truly parallel (that is, not embarrassing) if there is a main program that has computations done on several "threads." It assigns separate calculations to many compute nodes or cores, and these threads in some sense need to communicate with each other. This kind of program is more difficult to prepare because one has to make sure the different nodes are aware of what they ought to do, but it is also the most rewarding kind. If a master program is used to initiate all of the separate pieces, the results may be more believable to some computer scientists.
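The first approach, many independent jobs, can be sketched as a shell loop that writes one submission script per repetition. This is only a minimal sketch: the directory name "jobs", the file names "sub-NNN.sh", the R program "sim.R", the count of 5, the time limit, and the OUTDIR, NREPS, and SUBMIT variables are all hypothetical placeholders, not names used on the cluster.

```shell
#!/bin/bash
# Sketch: generate one Slurm submission script per independent repetition.
# All names below (jobs/, sim.R, sub-NNN.sh) are hypothetical placeholders.
outdir="${OUTDIR:-jobs}"
nreps="${NREPS:-5}"
mkdir -p "$outdir"

for i in $(seq 1 "$nreps"); do
    id=$(printf '%03d' "$i")
    # Unquoted EOF so that $id and $i are filled in for each repetition.
    cat > "$outdir/sub-$id.sh" <<EOF
#!/bin/bash
#SBATCH --job-name=sim-$id
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
# Each repetition receives its own run number as an argument.
R --vanilla -f sim.R --args $i
EOF
done

# Submit only when explicitly asked to and sbatch is available.
if [ "${SUBMIT:-no}" = "yes" ] && command -v sbatch >/dev/null 2>&1; then
    for f in "$outdir"/sub-*.sh; do
        sbatch "$f"
    done
else
    echo "Wrote $nreps submission scripts to $outdir/ (not submitted)"
fi
```

Run as is, the loop only writes the scripts so they can be inspected first; setting SUBMIT=yes on a machine where sbatch is installed would hand each one to the scheduler.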
Two Vital Elements
- A submission script
- A program to be submitted by the submission script.
Here is an example submission script. This one is aimed to submit just one long-running R program.
Each line that begins with "#SBATCH" is a directive that the scheduler is supposed to notice. The job's name is "RparallelHelloWorld"; that is how we can spot it in the queue while it runs. This is a one-core job that uses only one processor, so we request exactly that amount. The --mail-user argument is your email address, and --mail-type "BEGIN,END,FAIL" asks the scheduler to email you when the job begins (BEGIN), when it ends successfully (END), or if it fails or aborts (FAIL). The --partition argument specifies which partition the job is submitted to. Specifying sixhour allows a small job that runs in six hours or less to be sent outside of the CRMDA nodes. If crmda is supplied as --partition, then the job will be restricted to CRMDA nodes. If you are unsure which partitions you have permission to submit to, run "$ mystats" to see which partitions you can use and which one is the default.
mpiexec -n 1 R --vanilla -f parallel-hello.R
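Putting the directives described above together with this command, a submission script might look roughly like the one written by the sketch below. The email address and the ten-minute time limit are placeholder assumptions, not values from the original script; the job name, core count, partition, and mail settings follow the description.

```shell
#!/bin/bash
# Sketch: write a one-core submission script matching the description above.
# The email address and the time limit are placeholder assumptions.
cat > sub-serial.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=RparallelHelloWorld
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --partition=sixhour
#SBATCH --mail-user=your_name@example.edu
#SBATCH --mail-type=BEGIN,END,FAIL

mpiexec -n 1 R --vanilla -f parallel-hello.R
EOF

# List the scheduler directives that were written.
grep '^#SBATCH' sub-serial.sh
```

The heredoc only creates the file; nothing is sent to the scheduler until the file is passed to sbatch.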
As one can see, there is a "boilerplate-ish" feeling in this script. About the only thing the user needs to worry about is the time allowed. If we choose too short a time, the scheduler will cancel the job before it is done. If we ask for a lot of time, the scheduler may make us wait until the cluster is less full of other jobs.
There is a separate file, "parallel-hello.R", in the same directory as the submission script.
By default, each node has 2G of memory. For jobs that demand more memory, the user can specify the total job memory requirement. For example, the line "#SBATCH --mem=4G" sets the total job working memory at 4G, which is twice the default of 2G. Requesting more memory than your job requires wastes resources, and it can cause your job to spend longer in the queue.
sbatch: Submit A Job
To submit the batch job, run this command:
$ sbatch sub-serial.sh
The submission number for your job will display in the console:
Submitted job 749
The job runs in the "background". While it runs, we can log off of the HPC cluster entirely and it will keep going.
When the job finishes, it creates two files:
1. Output file: RParallelHelloWorld.o749
2. Error file: RParallelHelloWorld.e749
If everything went well, the error file might be empty, or it might have a harmless comment or warning. Of course, as is usually the case with R, we might have asked the program to create some graphics or data files, and they should be available as well.
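The check described above can be wrapped in a small helper. This is a sketch only: the function name check_job_files and its three status words are made up for illustration, and it assumes the output and error files sit in the current directory with the names shown above.

```shell
#!/bin/bash
# Sketch: report whether a finished job left anything in its error file.
# Assumes files named <jobname>.o<id> and <jobname>.e<id> in the current
# directory; the function name and status words are hypothetical.
check_job_files() {
    local name="$1" id="$2"
    local out="$name.o$id" err="$name.e$id"
    if [ ! -e "$out" ] || [ ! -e "$err" ]; then
        echo "missing"        # job not finished, or files written elsewhere
    elif [ -s "$err" ]; then
        echo "check-errors"   # error file is non-empty and should be read
    else
        echo "clean"          # error file exists and is empty
    fi
}
```

For the job above, the call would be "check_job_files RParallelHelloWorld 749". A result of "check-errors" does not necessarily mean failure, since the error file may hold only a harmless warning.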
squeue, scancel: Check, and Delete Batch Jobs
Did the job run yet? Is somebody else running too many jobs and clogging up the queue?
Check cluster status with squeue
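To check the status of the job, we run the command "squeue" (this is similar to the old "showq" command). By default it prints a table of the jobs in the queue, one row per job, showing the job ID, partition, job name, user, state, elapsed time, number of nodes, and the node list or the reason the job is still waiting.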
See https://crc.ku.edu/hpc/slurm/how-to for more information on the squeue command and its arguments.
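To look up one job rather than the whole queue, squeue can be restricted to a job number and asked for just the state column. The helper below is a minimal sketch; the function name job_state and the NOT_IN_QUEUE label are invented for this example.

```shell
#!/bin/bash
# Sketch: print the scheduler state of one job, e.g. PENDING or RUNNING.
# -h drops the header row, -j selects the job, -o '%T' prints only the state.
# The function name and the NOT_IN_QUEUE label are hypothetical.
job_state() {
    local state
    state=$(squeue -h -j "$1" -o '%T' 2>/dev/null)
    # squeue prints nothing once a job has left the queue.
    echo "${state:-NOT_IN_QUEUE}"
}
```

For example, "job_state 749" reports the state of job 749. To list all of your own jobs instead, "squeue -u $USER" filters the table by user name.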
Remove requests with scancel
If you decide you need to kill a job, run "scancel" with the job number.
$ scancel 749
job '749' cancelled
To delete several jobs, you can use just one command, such as:
$ scancel 710 711 712
job '710' cancelled
job '711' cancelled
job '712' cancelled
That becomes tedious if you need to remove hundreds of jobs you piled onto the queue by mistake.
We asked whether there is a way to speed up the removal of a lot of jobs. The ITTC support staff offered a helpful answer:
for i in $(seq 1 1000); do scancel $i; done
That deletes the jobs numbered 1 to 1000.
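Because scancel accepts several job numbers at once, the same range can also be passed in a single call. The sketch below wraps that in a function with a dry-run switch so the command can be inspected before anything is cancelled; the function name cancel_range and the DRY_RUN variable are hypothetical.

```shell
#!/bin/bash
# Sketch: cancel a contiguous range of job IDs with one scancel call.
# Set DRY_RUN=yes to print the command instead of running it.
# The function name and the DRY_RUN variable are hypothetical.
cancel_range() {
    local first="$1" last="$2"
    local ids
    # Collect the job numbers from first to last into an array.
    ids=($(seq "$first" "$last"))
    if [ "${DRY_RUN:-no}" = "yes" ]; then
        echo "scancel ${ids[*]}"
    else
        scancel "${ids[@]}"
    fi
}
```

With DRY_RUN=yes, "cancel_range 710 712" prints "scancel 710 711 712". If every job you own should go, "scancel -u $USER" cancels all jobs belonging to your user name without needing the job numbers.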