I want to run HADDOCK2.2 on a high-performance cluster that uses the Simple Linux Utility for Resource Management (SLURM) as its scheduling system. According to the cluster's introduction, I need to create a SLURM script like the following:
#!/bin/bash
# NOTE: Lines starting with "#SBATCH" are valid SLURM commands or statements,
# while those starting with "#" and "##SBATCH" are comments. Uncommenting a
# "##SBATCH" line means removing one "#" so that it starts with "#SBATCH" and
# becomes a SLURM command or statement.
#SBATCH -J slurm_job                    # Slurm job name
# Set the maximum runtime, uncomment if you need it
##SBATCH -t 48:00:00                    # Maximum runtime of 48 hours
# Enable email notifications when the job begins and ends, uncomment if you need it
##SBATCH --mail-user=user_name@ust.hk   # Update your email address
##SBATCH --mail-type=begin
##SBATCH --mail-type=end
# Choose the partition (queue), for example the partition "standard"
#SBATCH -p standard
# Use 2 nodes and 48 cores
#SBATCH -N 2 -n 48
# Set up the runtime environment if necessary,
# for example, set up the MPI environment:
source /usr/local/setup/pgicdk-15.10.sh
# or you can source ~/.bashrc or ~/.bash_profile
# Go to the job submission directory and run your application
cd $HOME/apps/slurm
mpirun ./your_mpi_application
Now I am confused by the "queue command" in HADDOCK and the "MPI" part. If I just create and run a SLURM script like this with the "haddock2.2" command, and use "csh" as the queue command in "run.cns", will the SLURM system give me all of the nodes and CPU cores which I request in "run.cns"?
Or do I need to use a SLURM script as the queue command in run.cns?
The queue setting in run.cns is used to launch the jobs created by HADDOCK. In short, you should replace csh with srun or sbatch, depending on your SLURM configuration, and with the appropriate flags.
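For example, a minimal sketch of what that could look like in run.cns (the exact flags depend on your site's SLURM configuration; these values are placeholders):
{===>} queue_1="srun -n 1";
{===>} cpunumber_1=48;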
The first way would indeed then be csh - meaning you are sending the entire HADDOCK process to the batch system - BUT this probably only works fine provided you are using only one node (my guess - never tested SLURM).
As Alex said, you can request ONE node interactively via srun/sdev and then run HADDOCK with csh and the ncpu setting to the number of cores you requested. This is probably not efficient, but it’s simple.
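For instance, a sketch of that interactive route (partition name and core count are placeholders for your cluster):
srun -N 1 -c 32 -p standard --pty bash   # request one full node interactively
cd run1                                  # your HADDOCK run directory
haddock2.2                               # with queue_1="csh" and cpunumber_1=32 in run.cns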
To make full use of SLURM you’d have to edit the Queue.py file in Haddock/Main/ and add the proper SLURM headers and then put sbatch in the queue command. More complicated, but more efficient and practical in the long term.
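The headers to add would be ordinary #SBATCH lines; purely as an illustration (partition name and limits are assumptions for your site):
#SBATCH --partition=standard
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=1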
Hi all
back after ages to a great piece of software.
I have successfully installed it on our new cluster (the CNS compilation was beautifully painless), which has SLURM as its queue system and 20 nodes with 32 cores each.
I did set QUEUESUB=QueueSubmit_slurm.py
and I edited Haddock/Main/MHaddock.py to:
# values for running via a batch system
jobconcat["0"] = 5
jobconcat["1"] = 2
jobconcat["2"] = 2
as suggested in /Haddock/Main/QueueSubmit_slurm.py.
HADDOCK2.4 runs fine, and I have 10 jobs running (R) and 192 pending (PD) in the queue.
Now, before the sysadmin complains about this long queue, is there any chance to submit a single job as a wrapper, similar to HADDOCK2.4 manual - Frequently Asked Questions – Bonvin Lab?
I tried to edit ssub but it did not work.
I am trying examples/protein-protein in the HADDOCK dir.
Hi Alexandre
so you are suggesting to put
{===>} queue_1="csh";
{===>} cns_exe_1="/SOFTW/DOCKING/HADDOCK/cns_solve_1.3/intel-x86_64bit-linux/bin/cns";
{===>} cpunumber_1=32;
and then create a wrapper SLURM script in order to submit the whole job to a single node, i.e.
#SBATCH --job-name=HADDOCK
#SBATCH --partition=workq
#SBATCH --account spitaleri.andrea
#SBATCH --mem=60GB            # amount of RAM required (and max RAM available); XOR "--mem-per-cpu"
#SBATCH --time=INFINITE       ## OR --time=10:00 means 10 minutes OR --time=01:00:00 means 1 hour
#SBATCH --cpus-per-task=32
#SBATCH --nodes=1             # not really useful for non-MPI jobs
#SBATCH --mail-type=ALL       ## BEGIN, END, FAIL or ALL
#SBATCH --error="err"
#SBATCH --output="out"
so you are suggesting to put
{===>} queue_1="csh";
{===>} cns_exe_1="/SOFTW/DOCKING/HADDOCK/cns_solve_1.3/intel-x86_64bit-linux/bin/cns";
{===>} cpunumber_1=32;
Yes
and then create a wrapper SLURM script in order to submit the whole job to a single node, i.e.
#SBATCH --job-name=HADDOCK
#SBATCH --partition=workq
#SBATCH --account spitaleri.andrea
#SBATCH --mem=60GB            # amount of RAM required (and max RAM available); XOR "--mem-per-cpu"
#SBATCH --time=INFINITE       ## OR --time=10:00 means 10 minutes OR --time=01:00:00 means 1 hour
#SBATCH --cpus-per-task=32
#SBATCH --nodes=1             # not really useful for non-MPI jobs
#SBATCH --mail-type=ALL       ## BEGIN, END, FAIL or ALL
#SBATCH --error="err"
#SBATCH --output="out"
No, I would put all values to 1 in that case.
You will only have one job in the queue.
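That is, in MHaddock.py:
jobconcat["0"] = 1
jobconcat["1"] = 1
jobconcat["2"] = 1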
Also, you could consider creating a tmp dir under /tmp on the node (if space allows), moving the full run there and running it locally (paths should then be relative to ./ in the run.cns file), then moving the completed run back to your home dir. That way you would minimise network traffic. This is kind of what the pilot mechanism is doing.
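A rough sketch of that idea as a SLURM job script (a sketch only; the run name, paths, and resources are placeholders):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
tmpdir=/tmp/$USER.$SLURM_JOB_ID          # node-local scratch dir (name is a placeholder)
mkdir -p "$tmpdir"
cp -r "$HOME/runs/run1" "$tmpdir/"       # copy the full run to the node
cd "$tmpdir/run1"
haddock2.4                               # run locally; paths in run.cns relative to ./
cp -a "$tmpdir/run1/." "$HOME/runs/run1/"   # copy the completed run back home
rm -rf "$tmpdir"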
Note that when having HADDOCK send jobs to the queue using SLURM, we use the QueueSubmit_concat.py mechanism - if you set the cpunumber to, say, 50, you won't fill the queue that much, and your run will still proceed quite fast.
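For example, a sketch of that setup in run.cns (the sbatch flags are placeholders):
{===>} queue_1="sbatch";
{===>} cpunumber_1=50;   ! number of concurrent jobs to keep in the queue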
Now I am trying to use 2 nodes with SLURM:
I have looked into the QueueSubmit_slurm.py file and linked it to QueueSubmit.py, as suggested in earlier postings. The new run.cns was modified to contain:
Don’t use the QueueSubmit_slurm.py file! We should actually remove it. Old stuff. Use the concat script.
The way we run it on our cluster using SLURM is to have HADDOCK start on the master node and submit jobs to SLURM via the queue_1 command defined in run.cns.
This could be simply sbatch, possibly with some additional options.
I.e. we are not submitting the entire HADDOCK process to the queue.
This can be done as well, but then it can only run within the node it has been assigned (no MPI support).
And in that case the queue command in run.cns would simply be /bin/csh, with the cpunumber set to the number of cores available/requested on the node.
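As a side-by-side sketch of those two options in run.cns (flags and values are placeholders):
! option 1: HADDOCK on the master node submits each CNS job via sbatch
{===>} queue_1="sbatch";
! option 2: the whole HADDOCK process runs inside one allocated node
{===>} queue_1="/bin/csh";
{===>} cpunumber_1=32;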
For our system, we have a wrapper script that creates and submits a job file. This is what we define under queue_1, setting the cpunumber to the number of concurrent jobs you want to have in the queue. We have different queue lengths that are reflected in the wrapper scripts. Within run.cns we thus define queue_1 = "submit_slurm short" for targeting the short queue.
#!/bin/csh -f
# syntax highlighting : alt-x global-font-lock-mode
if ($# < 2) then
echo "Usage : submit-slurm queue jobname"
echo "queue : valid queue destination are : short | medium | long | haddock | verylong"
exit 1
endif
# check queue
set queue=$1
if ($queue != "short" && $queue != "medium" && $queue != "long" && $queue != "verylong" && $queue != "minoes" && $queue != "haddock" ) then
echo "Wrong queue destination"
exit 1
endif
# define the time limit (in minutes) based on the queue
set timelimit=720    # assumed fallback so every accepted queue (e.g. minoes) gets a limit
if ($queue == short ) then
set timelimit=240
endif
if ($queue == medium ) then
set timelimit=720
endif
if ($queue == long ) then
set timelimit=1440
endif
if ($queue == verylong ) then
set timelimit=7200
endif
if ($queue == haddock ) then
set timelimit=720
endif
# check if job exists + make it executable
set jobname=$2
if (! -e $2) then
echo "job file does not exist"
exit 1
endif
if (! -x $jobname) chmod +x $jobname
# write slurm script
set slurmjob=$jobname.slurmjob.$$
if (-e $slurmjob) then
\rm $slurmjob
endif
touch $slurmjob
set PWD=`pwd`
if ($jobname =~ *rmsd*.job) then
set queue=haddock
endif
if ($jobname =~ *ene-residue*.job) then
set queue=verylong
endif
echo "#!"$SHELL >> $slurmjob
set outfile=$PWD/$jobname.out.$$
echo "#SBATCH --output="$outfile >> $slurmjob
set errorfile=$PWD/$jobname.err.$$
echo "#SBATCH --error="$errorfile >> $slurmjob
echo "#SBATCH --partition=$queue" >> $slurmjob
echo "#SBATCH --time="$timelimit >> $slurmjob
echo "#SBATCH --cpus-per-task=1" >> $slurmjob
echo "#SBATCH --threads-per-core=1" >> $slurmjob
echo "cd "$PWD >> $slurmjob
echo "./"$jobname >>$slurmjob
chmod +x $slurmjob
sbatch $slurmjob
set success=$status
\rm $slurmjob
exit $success
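You can test the wrapper by hand before wiring it into run.cns (the job file name here is a placeholder):
./submit_slurm short test.job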