Error when submitting to a queueing system on a cluster

Hi,
I've managed to install HADDOCK on my cluster, and CNS is there as well. I started running the example (protein-protein), but it gets halfway through and then stops. The error I'm getting is: line 8: protein-protein_run1_it0_refine_6.job: command not found
However, I do have that file in my run1 directory.
Also, the structures directory is created with the subdirectories it0 and it1, but there are no structures in them.
I submitted the job to a queueing system with 10 nodes; could that have anything to do with this error?
I don't know which details would help answer this question, so if you tell me what information you need, I'll supply everything I know.

What are you sending to the queueing system on your cluster?
And how?

You should not submit the full HADDOCK process to the queue, but rather define the queue submission command in run.cns.

What do the following lines in run.cns look like in your case?

{============================ parallel jobs ===============================}
{* How many nodes do you want to use in parallel? *}
{* leave unused fields blank, make sure that the queues are actually running *}
{+ table: rows=10 "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
 cols=3 "queue command" "cns executable" "number of jobs" +}

{===>} queue_1="ssub short";
{===>} cns_exe_1="/home/software/software/cns_solve_1.31-UU/intel-x86_64bit-linux/bin/cns";
{===>} cpunumber_1=400;
...

Also the run directory where you run haddock should be accessible on all nodes.

Well, I just cd into the run1 directory and type haddock2.2, as the instructions say.
These are the lines from run.cns:

{============================ parallel jobs ===============================}
{* How many nodes do you want to use in parallel? *}
{* leave unused fields blank, make sure that the queues are actually running *}
{+ table: rows=10 "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
 cols=3 "queue command" "cns executable" "number of jobs" +}

{===>} queue_1="bsub -e err -o ooo";
{===>} cns_exe_1="/nfs/research2/beltrao/strumill/programs/cns_solve_1.3/intel-x86_64bit-linux/bin/cns";
{===>} cpunumber_1=10;
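Since every job depends on these three settings, one quick sanity check before launching is to pull them out of run.cns and confirm that each cns_exe_N path actually exists. The sketch below is a hypothetical helper, not part of HADDOCK; it assumes each parameter sits on a single `{===>} name=value;` line, as in the excerpts above, and it only checks paths on the machine where you run it, so repeat the check on a compute node if the filesystems differ.

```python
import os
import re

# Matches CNS parameter lines such as:  {===>} queue_1="bsub -e err -o ooo";
SETTING = re.compile(r'\{===>\}\s*(\w+)\s*=\s*"?([^";]*)"?\s*;')
# Keep only the parallel-jobs parameters (queue_N, cns_exe_N, cpunumber_N).
PARALLEL_KEY = re.compile(r"(queue|cns_exe|cpunumber)_\d+$")


def parallel_settings(cns_text):
    """Collect queue_N, cns_exe_N and cpunumber_N values from run.cns text."""
    settings = {}
    for key, value in SETTING.findall(cns_text):
        if PARALLEL_KEY.match(key):
            settings[key] = value
    return settings


def missing_cns_executables(settings):
    """Return the cns_exe_N paths that are set but not present on this machine."""
    return [value for key, value in sorted(settings.items())
            if key.startswith("cns_exe_") and value and not os.path.isfile(value)]
```

For example, `parallel_settings(open("run.cns").read())` on the excerpt above would return `queue_1`, `cns_exe_1` and `cpunumber_1`. An empty list from `missing_cns_executables` only means the paths exist where the check was run.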

but the output says I'm sending jobs like this:

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
./protein-protein_run1_generate_B.job
------------------------------------------------------------

Successfully completed.

But when it comes to protein-protein_run1_it0_refine_6.job, it's:

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
protein-protein_run1_it0_refine_6.job
------------------------------------------------------------

Exited with exit code 127.

and err is: /ebi/lsf/ebi-spool/02/1460643054.1414999: line 8: protein-protein_run1_it0_refine_6.job: command not found

It looks like your current directory is not in your PATH. But if this only happens for job #6 and the first five were fine, that is suspicious and could be related to a problem with a specific node.
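Exit code 127 is what a POSIX shell returns when it cannot find a command. A bare name without a slash (like `protein-protein_run1_it0_refine_6.job`) is looked up only in the directories listed in PATH, whereas `./name` or an absolute path is used directly. The sketch below reproduces that difference locally; the PATH value `/usr/bin:/bin` is an assumption standing in for a compute node whose PATH does not include `.`, and the job file is a trivial stand-in script.

```python
import os
import stat
import subprocess
import tempfile

JOB = "protein-protein_run1_it0_refine_6.job"  # job name from the thread
NODE_PATH = "/usr/bin:/bin"  # assumption: a compute-node PATH without "."


def exit_code(command, workdir):
    """Run `command` through /bin/sh in `workdir` and return its exit code."""
    result = subprocess.run(command, shell=True, cwd=workdir,
                            env={"PATH": NODE_PATH},
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode


with tempfile.TemporaryDirectory() as rundir:
    job_path = os.path.join(rundir, JOB)
    with open(job_path, "w") as fh:
        fh.write("#!/bin/sh\nexit 0\n")  # stand-in for a real HADDOCK job file
    os.chmod(job_path, os.stat(job_path).st_mode | stat.S_IXUSR)

    bare = exit_code(JOB, rundir)             # searched in PATH only
    relative = exit_code("./" + JOB, rundir)  # used directly, relative to cwd
    absolute = exit_code(job_path, rundir)    # used directly, full path

print(bare, relative, absolute)  # on Linux: 127 0 0
```

This is also consistent with the generate_B job succeeding: it was submitted as `./protein-protein_run1_generate_B.job`, with a directory component.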

What you could try instead is to use full path names. To do this, edit the following file in your HADDOCK installation:

 Haddock/Main/UseLongFileNames.py

And change the last line to:

useLongJobFileNames = 1

This should work assuming the path is the same on the master node from which you submit as on the compute nodes.
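The effect of the change can be pictured as follows: with long names, the job file is handed to the queue command as an absolute path, so it no longer matters whether `.` is in a node's PATH. `submit_command` is a hypothetical helper for illustration, not HADDOCK's actual code, and like the fix itself it assumes the run directory has the same path on the submit host and on the compute nodes.

```python
import os
import shlex


def submit_command(queue_cmd, job_file, use_long_names=True):
    """Hypothetical helper: build the argv a wrapper would pass to the scheduler.

    With use_long_names=True the job file is given as an absolute path
    (resolved on the submit host), so the compute node does not need "."
    in its PATH to find it.
    """
    target = os.path.abspath(job_file) if use_long_names else job_file
    return shlex.split(queue_cmd) + [target]


# Usage, with the queue command from the run.cns above:
print(submit_command("bsub -e err -o ooo", "protein-protein_run1_it0_refine_6.job"))
```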

I don't want to jinx it, but it looks like the UseLongFileNames change worked!
Thank you!

Great, it definitely works! I guess it's because some of the nodes recognized the path and some needed the full one, just like you said! Thank you so much for your help, even though I couldn't easily describe the problem; I wouldn't have found the UseLongFileNames.py file myself!

Hi, I have another issue, and I don't know where it's coming from. Everything was running for a day or two and then I got this error:

File "/nfs/research2/beltrao/strumill/example_haddock/protein-protein/run1/tools/make_contacts.py", line 82
print "Path not found: %s" %executable
^
SyntaxError: invalid syntax
mv: No match.

The lines in make_contacts.py don't tell me anything about how to solve this. Could I ask for your help?

80 executable = os.path.abspath(options.executable)
81 if not os.path.exists(executable):
82     print "Path not found: %s" %executable
83     sys.exit(1)
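Line 82 is a Python 2 `print` statement. In Python 3, `print` is a function, so the whole file is rejected at parse time with a SyntaxError before any of it runs, whether or not contact_fcc was compiled; the `mv: No match.` that follows is presumably a knock-on error from the calling csh script finding no output files to move. A quick way to see which spelling a given interpreter accepts is to `compile` both. This is only a diagnostic sketch, not a patch for HADDOCK.

```python
import sys

# Line 82 of make_contacts.py as posted, and the same line in Python 3 syntax.
PY2_STYLE = 'print "Path not found: %s" %executable\n'
PY3_STYLE = 'print("Path not found: %s" % executable)\n'


def compiles(source):
    """Return True if the running interpreter accepts `source` as valid syntax.

    compile() only parses the code, so the undefined name `executable` is fine.
    """
    try:
        compile(source, "make_contacts_line82", "exec")
    except SyntaxError:
        return False
    return True


print(sys.version.split()[0], compiles(PY2_STYLE), compiles(PY3_STYLE))
# Python 2.7 prints "2.7.x True True"; Python 3 prints "3.x.y False True".
```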

This script performs the clustering of solutions. Did you compile the provided software when installing HADDOCK, i.e. did you type make in the HADDOCK installation directory?

Yes, and I also have the additional required software installed, with the right paths in the file that I sourced.

This is the output of make (when I re-run it):

cd tools;make
make[1]: Entering directory ‘/nfs/research2/beltrao/strumill/programs/haddock2.2/tools’
make cluster_struc contact contact-chainID haddock-decompress-fastfunc contact_fcc contact_fcc_lig
make[2]: Entering directory ‘/nfs/research2/beltrao/strumill/programs/haddock2.2/tools’
make[2]: ‘cluster_struc’ is up to date.
make[2]: ‘contact’ is up to date.
make[2]: ‘contact-chainID’ is up to date.
g++ -O2 -o haddock-decompress-fastfunc haddock-decompress-fastfunc.cpp
haddock-decompress-fastfunc.cpp: In function ‘int main(int, char**)’:
haddock-decompress-fastfunc.cpp:33:28: warning: ignoring return value of ‘char* fgets(char*, int, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]
fgets(buf, 2000, matfile);
^
haddock-decompress-fastfunc.cpp:38:30: warning: ignoring return value of ‘char* fgets(char*, int, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]
fgets(buf, 2000, matfile);
^
make[2]: ‘contact_fcc’ is up to date.
g++ -O2 -o contact_fcc_lig contact_fcc_lig.cpp
make[2]: Leaving directory ‘/nfs/research2/beltrao/strumill/programs/haddock2.2/tools’
make[1]: Leaving directory ‘/nfs/research2/beltrao/strumill/programs/haddock2.2/tools’

The script is looking for the contact_fcc executable which should be in the tools directory (of your run and of the HADDOCK installation).

You can simply try restarting HADDOCK and see if the error persists. Check that contact_fcc is present in the above directories and that HADDOCKTOOLS is correctly defined (in the haddock_configure scripts).
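The checks suggested here can be scripted. The helper below is a hypothetical sketch that mirrors the `os.path.exists` test in make_contacts.py, additionally requiring the file to be executable; the tool names checked (`contact`, `contact_fcc`) and the relative `run1/tools` path are assumptions, so adjust them to your run.

```python
import os


def missing_tools(tool_dirs, names=("contact_fcc",)):
    """Return every tools-dir/name path that is absent or not executable.

    Mirrors the os.path.exists check in make_contacts.py, but also requires
    the execute bit, since the file has to be run, not just found.
    """
    missing = []
    for tool_dir in tool_dirs:
        for name in names:
            path = os.path.join(os.path.abspath(tool_dir), name)
            if not (os.path.isfile(path) and os.access(path, os.X_OK)):
                missing.append(path)
    return missing


if __name__ == "__main__":
    haddock_tools = os.environ.get("HADDOCKTOOLS")
    if haddock_tools is None:
        print("HADDOCKTOOLS is not set in this environment")
    else:
        # assumption: run from the project directory that contains run1/
        print(missing_tools([haddock_tools, "run1/tools"], names=("contact", "contact_fcc")))
```

An empty list means every listed tool was found and is executable.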

After running two other jobs, the error persists. I do have the contact_fcc file in run1/tools in my project, as well as in haddock2.2/tools.

The HADDOCKTOOLS path in haddock_configure is also defined correctly:
HADDOCKTOOLS="$HADDOCK/tools"
This is the error again:
File "/nfs/research2/beltrao/strumill/inactive_AKT1/run1/tools/make_contacts.py", line 82
print "Path not found: %s" %executable
^
SyntaxError: invalid syntax
mv: No match.
File "/nfs/research2/beltrao/strumill/inactive_AKT1/run1/tools/make_contacts.py", line 82
print "Path not found: %s" %executable
^
SyntaxError: invalid syntax
mv: No match.

Did it work for another run?
Do you have the contact application in the tools directory?
What is echo $HADDOCKTOOLS giving?

Again make sure to run make when installing HADDOCK.

No, it didn't work for the previous run either, but I assumed that if I started a new one it would work out :smiley:
I did run make for HADDOCK.

This is the listing of tools/cont*:
contact contact2.cpp contact-chainID contact-chainID.cpp contact.cpp contact_fcc contact_fcc.cpp contact_fcc_lig contact_fcc_lig.cpp

echo $HADDOCKTOOLS
/nfs/research2/beltrao/strumill/programs/haddock2.2/tools

SOLVED

The issue is the Python version: defining it in the configure file is not enough.
You need to make sure that the python aliases and Python paths point to python2.7; higher versions of Python are not compatible.
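To see what a bare `python` actually resolves to on the machine running the jobs, and compare it with `python2.7`, something like the sketch below can help. It asks each interpreter for its own version using a one-liner that is valid in both Python 2 and 3. The command names checked are assumptions, and since it only searches PATH, aliases defined in an interactive shell are not visible to it.

```python
import shutil
import subprocess
import sys

# One-liner that prints "major.minor" under both Python 2 and Python 3.
VERSION_SNIPPET = "import sys; print('%d.%d' % sys.version_info[:2])"


def interpreter_version(cmd="python"):
    """Return (path, "major.minor") for the interpreter `cmd` resolves to.

    Returns None if `cmd` is not found on PATH, and (path, None) if it is
    found but fails to report a version.
    """
    exe = shutil.which(cmd)
    if exe is None:
        return None
    result = subprocess.run([exe, "-c", VERSION_SNIPPET],
                            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                            universal_newlines=True)
    if result.returncode != 0:
        return exe, None
    return exe, result.stdout.strip()


if __name__ == "__main__":
    print("this check runs under", sys.executable)
    for name in ("python", "python2.7"):
        print(name, "->", interpreter_version(name) or "not found on PATH")
```

Running it on a compute node (for example through the same queue command) shows what the job scripts will actually pick up there, which may differ from the login node.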