Musings on GROMACS and workflow description integration

I was intrigued by the possibilities prompted by Michael Crusoe’s Common Workflow Language talk. GROMACS has long had workflow-like support built into its default file-naming scheme, so that a “workflow” can be written as a series of shell commands like

gmx pdb2gmx -f thing_from_PDB_database -ff amber03
gmx editconf -box 10 -o boxed
gmx solvate -cp boxed -o solvated
gmx grompp -c solvated
gmx mdrun

Some of the connections there are made behind the scenes through default file naming (e.g. grompp’s primary output has a default file name that matches that of mdrun’s primary input), and some I had to create by naming files explicitly.
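To make that implicit wiring visible, here is a rough Python sketch that records which files each step reads and writes and then lists the links between steps. The default file names (conf.gro, topol.top, grompp.mdp, topol.tpr) are written from memory and are an assumption to check against gmx help for your version; the STEPS table and find_links are hypothetical names, not anything GROMACS provides.

```python
# Files read and written by each step of the shell workflow above.
# Names such as boxed.gro and solvated.gro were given explicitly on the
# command line; the others are GROMACS defaults as recalled (assumption:
# verify with `gmx help <tool>`).
STEPS = [
    ("pdb2gmx", {"reads": ["thing_from_PDB_database"],
                 "writes": ["conf.gro", "topol.top"]}),
    ("editconf", {"reads": ["conf.gro"], "writes": ["boxed.gro"]}),
    ("solvate", {"reads": ["boxed.gro"], "writes": ["solvated.gro"]}),
    ("grompp", {"reads": ["solvated.gro", "grompp.mdp", "topol.top"],
                "writes": ["topol.tpr"]}),
    ("mdrun", {"reads": ["topol.tpr"], "writes": []}),
]


def find_links(steps):
    """Return (producer, file, consumer) triples.

    Each input is matched to the most recent earlier step that wrote a
    file of that name; inputs nobody wrote (e.g. grompp.mdp) are treated
    as external and produce no link.
    """
    last_writer = {}
    links = []
    for name, io in steps:
        for filename in io["reads"]:
            if filename in last_writer:
                links.append((last_writer[filename], filename, name))
        for filename in io["writes"]:
            last_writer[filename] = name
    return links


if __name__ == "__main__":
    for producer, filename, consumer in find_links(STEPS):
        print(f"{producer} --{filename}--> {consumer}")
```

A workflow language would state these links explicitly instead of leaving them to matching file names, which is what makes it possible to move the same description to another execution platform.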

Our command-line tools’ option-handling implementation already supports automatically generating shell-completion files (horror excerpt of an example below!), so that users can type gmx pd[TAB][TAB] and have pdb2gmx filled in and be prompted with the available command-line flags. That’s cool, but I bet someone could adapt that to write CWL bindings (is that the right word?) so that it is easy for users to express such workflows in something other than a bash script. This would be very powerful for doing parameter scans, ensemble studies, etc., or for transferring a workflow to another execution platform (local cluster vs. grid engine vs. whatever).

Bash command-line completion example for gmx pdb2gmx:
_gmx_pdb2gmx_compl() {
    local IFS=$'\n'
    local c=${COMP_WORDS[COMP_CWORD]}
    local n
    for ((n=1;n<COMP_CWORD;++n)) ; do
        [[ "${COMP_WORDS[COMP_CWORD-n]}" == -* ]] && break
    done
    local p=${COMP_WORDS[COMP_CWORD-n]}
    COMPREPLY=()
    if (( $COMP_CWORD <= 1 )) || [[ $c == -* ]]; then
        COMPREPLY=( $(compgen -S ' ' -W $'-f\n-o\n-p\n-i\n-n\n-q\n-chainsep\n-merge\n-ff\n-water\n-inter\n-ss\n-ter\n-lys\n-arg\n-asp\n-glu\n-gln\n-his\n-angle\n-dist\n-una\n-ignh\n-missing\n-v\n-posrefc\n-vsite\n-heavyh\n-deuterate\n-nochargegrp\n-nocmap\n-renum\n-rtpres' -- $c))
        return 0
    fi
    case "$p" in
        -f) (( $n <= 1 )) && COMPREPLY=( $(compgen -S ' ' -X '!*@(.gro|.g96|.pdb|.brk|.ent|.esp|.tpr)?(.gz|.Z)' -f -- $c ; compgen -S '/' -d $c));;
        -o) (( $n <= 1 )) && COMPREPLY=( $(compgen -S ' ' -X '!*@(.gro|.g96|.pdb|.brk|.ent|.esp)?(.gz|.Z)' -f -- $c ; compgen -S '/' -d $c));;
        -p) (( $n <= 1 )) && COMPREPLY=( $(compgen -S ' ' -X '!*.top?(.gz|.Z)' -f -- $c ; compgen -S '/' -d $c));;
        -i) (( $n <= 1 )) && COMPREPLY=( $(compgen -S ' ' -X '!*.itp?(.gz|.Z)' -f -- $c ; compgen -S '/' -d $c));;
        -n) (( $n <= 1 )) && COMPREPLY=( $(compgen -S ' ' -X '!*.ndx?(.gz|.Z)' -f -- $c ; compgen -S '/' -d $c));;
        -q) (( $n <= 1 )) && COMPREPLY=( $(compgen -S ' ' -X '!*@(.gro|.g96|.pdb|.brk|.ent|.esp)?(.gz|.Z)' -f -- $c ; compgen -S '/' -d $c));;
    esac
}
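The completion script is generated from per-option metadata (flag name, accepted file types), and the same kind of metadata could in principle drive a CWL tool description. Below is a minimal Python sketch of that idea, not how GROMACS does anything today. The option table, the input names and make_cwl_tool are all hypothetical, and only three of pdb2gmx's flags are shown. It emits JSON, which CWL tooling should accept since JSON is a subset of YAML, though I have not run the result through cwltool.

```python
import json

# Hypothetical option metadata, loosely modelled on what a completion
# generator would know: the flag, a CWL type, and whether the option
# names an output file. Only a few pdb2gmx flags are listed.
PDB2GMX_OPTIONS = [
    {"flag": "-f", "name": "structure", "type": "File", "output": False},
    {"flag": "-ff", "name": "forcefield", "type": "string", "output": False},
    {"flag": "-o", "name": "conf", "type": "string", "output": True},
]


def make_cwl_tool(base_command, options):
    """Build a CWL v1.0 CommandLineTool description as a plain dict."""
    tool = {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": list(base_command),
        "inputs": {},
        "outputs": {},
    }
    for opt in options:
        # Every option becomes an input bound to its command-line flag.
        tool["inputs"][opt["name"]] = {
            "type": opt["type"],
            "inputBinding": {"prefix": opt["flag"]},
        }
        if opt["output"]:
            # Output files are collected by globbing for the file name
            # that was passed in as the corresponding input.
            tool["outputs"][opt["name"] + "_file"] = {
                "type": "File",
                "outputBinding": {"glob": "$(inputs.%s)" % opt["name"]},
            }
    return tool


if __name__ == "__main__":
    tool = make_cwl_tool(["gmx", "pdb2gmx"], PDB2GMX_OPTIONS)
    print(json.dumps(tool, indent=2))
```

A real generator would also need to handle optional flags, default file names, booleans and multi-valued options, all of which this sketch ignores.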

There is already an initial implementation of such ideas within BioExcel, built as a library of interoperable software modules (https://github.com/bioexcel/pymdsetup). This has been enacted using PyCOMPSs, but can be easily ported.

Discussion with Michael suggests there’s no C++ library yet that generates CWL bindings programmatically, but it would be cool to build one.

Cool. I can see that https://github.com/bioexcel/pymdsetup/blob/master/workflows/gromacs_full.py is doing much the same thing as my example here, using wrappers such as https://github.com/bioexcel/pymdsetup/blob/master/gromacs_wrapper/pdb2gmx.py. Lemme think some more :slight_smile:

Would it perhaps make sense for wrappers like pdb2gmx.py to be auto-generated from a CWL tool description?
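As a rough illustration of what such auto-generation might look like, the Python sketch below turns a small CWL-style tool description into a function that builds the command line. Everything here is hypothetical: the TOOL dict is a hand-written stand-in rather than a real description from pymdsetup or GROMACS, make_wrapper is an invented helper, and the sketch only understands prefix bindings (no positions, defaults, arrays or outputs).

```python
# A hand-written, simplified stand-in for a CWL CommandLineTool
# description (hypothetical; only prefix bindings are used).
TOOL = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": ["gmx", "pdb2gmx"],
    "inputs": {
        "structure": {"type": "File", "inputBinding": {"prefix": "-f"}},
        "forcefield": {"type": "string", "inputBinding": {"prefix": "-ff"}},
    },
    "outputs": {},
}


def make_wrapper(tool):
    """Return a function that maps keyword arguments to an argv list."""
    base = list(tool["baseCommand"])
    inputs = tool["inputs"]

    def build_command(**kwargs):
        unknown = set(kwargs) - set(inputs)
        if unknown:
            raise TypeError("unknown inputs: %s" % sorted(unknown))
        argv = list(base)
        # Emit flags in the order the description declares them.
        for name, spec in inputs.items():
            if name in kwargs:
                argv += [spec["inputBinding"]["prefix"], str(kwargs[name])]
        return argv

    return build_command


if __name__ == "__main__":
    pdb2gmx = make_wrapper(TOOL)
    # "input.pdb" is a placeholder file name.
    print(pdb2gmx(structure="input.pdb", forcefield="amber03"))
```

The returned list could be handed to subprocess.run by whatever execution layer the wrappers sit in, so the per-tool Python files would reduce to data plus one generic builder.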

Somewhat related (but at a different granularity): running cwltool from a Jupyter (IPython) notebook:

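import cwltool.factory

# Load a CWL tool description and turn it into a Python callable.
f = cwltool.factory.Factory()
echo = f.make("v1.0/v1.0/echo-tool.cwl")
# "in" is a reserved word in Python, so it cannot be written as a keyword
# argument directly; pass the tool input through a dict instead.
out = echo(**{"in": "foo"})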