The workflow file

The workflow file (usually named workflow.xml) defines all the steps to execute and their parameters. This file uses XML syntax and is divided into four sections:

  • The description section, which contains the name, the description and the author of the workflow file.
  • The constants section, which contains constants that can be used in parameter values.
  • The steps section, which contains the list of steps to execute and their parameters. The parameters of the built-in steps are described in the Built-in steps section.
  • The globals section, which contains global parameters (overriding configuration settings) that can be used in all the steps of the analysis.

In all parameter values you can use variables (e.g. ${variable}) that resolve to:

  • Built-in variables (${eoulsan.version}, ${eoulsan.build.number}, ${eoulsan.build.date}, ${design.file.path}, ${workflow.file.path}, ${output.path}, ${job.path}, ${job.id}, ${job.uuid} and ${available.processors})
  • Java properties (e.g. ${java.version})
  • System environment variables (e.g. ${PATH}, ${PWD})
  • User-defined constants
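
As a minimal sketch of how these kinds of variables can be combined, the fragment below defines constants from a built-in variable, a Java property and an environment variable. The constant names (my.results.dir, my.report.dir, my.scratch.dir) are hypothetical and only illustrate the syntax; the referenced variables are taken from the list above.

```xml
<constants>
    <!-- Built-in variables (hypothetical constant name) -->
    <parameter>
        <name>my.results.dir</name>
        <value>${output.path}/results-${job.id}</value>
    </parameter>
    <!-- User-defined constant combined with a Java property -->
    <parameter>
        <name>my.report.dir</name>
        <value>${my.results.dir}/java-${java.version}</value>
    </parameter>
    <!-- System environment variable -->
    <parameter>
        <name>my.scratch.dir</name>
        <value>${PWD}/scratch</value>
    </parameter>
</constants>
```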

You can also insert the output of a shell command into a parameter value by enclosing the command in backquotes (`):

	 <value>`cat /proc/cpuinfo | grep processor | wc -l`</value>
	 <value>`pwd`/tmp</value>
	 <value>`basedir ${user.home}`/tmp</value>

All tags must be in lower case. The following listing shows the structure of a typical workflow.xml file:

<analysis>
    <formatversion>1.0</formatversion>
    <name>my analysis</name>
    <description>Demo analysis</description>
    <author>Laurent Jourdren</author>

    <constants>
        <parameter>
	        <name>my.constant</name>
	        <value>myconstantvalue</value>
        </parameter>
    </constants>


    <steps>

        <!-- Filter reads -->
        <step id="filterreads" skip="false">
                <module>filterreads</module>
                <parameters>
                        <parameter>
                                <name>trim.length.threshold</name>
                                <value>11</value>
                        </parameter>
                        <parameter>
                                <name>quality.threshold</name>
                                <value>12</value>
                        </parameter>
                </parameters>
        </step>

        <!-- Map reads -->
        <step id="mapreads" skip="false">
                <module>mapreads</module>
                <parameters>
                        <parameter>
                                <name>mapper</name>
                                <value>bowtie</value>
                        </parameter>
                        <parameter>
                                <name>mapper.arguments</name>
                                <value>--best -k 2</value>
                        </parameter>
                </parameters>
        </step>

        <!-- SAM filter -->
        <step id="filtersam"  skip="false">
                <module>filtersam</module>
                <parameters>
                        <parameter>
                                <name>removeunmapped</name>
                                <value></value>
                        </parameter>
                        <parameter>
                                <name>removemultimatches</name>
                                <value></value>
                        </parameter>
                </parameters>
        </step>

        <!-- Expression -->
        <step id="expression" skip="false">
                <module>expression</module>
                <parameters>
                        <parameter>
                                <name>counter</name>
                                <value>htseq-count</value>
                        </parameter>
                        <parameter>
                                <name>genomictype</name>
                                <value>gene</value>
                        </parameter>
                        <parameter>
                                <name>attributeid</name>
                                <value>ID</value>
                        </parameter>
                        <parameter>
                                <name>stranded</name>
                                <value>no</value>
                        </parameter>
                        <parameter>
                                <name>overlapmode</name>
                                <value>union</value>
                        </parameter>
                        <parameter>
                                <name>removeambiguouscases</name>
                                <value>true</value>
                        </parameter>
                </parameters>
        </step>

        <!-- Normalization -->
        <step id="normalization" skip="false">
                <module>normalization</module>
                <parameters/>
        </step>

        <!-- Diffana -->
        <step id="diffana" skip="false">
                <module>diffana</module>

                <parameters>
                        <parameter>
                                <name>disp.est.method</name>
                                <value>pooled</value>
                        </parameter>
                        <parameter>
                                <name>disp.est.sharing.mode</name>
                                <value>maximum</value>
                        </parameter>
                        <parameter>
                                <name>disp.est.fit.type</name>
                                <value>local</value>
                        </parameter>
                </parameters>
        </step>


    </steps>

    <globals>
        <parameter>
	        <name>main.tmp.dir</name>
	        <value>/tmp</value>
        </parameter>
    </globals>

</analysis>

Description section

The first tags of the workflow file provide information about the file:

  • formatversion: The version of the format of this workflow file.
  • name: The name of this workflow file.
  • description: The description of this workflow file.
  • author: The author of this workflow file.

Constants section

The constants section defines additional variables that can be used in parameter values with the ${variable} syntax. A constant can reference previously defined constants (and other variables) in its value.

Note that the constants section is optional.

    <constants>
        <parameter>
	        <name>my.constant1</name>
	        <value>foo</value>
        </parameter>
         <parameter>
	        <name>my.constant2</name>
	        <value>${my.constant1}-bar</value>
        </parameter>
    </constants>
 

Steps section

The steps section lists all the steps to execute. Each step has a module name and, optionally, a version, inputs and parameters:

Tag | Type | Optional | Description
module | string | False | The name of the module to execute by the step
version | string | True | The version of the step to use
inputs | XML tags | True | Manually define the data sources to use by the step
parameters | XML tags | True | The parameters of the step
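
Since module is the only tag that is not optional, the smallest step declaration implied by the table above would look like the following sketch. It assumes that the parser accepts a step with no attributes and no parameters tag, as the table suggests; the normalization module is taken from the example workflow earlier on this page.

```xml
<steps>
    <!-- Only the mandatory module tag is given; version, inputs and parameters are omitted -->
    <step>
        <module>normalization</module>
    </step>
</steps>
```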

The step tag can have six optional attributes:

Attribute | Type | Default value | Description
id | string | The name of the module to execute | The identifier of the step. The id must be unique within a workflow and is used to name the output files of the step.
discardoutput | boolean | false | When set to true, the output files of the step are saved in the working directory instead of the output directory of the workflow, and are removed at the end of the workflow if it succeeds.
skip | boolean | false | When set to true, the step is skipped.
requiredprocs | integer | -1 | The number of processors to use for the step. This value is only used in clusterexec mode and, in local mode, for steps that handle their own parallelization, such as the mapping step. By default in clusterexec mode, one processor is used for each step.
requiredmemory | integer | -1 | The amount of memory required by the step, in megabytes. This value is only used in clusterexec mode. If not set, Eoulsan requests from the cluster scheduler the same amount of memory as is allocated to the Eoulsan JVM. Unit suffixes such as MB, M, GB and G can be used (e.g. 8GB).
dataproduct | string | cross | The method used to combine input data before executing the step. By default a cross product is used. If the input data must share the same name and be processed together, use the match method instead.

If the user does not define the inputs of a step, the Eoulsan workflow engine uses, as the data source for each input port, the most recent previous step that generates data in the format requested by that port. To use a different data source, define it manually with the inputs tag: each input tag contains port, fromstep and fromport subtags.


    <steps>

        <!-- Filter reads -->
        <step id="myfilterreadstep" discardoutput="true"  skip="false" dataproduct="cross">
                <module>filterreads</module>
                <version>2.0-beta5</version>
                <parameters>
                        <parameter>
                                <name>trim.length.threshold</name>
                                <value>11</value>
                        </parameter>
                        <parameter>
                                <name>quality.threshold</name>
                                <value>12</value>
                        </parameter>
                </parameters>
        </step>

        <!-- Map reads -->
        <step id="mapping" skip="false" requiredprocs="4" requiredmemory="8GB" dataproduct="cross">
                <module>mapreads</module>
                <version>2.0-beta5</version>

                <inputs>
                        <input>
                                <port>reads</port>
                                <fromstep>myfilterreadstep</fromstep>
                                <fromport>output</fromport>
                        </input>
                </inputs>

                <parameters>
                        <parameter>
                                <name>mapper</name>
                                <value>soap</value>
                        </parameter>
                </parameters>
        </step>

        ...

    </steps>
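
The listing above only uses the cross data product and never skips a step. The following sketch, an illustration rather than a tested configuration, shows the two remaining cases from the attribute table: a step that pairs its inputs by name with dataproduct="match", and a step disabled with skip="true". The module names come from the example workflow earlier on this page; the id values are hypothetical.

```xml
<steps>
    <!-- Inputs with the same name are processed together instead of as a cross product -->
    <step id="myexpressionstep" dataproduct="match">
        <module>expression</module>
    </step>

    <!-- This step stays in the file but is not executed -->
    <step id="mydiffanastep" skip="true">
        <module>diffana</module>
    </step>
</steps>
```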

Global parameter section

The global parameter section contains parameters that are shared by all the steps. The syntax of the global parameters is the same as in the steps.

    <globals>
        <parameter>
	        <name>main.tmp.dir</name>
	        <value>/home/jourdren/tmp</value>
        </parameter>
    </globals>

The global parameters override the values set in the configuration file. For more information about the configuration file, see the configuration file page.
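
Because variables can be used in all parameter values, a global parameter does not have to hard-code a path. A minimal sketch, assuming that variable substitution applies to global parameters in the same way as to step parameters: the temporary directory is derived from the user.home Java property.

```xml
<globals>
    <parameter>
        <name>main.tmp.dir</name>
        <!-- Resolved from the Java property user.home when the workflow is loaded (assumption) -->
        <value>${user.home}/tmp</value>
    </parameter>
</globals>
```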