Expression module

This module computes expression from the filtered alignments and an annotation file that contains the genomic elements to count. Eoulsan currently provides one counting method: htseq-count (the original Eoulsan counting method is now deprecated). This method is a fast implementation of htseq-count. For more information about this method, see the HTSeq website.

WARNING: Support for the original Eoulsan counter is deprecated and will soon be removed from Eoulsan.

  • Name: expression.
  • Available: Both local and distributed mode

  • Input port:
    • alignments: alignments in SAM format (format: mapper_results_sam)
    • featuresannotation: genome annotation in GFF3 or GTF format
    • genomedescription: genome description (automatically generated from genome sequence file)

  • Output port:
    • output: expression file in TSV format (format: expression_results_tsv)

  • Optional parameters:
    Parameter | Type | Description | Default value
    counter | string | The name of the counter to use (eoulsanCounter or htseq-count). Support for eoulsanCounter has been removed from Eoulsan 2.x. | htseq-count
    features.file.format | string | The features file format. Currently only the GFF/GFF3 and GTF formats are supported. | gff3
    output.file.format | string | The output file format. Currently only the TSV and SAM formats are supported. If the SAM format is selected, each SAM entry carries its feature assignment (as an optional field with tag 'XF'). | tsv
    genomic.type | string | Feature type (3rd column in the GFF file) to be used; all features of other types are ignored. | exon
    attribute.id | string | GFF attribute to be used as the feature ID. | PARENT

  • Optional parameters of HTSeq-count:
    Parameter | Type | Description | Default value
    stranded | string | Strand handling. If "yes", the read has to be mapped to the same strand as the feature. If "reverse", the read has to be mapped to the opposite strand. If "no", the read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand. | no
    overlap.mode | string | Name of the overlap mode to use (union, intersection-nonempty or intersection-strict). | union
    minimum.alignment.quality | integer | Skip all reads with an alignment quality lower than the given minimal value. | 0
    remove.ambiguous.cases | boolean | Keep or remove ambiguous cases in the count. | true
    remove.non.unique.alignments | boolean | Keep or remove non-unique alignments for the count (uses the NH tag of the optional fields of the SAM entries). | true
    remove.secondary.alignments | boolean | Remove alignments with the secondary alignment flag for the count. | false
    remove.supplementary.alignments | boolean | Remove alignments with the supplementary alignment flag for the count. | false
    remove.non.assigned.sam.tags | boolean | Do not add a SAM tag for non-assigned SAM entries. | false
    sam.tag.to.use | string | Name of the SAM tag to use for the assigned features. The value must be X?, Y? or Z?, where ? is a letter. | XF
    split.attribute.values | boolean | Split values of the attribute field. | false
    max.entries.in.ram | integer | The maximal number of SAM output entries to store in memory. Lowering this value can avoid out-of-memory errors with long reads. | 500000
  • Configuration example:
    <!-- Expression step -->
    <step id="myexpressionstep" skip="false" discardoutput="false">
    	<module>expression</module>
    	<parameters>
    		<parameter>
    			<name>counter</name>
    			<value>htseq-count</value>
    		</parameter>
    		<parameter>
    			<name>features.file.format</name>
    			<value>gff3</value>
    		</parameter>
    		<parameter>
    			<name>genomic.type</name>
    			<value>exon</value>
    		</parameter>
    		<parameter>
    			<name>attribute.id</name>
    			<value>Parent</value>
    		</parameter>
    		<parameter>
    			<name>stranded</name>
    			<value>yes</value>
    		</parameter>
    		<parameter>
    			<name>overlap.mode</name>
    			<value>union</value>
    		</parameter>
    		<parameter>
    			<name>remove.ambiguous.cases</name>
    			<value>false</value>
    		</parameter>
    	</parameters>
    </step>
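
As a further illustration, the sketch below configures the same module to write annotated SAM output instead of a count table. It uses only parameters listed in the tables above. The step id is hypothetical, the lowercase value sam is assumed by analogy with the tsv default, and the max.entries.in.ram figure is purely illustrative.

```xml
<!-- Expression step writing annotated SAM output (hypothetical step id) -->
<step id="myexpressionsamstep" skip="false" discardoutput="false">
	<module>expression</module>
	<parameters>
		<parameter>
			<name>output.file.format</name>
			<!-- Assumed spelling, by analogy with the "tsv" default -->
			<value>sam</value>
		</parameter>
		<parameter>
			<name>sam.tag.to.use</name>
			<!-- Must match X?, Y? or Z?; XF is the documented default -->
			<value>XF</value>
		</parameter>
		<parameter>
			<name>remove.non.assigned.sam.tags</name>
			<!-- Leave SAM entries without an assigned feature untagged -->
			<value>true</value>
		</parameter>
		<parameter>
			<name>max.entries.in.ram</name>
			<!-- Lower than the 500000 default; the figure is illustrative -->
			<value>100000</value>
		</parameter>
	</parameters>
</step>
```

Parameters that are not listed keep their default values, so this step would still count exon features with the union overlap mode.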
    

Note: To get the same results as the original implementation of htseq-count with the --nonunique all argument, you must set the remove.ambiguous.cases and remove.non.unique.alignments parameters to false.
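
The settings described in the note above might be written as the following step. This is a minimal sketch: the step id is hypothetical, and every parameter not shown is left at its default value.

```xml
<!-- Expression step keeping ambiguous and non-unique alignments (hypothetical step id) -->
<step id="myexpressionnonuniquestep" skip="false" discardoutput="false">
	<module>expression</module>
	<parameters>
		<parameter>
			<name>counter</name>
			<value>htseq-count</value>
		</parameter>
		<parameter>
			<name>remove.ambiguous.cases</name>
			<!-- Keep reads that overlap more than one feature -->
			<value>false</value>
		</parameter>
		<parameter>
			<name>remove.non.unique.alignments</name>
			<!-- Keep reads flagged as multi-mapped through the NH tag -->
			<value>false</value>
		</parameter>
	</parameters>
</step>
```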