Map reads module

This module maps reads using a mapper program.

  • Internal name: mapreads
  • Available: Both local and distributed mode

  • Input ports:
    • reads: reads in FASTQ format (format: reads_fastq)
    • mapperindex: mapper index for the genome (automatically generated from genome sequence file)
    • genomedescription: genome description (automatically generated from genome sequence file)

  • Output port:
    • output: alignments in SAM format (format: mapper_results_sam)

  • Mandatory parameter:
  • Parameter | Type | Description
    mapper | string | The name of the mapper to use (e.g. bowtie, bwa, gsnap, star)

  • Optional parameters:
  • Parameter | Type | Description | Default value
    mapper.version | string | The version of the mapper to use | See the built-in mappers table below
    mapper.flavor | string | The flavor of the mapper to use (e.g. standard version or large index version) | See the built-in mappers table below
    mapper.use.bundled.binares | boolean | Use the mapper binaries bundled with Eoulsan to perform the mapping. If the value is false, the mapper is searched for in the directories listed in the PATH environment variable | True
    local.threads | integer | The number of threads to use in local mode | 0 (use the main.local.threads global property)
    max.local.threads | integer | The maximum number of threads to use in local mode | 0 (no limit)
    hadoop.threads | integer | The maximum number of threads to use in Hadoop mode | 0 (the number of available processors)
    mapper.arguments | string | Additional command line arguments for the mapper | See the built-in mappers table below
    hadoop.reducer.task.count | integer | The number of Hadoop reducer tasks to use for this step. This parameter is only used in Hadoop mode | Not set
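
As a sketch of how the optional parameters above can be combined, the following step sets the thread and reducer counts explicitly. The step id and all values are illustrative assumptions, not recommended settings; only the parameter names come from the table. The local.threads value applies in local mode, while the two hadoop.* values apply in Hadoop mode.

```xml
<!-- Illustrative step: the id and the values are assumptions -->
<step id="mymappingstep" skip="false">
    <module>mapreads</module>
    <parameters>
        <parameter>
            <name>mapper</name>
            <value>bowtie</value>
        </parameter>
        <!-- Local mode: use 4 threads instead of the main.local.threads global property -->
        <parameter>
            <name>local.threads</name>
            <value>4</value>
        </parameter>
        <!-- Hadoop mode: use at most 8 threads per mapper process -->
        <parameter>
            <name>hadoop.threads</name>
            <value>8</value>
        </parameter>
        <!-- Hadoop mode: number of reducer tasks for this step -->
        <parameter>
            <name>hadoop.reducer.task.count</name>
            <value>2</value>
        </parameter>
    </parameters>
</step>
```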

  • Available built-in mappers:
  • Mapper name | Bundled versions | Default version | Flavors | Default flavor | Default mapper arguments
    Bowtie | 0.12.9, 1.1.1 | 0.12.9 | standard, large-index | standard | --best -k 2
    Bowtie2 | 2.0.6, 2.2.4 | 2.0.6 | standard, large-index | standard | -k 2
    BWA | 0.6.2, 0.7.12, 0.7.15 | 0.6.2 | aln, mem | aln | -l 28
    GSNAP | 2012-07-20, 2014-12-21, 2017-02-25, 2017-04-24 | 2012-07-20 | gsnap, gmap | gsnap | -N 1
    STAR | 2.4.0k, 2.5.2b, 2.6.1b, 2.7.2d, 2.7.8a | 2.7.2d | standard, large-index | standard | --outSAMunmapped Within
    Minimap2 | 2.5, 2.10, 2.12, 2.17, 2.18, 2.24 | 2.17 | standard | standard | (none)
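
To illustrate how a row of this table translates into step parameters, the sketch below selects a non-default bundled version of GSNAP while keeping its default flavor and default arguments. The step id is hypothetical, and the lowercase mapper name follows the examples given for the mapper parameter.

```xml
<!-- Illustrative step: the id is an assumption; version, flavor and arguments come from the GSNAP row -->
<step id="mygsnapstep" skip="false">
    <module>mapreads</module>
    <parameters>
        <parameter>
            <name>mapper</name>
            <value>gsnap</value>
        </parameter>
        <!-- Override the default version (2012-07-20) with another bundled version -->
        <parameter>
            <name>mapper.version</name>
            <value>2017-04-24</value>
        </parameter>
        <!-- Same as the default flavor, written out for clarity -->
        <parameter>
            <name>mapper.flavor</name>
            <value>gsnap</value>
        </parameter>
        <!-- Same as the default arguments, written out for clarity -->
        <parameter>
            <name>mapper.arguments</name>
            <value>-N 1</value>
        </parameter>
    </parameters>
</step>
```

Parameters that are omitted fall back to the defaults listed in the table, so in this sketch only mapper and mapper.version are strictly needed.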

  • Removed built-in mappers:
    • SOAP 2.x

  • Configuration example:
    <!-- Map reads step -->
    <step id="mymappingstep" skip="false" discardoutput="true">
    	<module>mapreads</module>
    	<parameters>
    		<parameter>
    			<name>mapper</name>
    			<value>star</value>
    		</parameter>
    		<parameter>
    			<name>mapper.version</name>
    			<value>2.4.0k</value>
    		</parameter>
    		<parameter>
    			<name>mapper.flavor</name>
    			<value>standard</value>
    		</parameter>
    		<parameter>
    			<name>mapper.arguments</name>
    			<value>--outSAMunmapped Within</value>
    		</parameter>
    	</parameters>
    </step>
    

Note 1: In Hadoop mode, Eoulsan uses the mapreduce.cluster.temp.dir Hadoop setting as the location for the mapping temporary files (mapper indexes and FASTQ temporary files). This path must be set in the Hadoop client properties.
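
A minimal sketch of such a client setting is shown below. It assumes the property is declared in the client's mapred-site.xml, which is the usual place for MapReduce properties but may differ on a given cluster, and the directory is a placeholder to replace with a path that has enough free space for mapper indexes.

```xml
<!-- mapred-site.xml on the Hadoop client (assumed location); the path is a placeholder -->
<configuration>
    <property>
        <name>mapreduce.cluster.temp.dir</name>
        <value>/path/to/mapping/tmp</value>
    </property>
</configuration>
```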

Note 2: In Hadoop mode, use the requiredMemory attribute of the step tag to define the amount of memory required by the mapper. The default value is 8 GB.

Warning: In Hadoop mode, the results of BWA are not exactly the same as in local mode, because the SAM output that BWA produces for split FASTQ files differs from the output it produces for the complete FASTQ file (see this BioStar post for more information).