Quickstart Guide

Installation

To install Eoulsan, go to the Eoulsan installation page and follow the detailed procedure.

Sample files

This section provides some sample files to test Eoulsan. These files were produced during a mouse RNA-Seq experiment.

Create a design file

Copy the reads, genome and annotation files into an empty directory, then create a design file with the following command:

$ eoulsan.sh createdesign *.fq.bz2 genome.fasta.bz2 annotation.gff.bz2

You can now edit the design file to add further information. Note that Eoulsan handles compressed files automatically.
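
If the createdesign command is driven from a script rather than typed in a shell, the `*.fq.bz2` wildcard has to be expanded by the script itself. The following is a minimal sketch of that, assuming the file names used in the command above; the function name `build_createdesign_command` is hypothetical and not part of Eoulsan.

```python
from pathlib import Path


def build_createdesign_command(data_dir, genome="genome.fasta.bz2",
                               annotation="annotation.gff.bz2"):
    """Return the argument list for `eoulsan.sh createdesign` (hypothetical helper).

    The default file names are the ones used in this guide; adapt them
    to your own data.
    """
    data_dir = Path(data_dir)
    # Expand *.fq.bz2 ourselves, as the shell would; sort for a stable order.
    reads = sorted(p.name for p in data_dir.glob("*.fq.bz2"))
    if not reads:
        raise FileNotFoundError(f"no *.fq.bz2 read files in {data_dir}")
    # Fail early if the genome or annotation file is missing.
    for name in (genome, annotation):
        if not (data_dir / name).is_file():
            raise FileNotFoundError(f"missing input file: {data_dir / name}")
    return ["eoulsan.sh", "createdesign", *reads, genome, annotation]
```

The returned list contains bare file names, so it is meant to be run from inside the data directory, for example with `subprocess.run(cmd, cwd=data_dir, check=True)`.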

Create workflow file

The easiest way to create a workflow file is to reuse an existing one (see the Sample files section) and adapt it to your needs.

The workflow file contains the list of steps that Eoulsan will execute. Each step has parameters and is bound to a module to execute. Some modules are only available in local mode (like differential analysis) or in distributed mode; see the modules page for more details. For each step you can change, add or remove parameters. Parameters are specific to a module, so consult the documentation of the built-in steps for the list of parameters available for each step.

Finally, the workflow file has a global section that overrides the values of the Eoulsan configuration file. This section is useful, for example, to set the path of the temporary directory to use.

Launch Eoulsan in local mode

Once your design file and workflow file are ready, you can launch the Eoulsan analysis with the following command:

$ eoulsan.sh exec workflow-local.xml design.txt

Warning: This demo requires Docker to perform the normalization and differential analysis steps of this workflow. If you want to run this demo without Docker, you must install R (or an Rserve server) and the related packages (see the differential analysis step for more information) and change the execution.mode parameter of the normalization and diffana steps.

Once started, and before the analysis begins, Eoulsan checks whether:

  • The design file and workflow file are valid
  • All the modules related to the steps exist
  • The workflow of all the steps is valid
  • The order of the steps is correct (a step cannot use data generated after its end)
  • The input files are valid (FASTA and annotation files)
  • A genome mapper index is needed. If so, a step that generates this index is added automatically
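
The step order check in the list above can be illustrated with a small sketch. This is not Eoulsan's implementation: the representation of a step as a `(step_id, inputs, outputs)` tuple and the data names are assumptions made only for illustration.

```python
def check_step_order(steps, design_data=()):
    """Return a list of error messages; an empty list means the order is valid.

    `steps` is an ordered list of (step_id, inputs, outputs) tuples and
    `design_data` lists the data already provided by the design file.
    Both structures are hypothetical, not Eoulsan's internal model.
    """
    available = set(design_data)
    errors = []
    for step_id, inputs, outputs in steps:
        # A step may only consume data that exists before it starts.
        for data in inputs:
            if data not in available:
                errors.append(
                    f"step '{step_id}' needs '{data}' before it is generated")
        # Its own outputs become available to the following steps.
        available.update(outputs)
    return errors
```

For example, a mapping step that consumes filtered reads is reported as an error if it is listed before the step that produces those filtered reads.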

If successful, you will obtain a new directory with a name like eoulsan-20110310-101235 that contains the log files of the analysis. Result files are stored in the user's current directory.

Launch Eoulsan in local Hadoop cluster mode

First, you must have a configured Hadoop cluster (see Hadoop configuration). You can then launch the Eoulsan analysis with the following command:

$ eoulsan.sh hadoopexec workflow-hadoop.xml design.txt hdfs://master.example.com/test

When a step can be distributed on the Hadoop cluster, the required input files are automatically copied to the HDFS filesystem before the step is launched.

Launch Eoulsan in cluster mode

Cluster mode works like local mode; you only need to configure beforehand which cluster scheduler to use (see the cluster configuration page). You can then launch an Eoulsan analysis on a cluster with the following command:

$ eoulsan.sh clusterexec workflow-local.xml design.txt

Step tasks will be automatically submitted to the scheduler of your cluster. The outputs of this mode are the same as in local mode.

Launch Eoulsan on Amazon Web Services cloud computing

Warning: Eoulsan 2.x does not currently work on AWS. This section is outdated.

To launch Eoulsan on the Amazon cloud, you must have an AWS account with access to Amazon S3 and Amazon Elastic MapReduce. You can register on the Amazon Web Services website.

Then you must set your Amazon credentials in an Eoulsan configuration file or in the workflow file (see AWS configuration). Here is an example of a ~/.eoulsan configuration file:

# This is an example of configuration file for Eoulsan.
# You need to use the -conf parameter or rename this file to
# $HOME/.eoulsan to enable it.

# Temporary directory.
# By default Eoulsan uses the temporary directory of your platform.
#main.tmp.dir=/tmp

# Debug mode.
# By default the debug mode of Eoulsan is disabled.
main.debug=false


# Warning: The following Amazon Web Services access and secret keys are
# fake keys. You MUST replace them with valid keys.

# Access Key ID (a 20-character, alphanumeric sequence)
aws.access.key=022QF06E7MXBSH9DHM02
# AWS Secret Access Key (a 40-character sequence)
aws.secret.key=kWcrlUX5JEDGM/LtmEENI/aVmYvHNif5zB+d9+ct
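
Before launching an analysis, it can be useful to verify that the keys in the configuration file at least have the shape described in the comments above (a 20-character alphanumeric access key and a 40-character secret key). The following is a minimal sketch of such a check; the function names are hypothetical, the parser only handles the simple `key=value` lines shown in this example, and it says nothing about whether AWS will accept the keys.

```python
import re


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_aws_keys(settings):
    """Return a list of problems found with the AWS key settings."""
    problems = []
    access = settings.get("aws.access.key", "")
    secret = settings.get("aws.secret.key", "")
    if not re.fullmatch(r"[A-Za-z0-9]{20}", access):
        problems.append("aws.access.key should be 20 alphanumeric characters")
    if len(secret) != 40:
        problems.append("aws.secret.key should be 40 characters long")
    return problems
```

A typical use is to read the file with `Path.home().joinpath(".eoulsan").read_text()`, pass the text to `parse_properties`, and stop if `check_aws_keys` returns a non-empty list.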

Warning: If you use the sample workflow file, check the global section of the file to set the AWS region, the type and the number of instances to use. See the configuration file page for the list of global parameters dedicated to AWS mode.

Then you can launch the Eoulsan analysis with the following command:

$ eoulsan.sh emrexec -d "My Eoulsan Test" workflow-hadoop.xml design.txt s3://mybucket/test

Note: In the sample URL s3://mybucket/test, mybucket is the name of your S3 bucket and /test is the path of the data in your S3 bucket.
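
The split between bucket name and path described in the note above can be reproduced with Python's standard URL parser. This is only an illustrative sketch; `split_s3_url` is a hypothetical helper, not an Eoulsan function.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// URL into (bucket, path) (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    # The host part of the URL is the bucket; the rest is the path in it.
    return parsed.netloc, parsed.path
```

With the sample URL, `split_s3_url("s3://mybucket/test")` gives the bucket `mybucket` and the path `/test`.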

In AWS mode, the analysis has two phases:

  • First, the data to analyse and the design and workflow files are uploaded to your S3 bucket on the Amazon cloud.
  • Then, the cluster in the cloud is created and the "real" analysis starts. Eoulsan on your local computer only waits for the end of the analysis. You can quit Eoulsan or shut down your computer: no data will be lost and the analysis will continue in the cloud.

At the end of the analysis, the result files, the log files (one main log and one for each step) and a copy of the design and workflow files will be in a new directory in your S3 bucket with a name like eoulsan-20110310-101235 (the numbers in the directory name after eoulsan are, respectively, the date and the time of the launch of the analysis). You must manually download this data from S3 to your computer.
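
Because the directory name encodes the launch date and time, it can be turned back into a timestamp, for instance to find the most recent run among several output directories. The sketch below assumes the layout `eoulsan-YYYYMMDD-HHMMSS`, which is inferred from the example name above; the function names are hypothetical.

```python
from datetime import datetime


def parse_launch_time(dirname):
    """Return the launch datetime encoded in an Eoulsan output directory name.

    Assumes the pattern eoulsan-YYYYMMDD-HHMMSS; raises ValueError otherwise.
    """
    return datetime.strptime(dirname, "eoulsan-%Y%m%d-%H%M%S")


def latest_run(dirnames):
    """Return the directory name with the most recent launch time."""
    return max(dirnames, key=parse_launch_time)
```

For the example name, `parse_launch_time("eoulsan-20110310-101235")` corresponds to 10 March 2011 at 10:12:35.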

To retrieve your data you can use one of the following tools:

  • AWS S3 Console web interface. Note that with this tool you cannot download more than one file at a time.
  • s3cmd: a command line tool. s3cmd is also available as a Debian/Ubuntu package.
  • S3Fox Organizer: a Firefox extension that handles S3 uploads and downloads.
  • Cyberduck: An open source FTP, SFTP, WebDAV, Cloud Files, Google Docs & Amazon S3 Browser for Mac & Windows.
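
As an illustration of the s3cmd option listed above, the sketch below builds an `s3cmd get --recursive` command line that fetches a whole output directory. It assumes that the output directory sits directly under the path given on the emrexec command line and that s3cmd is already installed and configured with your credentials; `build_s3cmd_download` is a hypothetical helper.

```python
def build_s3cmd_download(bucket, prefix, output_dir, local_dir="."):
    """Return the argument list for a recursive s3cmd download.

    Assumption: the Eoulsan output directory is located at
    s3://<bucket>/<prefix>/<output_dir>/.
    """
    # Drop empty components so an empty prefix does not produce "//".
    parts = [p for p in (prefix.strip("/"), output_dir.strip("/")) if p]
    source = f"s3://{bucket}/" + "/".join(parts) + "/"
    # A trailing slash makes s3cmd treat the destination as a directory.
    destination = local_dir.rstrip("/") + "/"
    return ["s3cmd", "get", "--recursive", source, destination]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, or joined with spaces and typed in a shell.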