Design file v2

The design file is the central element of the pipeline. It contains all the information about the experiments and the samples, including their descriptions. Eoulsan supports two versions of the design file; this page presents version 2. The design file is usually named design.txt, but you can specify another name. Here is an example of a design file:

[Header]
DesignFormatVersion=2
Project=finalDesignExample
ProjectDescription=Example of the final version of the new design for Eoulsan
Owner=Xavier Bauquet

GenomeFile=genome://mm10
GffFile=gff://mm10
AdditionalAnnotationFile=additionalannotation://mm10_ens75_transcript

[Experiments]
Exp.1.name=exp1
Exp.1.skip=False
Exp.1.reference=WT-day1

Exp.2.name=exp2
Exp.2.skip=false
Exp.2.contrast=true
Exp.2.buildContrast=true
Exp.2.model=~type+day+type:day
Exp.2.comparisons=WT1_vs_KO1:typeWT%dayday1_vs_typeKO%dayday1;\
WT2_vs_KO2:typeWT%dayday2_vs_typeKO%dayday2

Exp.3.name=exp3
Exp.3.skip=false

[Columns]
SampleId	SampleName	Reads		Condition	Reference	Exp.1.RepTechGroup	Exp.2.type		Exp.2.day	Exp.2.RepTechGroup	Exp.3.Condition	Exp.3.RepTechGroup
s1a		Sample1a	sample1a.fastq	WT-day1		1		WT-day1			WT			day1		WT-day1			WT		WT-day1
s1b		Sample1b	sample1b.fastq	WT-day1		0		WT-day1			WT			day1		WT-day1			WT		WT-day1
s2a		Sample2a	sample2a.fastq	WT-day2		2		WT-day2			WT			day2		WT-day2			WT		WT-day2
s2b		Sample2b	sample2b.fastq	WT-day2		0		WT-day2			WT			day2		WT-day2			WT		WT-day2
s3a		Sample3a	sample3a.fastq	KO-day1		3		KO-day1			KO			day1		KO-day1			KO		KO-day1
s3b		Sample3b	sample3b.fastq	KO-day1		0		KO-day1			KO			day1		KO-day1			KO		KO-day1
s4a		Sample4a	sample4a.fastq	KO-day2		0		KO-day2			KO			day2		KO-day2			KO		KO-day2
s4b		Sample4b	sample4b.fastq	KO-day2		0		KO-day2			KO			day2		KO-day2			KO		KO-day2

This version of the design file includes three sections:

  • Header
  • Experiments
  • Columns

The Header and Experiments sections use key-value entries separated by "=". The Columns section is tab-delimited and is comparable to the first version of the design file.
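As an illustration of this layout, here is a minimal Python sketch that splits a design file into its three sections and reads each one. It is not Eoulsan's own parser: the function names are hypothetical, continuation lines are not handled, and treating a run of several tabs as a single separator is an assumption made to match the visually aligned example above.

```python
def split_sections(text):
    """Split design-file text into its bracketed sections.

    Returns a dict mapping each section name to its non-empty lines.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def parse_key_values(lines):
    """Parse 'key=value' lines (Header and Experiments sections)."""
    entries = {}
    for line in lines:
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def parse_columns(lines):
    """Parse the tabulated Columns section into one dict per sample."""
    # Assumption: runs of tabs count as one separator (no empty cells).
    rows = [[cell for cell in line.split("\t") if cell] for line in lines]
    header, samples = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in samples]


example = (
    "[Header]\n"
    "DesignFormatVersion=2\n"
    "Project=finalDesignExample\n"
    "\n"
    "[Experiments]\n"
    "Exp.1.name=exp1\n"
    "\n"
    "[Columns]\n"
    "SampleId\tSampleName\tReads\t\tCondition\n"
    "s1a\t\tSample1a\tsample1a.fastq\tWT-day1\n"
)

sections = split_sections(example)
header = parse_key_values(sections["Header"])
samples = parse_columns(sections["Columns"])
print(header["Project"])        # finalDesignExample
print(samples[0]["Condition"])  # WT-day1
```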

You can use "\" at the end of a line to continue a Header or Experiments option on the next line:

Exp.2.comparisons=WT1_vs_KO1:typeWT%dayday1_vs_typeKO%dayday1;\
WT2_vs_KO2:typeWT%dayday2_vs_typeKO%dayday2
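The continuation rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the sketch assumes the backslash is the very last character of the line and that the continued line is appended without trimming.

```python
def join_continuations(lines):
    """Merge each line ending with a backslash with the line after it."""
    joined = []
    buffer = ""
    for line in lines:
        if line.endswith("\\"):
            # Drop the trailing backslash and keep accumulating.
            buffer += line[:-1]
        else:
            joined.append(buffer + line)
            buffer = ""
    if buffer:
        # The last line ended with a backslash: keep what was collected.
        joined.append(buffer)
    return joined


lines = [
    "Exp.2.model=~type+day+type:day",
    "Exp.2.comparisons=WT1_vs_KO1:typeWT%dayday1_vs_typeKO%dayday1;\\",
    "WT2_vs_KO2:typeWT%dayday2_vs_typeKO%dayday2",
]
merged = join_continuations(lines)
print(len(merged))  # 2
print(merged[1])
```

After merging, the Exp.2.comparisons entry is a single key-value line holding both comparisons separated by ";".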

Important information

  • Column overload: An experiment column always has priority over the common column. For example, in the design above, the Exp.3.Condition column overrides the common Condition column for experiment 3.
  • Columns for contrasts with DESeq2: The type and day columns of Exp.2 are intended for DESeq2. The values in these columns cannot begin with a digit; each value must begin with a letter.
  • Reference column: This column accepts only integers:

    • -1: the sample is ignored in the differential expression analysis but is still used for the normalization.
    • 0: the sample is used in the differential expression analysis but not as a reference.
    • >0: the sample is used as a reference, with a priority given by the value of the integer. For example, in the design above, WT-day1 is the first reference, WT-day2 is the second and KO-day1 is the third. The differential expression analysis performs the following comparisons:

      • WT-day2 vs WT-day1
      • KO-day1 vs WT-day1
      • KO-day2 vs WT-day1

      • KO-day1 vs WT-day2
      • KO-day2 vs WT-day2

      • KO-day2 vs KO-day1
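The list of comparisons above can be reproduced with a short Python sketch. The rule used here is inferred from the example only and is not taken from Eoulsan's code: each reference, in priority order, is compared with every remaining condition that has not already served as a reference. The function name is hypothetical, and Reference values are summarised per condition (the highest value among its samples), which is a simplification.

```python
def reference_comparisons(conditions):
    """List (condition, reference) pairs from Reference priorities.

    `conditions` maps each condition name to the highest Reference value
    found among its samples, in the order the conditions appear.
    """
    # Conditions with a value > 0 act as references, lowest value first.
    references = sorted(
        (name for name, value in conditions.items() if value > 0),
        key=lambda name: conditions[name],
    )
    # Conditions flagged -1 are left out of the comparisons.
    candidates = [name for name, value in conditions.items() if value >= 0]

    pairs = []
    done = set()
    for ref in references:
        done.add(ref)
        for name in candidates:
            # Skip the reference itself and earlier references.
            if name not in done:
                pairs.append((name, ref))
    return pairs


# Reference values taken from the example design above.
design = {"WT-day1": 1, "WT-day2": 2, "KO-day1": 3, "KO-day2": 0}
for condition, reference in reference_comparisons(design):
    print(f"{condition} vs {reference}")
```

Run on the values from the example design, the sketch prints the same six comparisons, in the same order, as the list above.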

Header

This section contains general information about the design and the project. Optional keys:

  • GenomeFile: The genome file to be used for the mapping.
  • GffFile: The GFF3 file including the annotations for the count step.
  • GtfFile: The GTF file including the annotations for the count step.
  • AdditionalAnnotationFile: The file including additional information and annotations about the features.
You can add any other keys. Here are some examples:
  • DesignFormatVersion
  • Project
  • ProjectDescription
  • Owner

Experiments

This section lets you specify several differential expression analyses, called experiments. This is the main enhancement brought by this version of the design file. Each experiment entry follows the pattern Exp."id"."key", where id is the experiment identifier (an integer >0) and key is the name of the option.

Exp.2.name=exp2
Exp.2.skip=false
Exp.2.contrast=true
Exp.2.buildContrast=true
Exp.2.model=~type+day+type:day
Exp.2.comparisons=WT1_vs_KO1:typeWT%dayday1_vs_typeKO%dayday1;\
WT2_vs_KO2:typeWT%dayday2_vs_typeKO%dayday2
One key is mandatory:
  • name: The name of the experiment.
Optional keys:
  • skip: true/false. True to skip the experiment.
  • reference: To define the reference condition for the differential analysis.
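To illustrate the Exp."id"."key" pattern and the mandatory name key, here is a small Python sketch that groups the entries of the Experiments section by experiment id. The names EXP_KEY, group_experiments and is_skipped are hypothetical, and reading skip case-insensitively is an assumption suggested by the example design, which uses both "False" and "false".

```python
import re

# Exp.<id>.<key>, where <id> is a positive integer.
EXP_KEY = re.compile(r"^Exp\.([1-9][0-9]*)\.(.+)$")


def group_experiments(entries):
    """Group 'Exp.<id>.<key>' entries by experiment id and check them."""
    experiments = {}
    for full_key, value in entries.items():
        match = EXP_KEY.match(full_key)
        if match is None:
            raise ValueError(f"not an experiment key: {full_key}")
        exp_id, key = int(match.group(1)), match.group(2)
        experiments.setdefault(exp_id, {})[key] = value

    # The 'name' key is the only mandatory one.
    for exp_id, options in experiments.items():
        if "name" not in options:
            raise ValueError(f"Exp.{exp_id} has no 'name' key")
    return experiments


def is_skipped(options):
    """Read the optional 'skip' key; case is ignored, default is false."""
    return options.get("skip", "false").lower() == "true"


entries = {
    "Exp.1.name": "exp1",
    "Exp.1.skip": "False",
    "Exp.1.reference": "WT-day1",
    "Exp.3.name": "exp3",
    "Exp.3.skip": "false",
}
experiments = group_experiments(entries)
print(sorted(experiments))                 # [1, 3]
print(experiments[1]["reference"])         # WT-day1
print(is_skipped(experiments[1]))          # False
```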
You can add any other keys. To use DESeq2 for the differential expression analysis, add the following DESeq2 options as keys:
  • contrast: true/false. true to use DESeq2 in contrast mode.
  • model: The DESeq2 model. Example: "~Condition".
  • buildContrast: true/false. true if DESeq2 needs to create the contrast matrix.
Particular case:
  • comparisons: This key lists the comparisons to perform with DESeq2 in contrast mode. Each comparison follows the pattern comparisonName:comparisonFormula, and comparisons are separated by ";".
Comparison formula:

Exp.2.comparisons=WT1_vs_KO1:typeWT%dayday1_vs_typeKO%dayday1;\
WT2_vs_KO2:typeWT%dayday2_vs_typeKO%dayday2
The comparison formula is built from the name of a column in the design file (example: type) followed by a value of that column (example: WT). The "columnvalue" terms are joined with the "%" symbol to denote an association (example: typeWT%dayday1), and the two sides of a comparison are separated by "_vs_" (example: typeWT%dayday1_vs_typeKO%dayday1).
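The syntax described above can be decomposed with a few string splits. The following Python sketch is only an illustration with a hypothetical function name, not the parser used by Eoulsan; it assumes the value has already been joined into a single line and that each formula contains exactly one "_vs_" separator.

```python
def parse_comparisons(value):
    """Parse an Exp.<id>.comparisons value into a dict.

    Each comparison name maps to a (left, right) pair, where each side is
    the list of "columnvalue" terms joined by "%" in the formula.
    """
    comparisons = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        # The name may itself contain "_vs_", so split on ":" first.
        name, sep, formula = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in comparison: {item}")
        sides = formula.split("_vs_")
        if len(sides) != 2:
            raise ValueError(f"expected exactly one '_vs_' in: {formula}")
        comparisons[name] = tuple(side.split("%") for side in sides)
    return comparisons


value = (
    "WT1_vs_KO1:typeWT%dayday1_vs_typeKO%dayday1;"
    "WT2_vs_KO2:typeWT%dayday2_vs_typeKO%dayday2"
)
parsed = parse_comparisons(value)
print(parsed["WT1_vs_KO1"])
# (['typeWT', 'dayday1'], ['typeKO', 'dayday1'])
```

Each term such as typeWT is left as a single string here; separating the column name from its value would require the list of column names from the Columns section.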

Columns

This section contains the information about the samples. Mandatory columns:

  • SampleId: Numeric, must be unique and >0.
  • SampleName: Name of the sample. Must be unique.
You can add any other columns. Here are some examples:
  • Reads: Path to the reads data file (Mandatory for mapping step).
  • Condition: This column groups the biological replicates: samples that are biological replicates must share the same "Condition" value (Mandatory for differential expression analysis step).
  • RepTechGroup: Technical replicates group. Used to pool read counts between technical replicates in the differential analysis step (Mandatory for differential expression analysis step).
  • FastqFormat: FASTQ format. Eoulsan currently handles 4 formats: fastq-sanger (default), fastq-solexa, fastq-illumina and fastq-illumina-1.5. See Cock et al. for more information about the various FASTQ formats.
  • Reference: The value from the "Condition" column for the sample to be used as reference for the differential expression analysis (Mandatory for differential expression analysis step).
  • UUID: Universally Unique Identifier of the sample. This field is generated by the Eoulsan createdesign command. In obfuscated design files, this value does not change.
To create columns specific to an experiment, add Exp."id". before the column name. This lets you create several columns with the same name but with different values for each experiment.
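The way experiment-specific columns take priority over common columns can be sketched in Python as follows. The function name is hypothetical and the sketch only illustrates the priority rule described on this page; it does not reflect how Eoulsan stores a design internally.

```python
def columns_for_experiment(sample, exp_id):
    """Return the columns seen by one experiment for one sample.

    Common columns are kept, then columns prefixed with 'Exp.<id>.'
    replace any common column of the same name.
    """
    prefix = f"Exp.{exp_id}."
    # Start from the common columns (those without an 'Exp.' prefix).
    resolved = {
        name: value
        for name, value in sample.items()
        if not name.startswith("Exp.")
    }
    # Columns of this experiment override common columns of the same name.
    for name, value in sample.items():
        if name.startswith(prefix):
            resolved[name[len(prefix):]] = value
    return resolved


# First sample of the example design at the top of this page.
s1a = {
    "SampleId": "s1a",
    "SampleName": "Sample1a",
    "Reads": "sample1a.fastq",
    "Condition": "WT-day1",
    "Reference": "1",
    "Exp.1.RepTechGroup": "WT-day1",
    "Exp.2.type": "WT",
    "Exp.2.day": "day1",
    "Exp.2.RepTechGroup": "WT-day1",
    "Exp.3.Condition": "WT",
    "Exp.3.RepTechGroup": "WT-day1",
}

print(columns_for_experiment(s1a, 1)["Condition"])  # WT-day1
print(columns_for_experiment(s1a, 3)["Condition"])  # WT
```

For experiment 3, the Exp.3.Condition value replaces the common Condition value, while experiment 1 keeps the common one.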