How to create a workflow

RattleSNP lets you build a workflow from a single config.yaml configuration file:

  • First, provide the data paths

  • Second, activate tools from mapping to SNP calling.

  • Finally, set the parameters of each tool.

To create this file, just run:

create_config

Create config.yaml for run

rattleSNP create_config [OPTIONS]

Options

-c, --configyaml <configyaml>

Required Path to create config.yaml

Then, edit the relevant sections of the file to customize your flavor of a workflow.

1. Providing data

First, indicate the data path in the config.yaml configuration file:

DATA:
    FASTQ: "/path/to/fastq/"
    VCF: ""
    REFERENCE_FILE: "/path/to/reference.fasta"
    OUTPUT: "/path/to/output"

The table below describes each input needed to launch RattleSNP:

Input           Description
--------------  -----------
FASTQ           Every paired FASTQ file should contain the whole set of reads to be mapped. Each FASTQ file is mapped independently.
VCF             If SNP calling has already been run, you can provide the VCF directly for filtering.
REFERENCE_FILE  Only one reference genome file is used by RattleSNP. This reference is used for the mapping step.
OUTPUT          Path to the output directory.

Warning

For FASTQ files, RattleSNP accepts the naming conventions NAME_R1.fastq.gz, NAME_R1.fq.gz, NAME_R1.fastq and NAME_R1.fq; the same applies to _R2. Prefer short names without special characters, because the report can fail otherwise, and avoid the long names produced directly by the sequencer. All FASTQ files must share the same extension; they can be compressed or not.

The reference FASTA file must be uncompressed and have a .fasta or .fa extension.
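
The naming rules in the warning above can be checked before launching a run. The following is a minimal Python sketch, not part of RattleSNP: the function name check_fastq_names and the regular expression are hypothetical, and the rule that every sample needs both an _R1 and an _R2 file is an assumption drawn from the "paired FASTQ" wording.

```python
import re
from collections import defaultdict

# Extensions taken from the warning above; the pattern itself is an assumption.
FASTQ_PATTERN = re.compile(
    r"^(?P<name>.+)_R(?P<mate>[12])(?P<ext>\.fastq\.gz|\.fq\.gz|\.fastq|\.fq)$"
)


def check_fastq_names(filenames):
    """Return a list of human-readable problems found in FASTQ file names."""
    problems = []
    mates = defaultdict(set)  # sample name -> set of mate numbers seen
    extensions = set()        # every distinct extension seen
    for filename in filenames:
        match = FASTQ_PATTERN.match(filename)
        if match is None:
            problems.append(f"{filename}: unsupported name or extension")
            continue
        mates[match["name"]].add(match["mate"])
        extensions.add(match["ext"])
    # Assumption: paired data means each sample has both R1 and R2.
    for name, found in sorted(mates.items()):
        if found != {"1", "2"}:
            problems.append(f"{name}: found only R{''.join(sorted(found))}")
    # All files must share one extension.
    if len(extensions) > 1:
        problems.append(f"mixed extensions: {sorted(extensions)}")
    return problems
```

Calling check_fastq_names(os.listdir("/path/to/fastq/")) on the FASTQ directory would return an empty list when every name follows the convention.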

2. Providing params

PARAMS:
    MITOCHONDRIAL_NAME : ""
    # Suffix added to each filtered VCF file name, so that several filters can be applied
    FILTER_SUFFIX: ["-Q30-DP5-MAF005-MISS07",
                    "-Q30-DP20-MAF001-MISS05"]

The table below describes each parameter used by RattleSNP:

Params              Description
------------------  -----------
MITOCHONDRIAL_NAME  Name of the mitochondrial sequence in the FASTA file; it is removed from the VCF file. Leave empty if not applicable.
FILTER_SUFFIX       Suffix added to the name of each filtered VCF file.

3. Providing workflow steps

Activate or deactivate tools as you wish. For example, you can activate only mapping, mapping + SNP calling, or mapping + SNP calling + filtering.

Example:

################################
# Pipeline tools activation
FASTQC: true
CLEANING:
    ATROPOS: true
MAPPING:
    ACTIVATE: true
    TOOL: "BWA_MEM"         # Use BWA_MEM or BWA_SAMPE only
    BUILD_STATS: true       # Warning: if true while mapping is deactivated, mapping is run automatically
SNPCALLING: true
FILTER: true                # Must be true to run raxml or raxml-ng
RAXML: true
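
The comments in the example describe two interactions between flags. The sketch below checks them on a configuration that has already been loaded into a Python dictionary. It is an illustration only: check_activation is a hypothetical helper, not a RattleSNP function, and it encodes nothing beyond what the comments above state.

```python
def check_activation(config):
    """Return notes about activation flags that interact with each other."""
    notes = []
    mapping = config.get("MAPPING", {})
    activate = bool(mapping.get("ACTIVATE"))
    build_stats = bool(mapping.get("BUILD_STATS"))

    # BUILD_STATS needs mapping results, so mapping runs even if deactivated.
    if build_stats and not activate:
        notes.append("BUILD_STATS is true: mapping will run although ACTIVATE is false")

    # Only two mapping tools are accepted according to the example comment.
    if (activate or build_stats) and mapping.get("TOOL") not in ("BWA_MEM", "BWA_SAMPE"):
        notes.append("MAPPING.TOOL must be BWA_MEM or BWA_SAMPE")

    # The phylogeny step works on filtered VCF files.
    if config.get("RAXML") and not config.get("FILTER"):
        notes.append("RAXML is true but FILTER is false: FILTER must be true")
    return notes
```

An empty list means the flags are consistent with those comments; each returned string names one setting to review.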

4. Parameters for some specific tools

You can set tool parameters in the PARAMS_TOOLS section of the config.yaml file.

The standard parameters used by RattleSNP are shown below. Feel free to adapt them to your needs.

################################
# Misc. options for programs
PARAMS_TOOLS:
    ATROPOS: "--minimum-length 35  -q 20,20  -U 8  -O 10"
    FASTQC: ""
    BWA_ALN: ""
    BWA_SAMPE: ""
    BWA_MEM: ""
    SAMTOOLS_VIEW: "-bh -f 2"
    SAMTOOLS_SORT: ""
    SAMTOOLS_DEPTH: ""
    PICARDTOOLS_MARK_DUPLICATES: "-CREATE_INDEX TRUE -VALIDATION_STRINGENCY SILENT"
    GATK_HAPLOTYPECALLER: "--java-options '-Xmx40G' --emit-ref-confidence GVCF --output-mode EMIT_ALL_ACTIVE_SITES -ploidy 1"
    GATK_GENOMICSDBIMPORT: "--java-options '-Xmx40G' "
    GATK_GENOTYPEGVCFS: "--java-options '-Xmx40G' -new-qual"
    VCFTOOLS: ["--minDP 5 --minQ 30 --remove-indels --recode --recode-INFO-all --maf 0.05 --max-missing 0.7",
               "--minDP 20 --minQ 30 --remove-indels --recode --recode-INFO-all --maf 0.01 --max-missing 0.5"]
    RAXML: "-m GTRGAMMAX -f a -x $RANDOM -# autoMRE -p 600"
    RAXML_NG: "--all --model GTR+G --tree pars{50},rand{50} --bs-trees 100 --seed $RANDOM"

Warning

Please check the documentation of each tool (outside of RattleSNP) and make sure that the settings are correct!
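
In the default configuration, FILTER_SUFFIX and VCFTOOLS both hold two entries, and each suffix mirrors the thresholds of the VCFtools string at the same position. The sketch below assumes the two lists are paired by position, which this page does not state explicitly, and checks that they stay aligned. pair_filters is a hypothetical helper, not part of RattleSNP.

```python
def pair_filters(filter_suffixes, vcftools_params):
    """Pair each suffix with the VCFtools options at the same position.

    Assumption: RattleSNP matches the two lists by index.
    """
    if len(filter_suffixes) != len(vcftools_params):
        raise ValueError(
            f"{len(filter_suffixes)} suffixes for {len(vcftools_params)} VCFTOOLS entries"
        )
    # Identical suffixes would make two filters write to the same file name.
    if len(set(filter_suffixes)) != len(filter_suffixes):
        raise ValueError("FILTER_SUFFIX entries must be unique")
    return dict(zip(filter_suffixes, vcftools_params))


# Values copied from the default configuration shown above.
suffixes = ["-Q30-DP5-MAF005-MISS07", "-Q30-DP20-MAF001-MISS05"]
params = [
    "--minDP 5 --minQ 30 --remove-indels --recode --recode-INFO-all --maf 0.05 --max-missing 0.7",
    "--minDP 20 --minQ 30 --remove-indels --recode --recode-INFO-all --maf 0.01 --max-missing 0.5",
]
pairs = pair_filters(suffixes, params)
```

If that assumption holds, adding a third filter means adding one entry to each list, in the same order.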


How to run the workflow

Before attempting to run rattleSNP, please verify that you have already modified the config.yaml file as explained in 1. Providing data.

If you installed RattleSNP on an HPC cluster with a job scheduler, you can run:

run_cluster

Run snakemake command line with mandatory parameters.
SNAKEMAKE_OTHER: You can also pass additional Snakemake parameters using snakemake syntax.
These parameters will take precedence over Snakemake ones, which were defined in the profile.
Example:
rattleSNP run_cluster -c config.yaml --dry-run --jobs 200

rattleSNP run_cluster [OPTIONS] [SNAKEMAKE_OTHER]...

Options

-c, --config <config>

Required Configuration file for run tool

-pdf, --pdf

Run snakemake with --dag, --rulegraph and --filegraph

Default:

False

Arguments

SNAKEMAKE_OTHER

Optional argument(s)


run_local

Run snakemake command line with mandatory parameters.
SNAKEMAKE_OTHER: You can also pass additional Snakemake parameters using snakemake syntax.
These parameters will take precedence over Snakemake ones, which were defined in the profile.
Example:
rattleSNP run_local -c config.yaml --threads 8 --dry-run
rattleSNP run_local -c config.yaml --threads 8 --singularity-args '--bind /mnt:/mnt'

rattleSNP run_local [OPTIONS] [SNAKEMAKE_OTHER]...

Options

-c, --config <config>

Required Configuration file for run tool

-t, --threads <threads>

Required Number of threads

-p, --pdf

Run snakemake with --dag, --rulegraph and --filegraph

Arguments

SNAKEMAKE_OTHER

Optional argument(s)
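
When runs are launched from a script, the command line can be assembled as a list, which avoids quoting mistakes in values such as the --singularity-args string. The sketch below only builds and prints the command. build_run_local_command is a hypothetical helper, and the only flags it uses are those shown in the examples above.

```python
import shlex


def build_run_local_command(config_path, threads, snakemake_other=()):
    """Assemble a rattleSNP run_local command as a list of arguments."""
    command = [
        "rattleSNP", "run_local",
        "-c", str(config_path),
        "--threads", str(threads),
    ]
    # Extra Snakemake parameters are appended unchanged (SNAKEMAKE_OTHER).
    command.extend(snakemake_other)
    return command


dry_run = build_run_local_command("config.yaml", 8, ["--dry-run"])
print(shlex.join(dry_run))
# prints: rattleSNP run_local -c config.yaml --threads 8 --dry-run
```

The resulting list can be passed to subprocess.run(dry_run, check=True) to launch the run. Starting with --dry-run is a cautious first step, since Snakemake then lists the jobs without executing them.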


Advanced run

Providing more resources

If the cluster default resources are not sufficient, you can edit the cluster_config.yaml file. See 2. Adapting cluster_config.yaml:

edit_cluster_config

Edit cluster_config.yaml used by profile

rattleSNP edit_cluster_config [OPTIONS]

Providing your own tools_config.yaml

To change the tools used in a RattleSNP workflow, see 3. How to configure tools_path.yaml:

edit_tools

Edit own tools version

rattleSNP edit_tools [OPTIONS]

Options

-r, --restore

Restore default tools_config.yaml (from install)

Default:

False


Output on RattleSNP

The RattleSNP output directory is organized as follows:

OUTPUT_RattleSNP/
├── 1_mapping
├── 2_snp_calling
├── 3_full_snp_calling_stats
├── 4_raxml
└── LOGS
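
After a run, a quick way to see which parts of this layout were produced is to list the sub-directories that are missing. This is a minimal sketch: missing_output_dirs is a hypothetical helper, and the directory names are copied from the tree above. A missing directory is not necessarily an error, since it may simply belong to a step that was deactivated (an assumption, not something this page states).

```python
from pathlib import Path

# Sub-directory names copied from the output tree above.
EXPECTED_DIRS = [
    "1_mapping",
    "2_snp_calling",
    "3_full_snp_calling_stats",
    "4_raxml",
    "LOGS",
]


def missing_output_dirs(output_path):
    """Return the expected sub-directories that do not exist under output_path."""
    root = Path(output_path)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

For example, missing_output_dirs("/path/to/output") returns an empty list when all five directories are present.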

Report

RattleSNP generates a report containing, for each FASTQ file, a summary of key statistics.