Workflows
CLAWS (CNAG's Long-read Assembly Workflow in Snakemake)
Snakemake pipeline used for de novo genome assembly at CNAG. It was developed for Snakemake v6.0.5.
It accepts Oxford Nanopore Technologies (ONT) reads, PacBio HiFi reads, Illumina paired-end data, Illumina 10X data, and Hi-C reads. It performs read preprocessing, assembly, polishing, purge_dups, scaffolding, several evaluation steps, and generation of Pretext files for curation. Default behavior is to preprocess the reads, ...
Type: Snakemake
Creators: Jessica Gomez-Garrido, Francisco Camara Ferreira, Fernando Cruz, Tyler Alioto
Submitter: Jessica Gomez-Garrido
The workflow takes a trimmed long-read collection and forward/reverse HiC reads to run Hifiasm in HiC phasing mode. It produces both Pri/Alt and Hap1/Hap2 assemblies and runs all the QC analyses (gfastats, BUSCO, and Merqury). The default Hifiasm purge level is aggressive (l3).
The workflow takes a (trimmed) long-read collection and runs Meryl to create a k-mer database, GenomeScope2 to estimate genome properties, and, optionally, Smudgeplot to estimate ploidy. The main results are the k-mer database and genome profiling plots, tables, and values useful for downstream analysis. The default k-mer length and ploidy for GenomeScope are 31 and 2, respectively.
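To make the k-mer profiling step concrete, here is a minimal in-memory sketch of canonical k-mer counting and the multiplicity histogram that GenomeScope-style tools take as input. It is a toy illustration, not Meryl's implementation (Meryl builds a disk-based database and scales to whole read sets); the function names `count_kmers` and `kmer_histogram` are hypothetical, and k defaults to the 31 mentioned above.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k: int = 31) -> Counter:
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases (e.g. N)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> dict:
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))
```

Counting canonical k-mers makes the profile strand-independent, which matters because reads come from both strands of the genome.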
The workflow takes a HiFi reads collection, runs FastQC and SeqKit, filters the reads with Cutadapt, and creates a MultiQC report. The main outputs are a collection of filtered reads, a report with raw and filtered read stats, and a table with raw read stats.
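The filtering step can be pictured with the following sketch. It assumes that reads containing an adapter, in either orientation, are discarded whole rather than trimmed, and it uses exact substring matching, whereas Cutadapt performs error-tolerant matching. The adapter sequence in the usage example is a made-up placeholder, not a real PacBio adapter, and `filter_adapter_reads` is a hypothetical name.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def filter_adapter_reads(reads, adapters):
    """Yield (name, seq) pairs for reads that contain no adapter on either strand.

    Exact-match sketch only; a real adapter filter tolerates sequencing errors.
    """
    patterns = set()
    for adapter in adapters:
        patterns.add(adapter.upper())
        patterns.add(revcomp(adapter.upper()))
    for name, seq in reads:
        if not any(p in seq.upper() for p in patterns):
            yield name, seq
```

For example, with the placeholder adapter `"AAAACCCC"`, a read containing `GGGGTTTT` (its reverse complement) is dropped as well, because the adapter can appear on either strand of a read.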
The workflow takes a paired-reads collection (such as Illumina WGS or HiC), runs FastQC and SeqKit, trims with Fastp, and creates a MultiQC report. The main outputs are a paired collection of trimmed reads, a report with raw and trimmed read stats, and a table with raw read stats.
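One detail of paired-end trimming is that mates must stay in sync: if one mate is rejected, its partner is dropped too, so the output remains a valid paired collection. The sketch below shows only that pair-aware rule with a simple per-read quality filter. The thresholds (Phred 15, at most 40% low-quality bases) and the function names are illustrative assumptions, and this is not Fastp's implementation, which also trims adapters and low-quality ends.

```python
def passes_quality(qual: str, min_q: int = 15, max_low_frac: float = 0.40,
                   offset: int = 33) -> bool:
    """True if the fraction of bases below min_q (Phred+33 encoding) is small enough."""
    if not qual:
        return False
    low = sum(1 for ch in qual if ord(ch) - offset < min_q)
    return low / len(qual) <= max_low_frac


def filter_pairs(pairs, **thresholds):
    """Yield (mate1, mate2) only when BOTH mates pass; records are (name, seq, qual)."""
    for mate1, mate2 in pairs:
        if passes_quality(mate1[2], **thresholds) and passes_quality(mate2[2], **thresholds):
            yield mate1, mate2
```

Dropping both mates together avoids orphaned reads, which would otherwise break downstream tools that expect the two files of a pair to be in the same order.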
The workflow takes an ONT reads collection and runs SeqKit and NanoPlot. The main outputs are a table and plots of raw read stats.
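The raw read stats table summarizes read lengths. As a rough illustration of what such a table contains, the sketch below computes a few common summaries, including N50 (the length L such that reads of length at least L account for at least half of all bases). The field names are loosely modeled on SeqKit's `stats` output but should be treated as assumptions. The FASTQ reader assumes standard 4-line, unwrapped records, and both function names are hypothetical.

```python
def fastq_lengths(lines):
    """Yield sequence lengths from 4-line FASTQ records (assumes no line wrapping)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence line of each record
            yield len(line.rstrip("\n"))


def read_length_stats(lengths):
    """Summarize read lengths; N50 is taken from lengths sorted longest first."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no reads")
    total = sum(ordered)
    running, n50 = 0, ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of all bases
            n50 = length
            break
    return {
        "num_seqs": len(ordered),
        "sum_len": total,
        "min_len": ordered[-1],
        "avg_len": total / len(ordered),
        "max_len": ordered[0],
        "N50": n50,
    }
```

For example, `read_length_stats([2, 3, 4, 5, 6])` reports an N50 of 5, because the two longest reads (6 + 5 = 11 bases) already cover at least half of the 20 total bases.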
The workflow takes a long-read collection, Pri/Alt contigs, and the values for the transition parameter and maximum coverage depth (calculated in WF1) to run Purge_Dups. It produces purged Pri and Alt contig assemblies and runs all the QC analyses (gfastats, BUSCO, and Merqury).
The workflow takes a trimmed HiC paired-end reads collection and Pri/Alt assemblies to produce a scaffolded primary assembly (and alternate contigs) using YaHS. It also runs Pretext and all the QC analyses (gfastats, BUSCO, and Merqury).
The workflow takes a long-read collection (HiFi, or now also ONT) and the maximum coverage depth (calculated in WF1) to run Hifiasm in solo mode. It produces a Pri/Alt assembly and Bandage plots, and runs all the QC analyses (gfastats, BUSCO, and Merqury).
Assembly Evaluation for ERGA-BGE Reports
One Assembly, Illumina WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- Accurate WGS reads accession code
- NCBI HiC reads accession code
The workflow retrieves the data and processes it to generate genome profiling (GenomeScope, and optionally Smudgeplot), assembly stats (gfastats), Merqury stats (QV, completeness), BUSCO, a snail plot, a contamination blob plot, and ...
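The Merqury QV and completeness values can be understood with a short worked sketch. As I understand Merqury's k-mer approach, if K_asm is the number of assembly k-mers never seen in the reads and K_total is the total number of assembly k-mers, the per-base error rate is estimated as P = 1 - (1 - K_asm/K_total)^(1/k) and QV = -10 log10(P). Completeness is the percentage of solid read k-mers that are also found in the assembly. The default k = 21 and the function names are assumptions for illustration. This is not Merqury's code.

```python
import math


def merqury_style_qv(asm_only_kmers: int, total_asm_kmers: int, k: int = 21) -> float:
    """Consensus QV from the fraction of assembly k-mers absent from the reads."""
    if total_asm_kmers <= 0 or k <= 0 or not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("invalid k-mer counts")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: error rate estimate is zero
    if asm_only_kmers == total_asm_kmers:
        return 0.0  # every k-mer unsupported: error rate estimate is 1
    ratio = asm_only_kmers / total_asm_kmers
    # P = 1 - (1 - ratio) ** (1 / k), computed with log1p/expm1 to avoid cancellation
    error_rate = -math.expm1(math.log1p(-ratio) / k)
    return -10 * math.log10(error_rate)


def kmer_completeness(asm_solid_kmers: int, read_solid_kmers: int) -> float:
    """Percentage of solid read k-mers that are also present in the assembly."""
    return 100.0 * asm_solid_kmers / read_solid_kmers
```

For example, one unsupported 21-mer among 10^9 assembly k-mers gives P ≈ 10^-9 / 21 ≈ 4.8 × 10^-11, which corresponds to a QV of about 103.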