Workflows
What is a Workflow?Filters
Racon polish with long reads, x4
Downloads fastq files for sequencing run accessions provided in a text file using fasterq-dump. Creates one job per listed run accession.
This workflow takes as input SR BAM from ChIP-seq. It calls peaks on each replicate and intersect them. In parallel, each BAM is subsetted to smallest number of reads. Peaks are called using both subsets combined. Only peaks called using a combination of both subsets which have summits intersecting the intersection of both replicates will be kept.
Virtual screening of the SARS-CoV-2 main protease with rDock and pose scoring
This workflow take as input a collection of paired fastq. Remove adapters with cutadapt, map pairs with bowtie2 allowing dovetail. Keep MAPQ30 and concordant pairs. BAM to BED. MACS2 with "ATAC" parameters.
This workflow take as input a collection of paired fastq. It uses HiCUP to go from fastq to validPair file. The pairs are filtered for MAPQ and sorted by cooler to generate a tabix dataset. Cooler is used to generate a balanced cool file to the desired resolution.
This workflow take as input a collection of paired fastq. It will remove bad quality and adapters with cutadapt. Map with Bowtie2 end-to-end. Will remove reads on MT and unconcordant pairs and pairs with mapping quality below 30 and PCR duplicates. Will compute the pile-up on 5' +- 100bp. Will call peaks and count the number of reads falling in the 1kb region centered on the summit. Will plot the number of reads for each fragment length.
This workflow takes as input a list of single-read fastqs. Adapters and bad quality bases are removed with cutadapt. Reads are mapped with STAR with ENCODE parameters and genes are counted simultaneously. The counts are reprocess to be similar to HTSeq-count output. FPKM are computed with cufflinks. Coverage (per million mapped reads) are computed with bedtools on uniquely mapped reads.
This workflow takes as input a list of paired-end fastqs. Adapters and bad quality bases are removed with cutadapt. Reads are mapped with STAR with ENCODE parameters and genes are counted simultaneously. The counts are reprocess to be similar to HTSeq-count output. FPKM are computed with cufflinks. Coverage (per million mapped reads) are computed with bedtools on uniquely mapped reads (with R2 orientation inverted).
VGP Workflow #1
This workflow produces a Meryl database and Genomescope outputs that will be used to determine parameters for following workflows, and assess the quality of genome assemblies. Specifically, it provides information about the genomic complexity, such as the genome size and levels of heterozygosity and repeat content, as well about the data quality.
Inputs
- Collection of Hifi long reads in FASTQ format
Outputs
- Meryl Database of kmer counts
- GenomeScope
- Linear plot
...
Tests