1 - read pre-processing
Version 1

Workflow Type: Galaxy

Preprocessing of raw SARS-CoV-2 reads

The raw reads available so far are generated from bronchoalveolar lavage fluid (BALF) and are metagenomic in nature: they contain human reads, reads from potential bacterial co-infections, and true SARS-CoV-2 reads.

Live Resources

| usegalaxy.org | usegalaxy.eu | usegalaxy.org.au | usegalaxy.be |
|---|---|---|---|
| Galaxy workflow | Galaxy workflow | Galaxy workflow | Galaxy workflow |
| Galaxy history | Galaxy history | Galaxy history | Galaxy history |

What's the point?

Assess the quality of the reads, remove adapters, and remove reads that map to the human genome.

The outline

Illumina and Oxford Nanopore reads are pulled from the NCBI SRA (links to SRA accessions are available here). They are then processed separately as described in the workflow section.

Inputs

:boom: If you experience problems downloading data from the NCBI SRA, use the Galaxy history pre-populated with inputs, as described in the "Alternate Workflow" section below.

Only SRA accessions are required for this analysis. The described analysis was performed with all SRA SARS-CoV-2 accessions available as of Feb 20, 2020:

  1. Illumina reads

    SRR10903401
    SRR10903402
    SRR10971381
    
  2. Oxford Nanopore reads

    SRR10948550
    SRR10948474
    SRR10902284
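
Since the workflow takes only lists of accessions as input, the two lists above can be prepared as plain text files before upload. The following is a minimal sketch, assuming a one-accession-per-line layout; the function name and the output file names are hypothetical and not part of the workflow.

```python
from pathlib import Path

# Accessions copied from the two lists above.
ILLUMINA = ["SRR10903401", "SRR10903402", "SRR10971381"]
ONT = ["SRR10948550", "SRR10948474", "SRR10902284"]


def write_accession_list(accessions, path):
    """Write one accession per line and return the path that was written."""
    path = Path(path)
    path.write_text("\n".join(accessions) + "\n")
    return path


if __name__ == "__main__":
    # Hypothetical file names, one file per sequencing platform.
    write_accession_list(ILLUMINA, "illumina_accessions.txt")
    write_accession_list(ONT, "ont_accessions.txt")
```

Keeping the platforms in separate files mirrors the two workflow inputs, "List of Illumina accessions" and "List of ONT accessions".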
    

Outputs

This workflow produces three outputs that are used in two subsequent analyses:

| # | Output | Used in |
|---|---|---|
| 1 | A combined set of adapter-free Illumina reads without human contamination | Assembly |
| 2 | A combined set of Oxford Nanopore reads without human contamination | Assembly |
| 3 | A collection of adapter-free Illumina reads from which human reads have not been removed | Variation detection |

The history and the workflow

A Galaxy workspace (history) containing the most current analysis can be imported from here.

The publicly accessible workflow can be downloaded and installed on any Galaxy instance. It contains version information for all tools used in this analysis.

The workflow performs the following steps:

Illumina

  • Illumina reads are QC'ed and adapter sequences are removed using fastp
  • Quality metrics are computed and visualized using fastqc and multiqc
  • Reads are mapped against the human genome (version hg38) using bwa mem
  • Reads that do not map to hg38 are retained using samtools view; reads that map to hg38 are discarded
  • Reads are converted back to fastq format using samtools fastx
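
Outside Galaxy, the Illumina steps above could be approximated on the command line. The sketch below only builds the argument lists and does not run anything. It assumes paired-end input and keeps pairs in which both mates are unmapped (`-f 12`); the Galaxy tool wrappers may use different parameters, and the function name and file naming scheme are hypothetical.

```python
def illumina_commands(fq1, fq2, hg38_index, prefix):
    """Return argv lists for the Illumina branch, in execution order."""
    trimmed1 = f"{prefix}_trimmed_1.fastq.gz"
    trimmed2 = f"{prefix}_trimmed_2.fastq.gz"
    sam = f"{prefix}.hg38.sam"
    unmapped = f"{prefix}.unmapped.bam"
    return [
        # Quality control and adapter trimming of the read pair.
        ["fastp", "-i", fq1, "-I", fq2, "-o", trimmed1, "-O", trimmed2],
        # Map the trimmed pair against the hg38 index.
        ["bwa", "mem", "-o", sam, hg38_index, trimmed1, trimmed2],
        # Assumption: keep only pairs where read and mate are both unmapped
        # (SAM FLAG bits 4 + 8 = 12).
        ["samtools", "view", "-b", "-f", "12", "-o", unmapped, sam],
        # Convert the retained reads back to FASTQ.
        ["samtools", "fastq",
         "-1", f"{prefix}_nonhuman_1.fastq",
         "-2", f"{prefix}_nonhuman_2.fastq",
         unmapped],
    ]


if __name__ == "__main__":
    for argv in illumina_commands("SRR10903401_1.fastq", "SRR10903401_2.fastq",
                                  "hg38.fa", "SRR10903401"):
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the tools and a bwa index for hg38 are available locally.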

Oxford Nanopore

  • Reads are QC'ed using nanoplot
  • Quality metrics are computed and visualized using fastqc and multiqc
  • Reads are mapped against the human genome (version hg38) using minimap2
  • Reads that do not map to hg38 are retained using samtools view; reads that map to hg38 are discarded
  • Reads are converted back to fastq format using samtools fastx
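
Both branches end with the same two operations: keep the reads that did not map to hg38, then convert them back to FASTQ. The sketch below illustrates that logic in plain Python on SAM text lines, using the "segment unmapped" FLAG bit (0x4) from the SAM format. It is an illustration only, not the samtools implementation: it ignores mate information and does not restore the original orientation of reverse-strand records. The function name and the read names in the example are hypothetical.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_to_fastq(sam_lines):
    """Yield FASTQ records for alignment lines whose FLAG has the unmapped bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & UNMAPPED:
            yield f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    example = [
        "@SQ\tSN:chr1\tLN:1000",
        "human_read\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "viral_read\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF",
    ]
    # Only viral_read is printed; human_read mapped to chr1 and is dropped.
    print("".join(unmapped_to_fastq(example)), end="")
```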

BioConda

Tools used in this analysis are also available from BioConda:

  • sra-tools
  • fastqc
  • multiqc
  • fastp
  • nanoplot
  • bwa
  • picard
  • samtools

Alternate Workflow

An alternate starting point has been created for those not wanting to wait for the data to be downloaded from the NCBI SRA. (This can especially be an issue in Australia or Europe.)

A shared history contains all of the starting data in appropriate collections, and an alternate workflow is able to use this input. Apart from the slightly different starting point, the workflow and the outputs it produces are identical to those described above.

| usegalaxy.org | usegalaxy.eu | usegalaxy.org.au | usegalaxy.be |
|---|---|---|---|
| Galaxy input history | Galaxy input history | Galaxy input history | Galaxy input history |
| Galaxy alternate workflow | Galaxy alternate workflow | Galaxy alternate workflow | Galaxy alternate workflow |
| Galaxy final history | Galaxy final history | Galaxy final history | Galaxy history |

Inputs

| ID | Name | Description | Type |
|---|---|---|---|
| List of Illumina accessions | List of Illumina accessions | n/a | File |
| List of ONT accessions | List of ONT accessions | n/a | File |

Steps

| ID | Name | Description |
|---|---|---|
| 2 | Illumina data | toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/2.10.4 |
| 3 | ONT data | toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/2.10.4 |
| 4 | fastp: Trimmed Illumina Reads | toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.19.3.3 |
| 5 | NanoPlot | toolshed.g2.bx.psu.edu/repos/iuc/nanoplot/nanoplot/1.28.2+galaxy1 |
| 6 | FastQC | toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.72 |
| 7 | Map with minimap2 | toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.12 |
| 8 | MultiQC | toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.7 |
| 9 | Map with BWA-MEM | toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.1 |
| 10 | MultiQC | toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.7 |
| 11 | Filter SAM or BAM, output SAM or BAM | toolshed.g2.bx.psu.edu/repos/devteam/samtool_filter2/samtool_filter2/1.8 |
| 12 | Filter SAM or BAM, output SAM or BAM | toolshed.g2.bx.psu.edu/repos/devteam/samtool_filter2/samtool_filter2/1.8 |
| 13 | MergeSamFiles | toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MergeSamFiles/2.18.2.1 |
| 14 | MergeSamFiles | toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MergeSamFiles/2.18.2.1 |
| 15 | ONT filtered reads | toolshed.g2.bx.psu.edu/repos/iuc/samtools_fastx/samtools_fastx/1.9+galaxy1 |
| 16 | Illumina filtered reads | toolshed.g2.bx.psu.edu/repos/iuc/samtools_fastx/samtools_fastx/1.9+galaxy1 |
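
The tool identifiers in the steps above all follow the same slash-separated pattern, which makes it easy to pull out the pinned version of each tool. The sketch below assumes the layout host/repos/owner/repository/tool/version, inferred from the identifiers listed here; the field names and the function name are hypothetical labels chosen for illustration.

```python
from typing import NamedTuple


class ToolShedId(NamedTuple):
    host: str
    owner: str
    repository: str
    tool: str
    version: str


def parse_tool_id(tool_id):
    """Split a tool identifier of the form host/repos/owner/repository/tool/version."""
    parts = tool_id.split("/")
    if len(parts) != 6 or parts[1] != "repos":
        raise ValueError(f"unexpected tool identifier layout: {tool_id!r}")
    host, _, owner, repository, tool, version = parts
    return ToolShedId(host, owner, repository, tool, version)


if __name__ == "__main__":
    parsed = parse_tool_id("toolshed.g2.bx.psu.edu/repos/iuc/fastp/fastp/0.19.3.3")
    print(parsed.tool, parsed.version)
```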

Outputs

| ID | Name | Description | Type |
|---|---|---|---|
| _anonymous_output_3 | _anonymous_output_3 | n/a | File |
| _anonymous_output_4 | _anonymous_output_4 | n/a | File |
| _anonymous_output_5 | _anonymous_output_5 | n/a | File |
| _anonymous_output_6 | _anonymous_output_6 | n/a | File |
| _anonymous_output_7 | _anonymous_output_7 | n/a | File |
| _anonymous_output_8 | _anonymous_output_8 | n/a | File |
| _anonymous_output_9 | _anonymous_output_9 | n/a | File |
| _anonymous_output_10 | _anonymous_output_10 | n/a | File |
| _anonymous_output_11 | _anonymous_output_11 | n/a | File |
| _anonymous_output_12 | _anonymous_output_12 | n/a | File |
| _anonymous_output_13 | _anonymous_output_13 | n/a | File |
| _anonymous_output_14 | _anonymous_output_14 | n/a | File |
| _anonymous_output_15 | _anonymous_output_15 | n/a | File |
| _anonymous_output_16 | _anonymous_output_16 | n/a | File |
| _anonymous_output_17 | _anonymous_output_17 | n/a | File |
| _anonymous_output_18 | _anonymous_output_18 | n/a | File |
| _anonymous_output_19 | _anonymous_output_19 | n/a | File |
| _anonymous_output_20 | _anonymous_output_20 | n/a | File |
| _anonymous_output_21 | _anonymous_output_21 | n/a | File |
| _anonymous_output_22 | _anonymous_output_22 | n/a | File |
| _anonymous_output_23 | _anonymous_output_23 | n/a | File |
| _anonymous_output_24 | _anonymous_output_24 | n/a | File |
| _anonymous_output_25 | _anonymous_output_25 | n/a | File |
| _anonymous_output_26 | _anonymous_output_26 | n/a | File |
| _anonymous_output_27 | _anonymous_output_27 | n/a | File |
| _anonymous_output_28 | _anonymous_output_28 | n/a | File |
| _anonymous_output_29 | _anonymous_output_29 | n/a | File |
| _anonymous_output_30 | _anonymous_output_30 | n/a | File |
| _anonymous_output_31 | _anonymous_output_31 | n/a | File |
| _anonymous_output_32 | _anonymous_output_32 | n/a | File |
| _anonymous_output_33 | _anonymous_output_33 | n/a | File |

Version History

Version 1 (earliest) Created 17th Sep 2020 at 10:38 by Finn Bacall

Added/updated 6 files

