metaTAXONx: metataxonomics pipeline for 16S marker-gene sequencing data
v1.2.0

Workflow Type: Nextflow
Stable


Introduction metaTAXONx

This pipeline is a best-practice suite for the pre-processing, denoising, classification and annotation of Illumina short-read sequencing data obtained by 16S rRNA marker-gene sequencing. The pipeline combines nf-core modules with local modules that follow the same format. It can be run with either Docker or Singularity containers.

Pipeline summary

The pipeline can perform taxonomic annotation on either single-end or paired-end reads. Which subworkflows run is controlled via the --bypass_ flags; a full overview is shown by running --help.

The pipeline preprocesses the reads by removing primers or adapters with cutadapt and by merging paired-end reads with either FLASH or PEAR. Read quality is assessed with fastqc before and after each step, and a multiqc report is created as output. Denoising of single-end reads is performed with DADA2, either in batches or in parallel, by the run-dada2-batch module.

Taxonomy assignment

The ASVs (generated by DADA2 after denoising) are classified by default by VSEARCH alignment against the SILVA-138 SSU database (Ref NR 99, i.e. non-redundant at 99% identity), a reference database of 16S rRNA gene sequences for taxonomic classification. This is part of the "classify-consensus-vsearch" QIIME2 feature classifier module. The data can be visualised as a comprehensive report via OmicFlow or in a human-readable format via BiotaViz.
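The idea behind a consensus classifier can be sketched in a few lines of Python. This is only an illustration, not the QIIME2 implementation: the function name `consensus_lineage`, the semicolon-separated lineage strings and the 0.51 threshold are assumptions made for this example. Given the lineages of the database hits for one ASV, the deepest lineage prefix on which a sufficient fraction of hits agree is kept.

```python
from collections import Counter


def consensus_lineage(hit_lineages, min_consensus=0.51):
    """Collapse the lineages of several database hits into one assignment.

    Hypothetical helper: at each depth the most common lineage prefix is kept
    only if at least `min_consensus` of all hits share it; otherwise the
    assignment is truncated at the previous depth.
    """
    if not hit_lineages:
        return "Unassigned"
    split = [[rank.strip() for rank in lineage.split(";")] for lineage in hit_lineages]
    max_depth = max(len(parts) for parts in split)
    consensus = ()
    for depth in range(1, max_depth + 1):
        # Count identical prefixes among hits that are annotated this deep.
        prefixes = Counter(tuple(parts[:depth]) for parts in split if len(parts) >= depth)
        prefix, votes = prefixes.most_common(1)[0]
        if votes / len(split) < min_consensus:
            break
        consensus = prefix
    return ";".join(consensus) if consensus else "Unassigned"
```

With two hits that agree down to family but disagree on genus, the sketch returns the shared family-level lineage rather than picking one genus arbitrarily.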

[!NOTE] Classifiers need to be built with sklearn version 1.4.0.

Specific pre-built classifiers, obtained from the QIIME2 pre-built classifiers, can be selected via --classifier_name. A custom classifier can be supplied via the --classifier_custom flag; a thorough guide on how to create your own classifier can be found on the QIIME2 forum.

Diversity analysis

The pipeline uses QIIME2 to construct a phylogenetic tree via fasttree and to run diversity analyses, such as alpha diversity and beta diversity. Moreover, the pipeline uses rarefied data to ensure a consistent sampling depth across samples.
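To make rarefaction concrete, here is a minimal Python sketch, assuming ASV counts are held in a plain dictionary per sample. The names `rarefy` and `shannon`, the fixed seed and the base-2 logarithm are choices made for this illustration and are not taken from the pipeline or from QIIME2.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {feature: count} dict.

    Returns None when the sample holds fewer than `depth` reads, mirroring the
    common practice of dropping samples below the chosen depth.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand to one entry per read, then draw without replacement.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for feature in picked:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def shannon(counts):
    """Shannon entropy (base 2, chosen for this sketch) of a {feature: count} dict."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)
```

After rarefying every sample to the same depth, alpha diversity values such as `shannon(rarefy(sample, depth))` are computed on equal numbers of reads, so differences between samples are less likely to reflect sequencing effort alone.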

Installation

[!NOTE] Make sure you have installed the latest Nextflow version!

Clone the repository in a directory of your choice:

git clone https://github.com/CMG-GUTS/metataxonx.git

The pipeline is containerised, meaning it can be run via Docker or Singularity images. No further action is needed when using the docker profile, except that a Docker registry needs to be set up on your local system, see docker. When Singularity is used, images are automatically cached within the project directory.

Usage

Since the latest version, metaTAXONx accepts either a samplesheet (CSV) or a path to the input files. Providing a samplesheet is preferred.

nextflow run main.nf --input <samplesheet.csv> -work-dir work -profile singularity
nextflow run main.nf --input <'*_{1,R1,2,R2}.{fq,fq.gz,fastq,fastq.gz}'> -work-dir work -profile singularity
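If no samplesheet exists yet, one can be generated from a directory of reads. The following Python sketch is not part of the pipeline. It assumes that files follow the `_1`/`_R1` and `_2`/`_R2` naming used in the glob above, that only gzipped FASTQ files are wanted, and that an empty `reverse_read` cell is acceptable for single-end samples. The helper name `build_samplesheet` is hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed naming scheme: <sample>_<1|R1|2|R2>.<fq|fastq>.gz
READ_RE = re.compile(r"^(?P<sample>.+)_(?P<read>R?[12])\.(?:fq|fastq)\.gz$")
COLUMNS = ["sample_id", "forward_read", "reverse_read"]


def build_samplesheet(fastq_dir, out_csv):
    """Write a samplesheet CSV and return the rows that were written."""
    samples = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = READ_RE.match(path.name)
        if match is None:
            continue  # not a recognised gzipped FASTQ file name
        column = "forward_read" if match["read"].endswith("1") else "reverse_read"
        samples.setdefault(match["sample"], {})[column] = str(path.resolve())

    rows = []
    for sample_id, reads in sorted(samples.items()):
        if "forward_read" not in reads:
            continue  # forward_read is required, so skip orphan reverse reads
        rows.append({
            "sample_id": sample_id,
            "forward_read": reads["forward_read"],
            # Assumption: left empty for single-end samples.
            "reverse_read": reads.get("reverse_read", ""),
        })

    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `build_samplesheet("reads", "samplesheet.csv")` would then produce a file that can be passed to `--input`, provided the pipeline's own validation accepts the resulting paths.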

You can also run the test data set in the tests/data folder. It is also available via -profile test and only works with the 'standard' classifier name.

[!NOTE] Test data should be run with the --bypass_trim flag, which is set to true by default in conf/test.config.

📋 Sample Metadata File Specification

metaTAXONx expects your sample input data to follow a simple but strict structure to ensure compatibility and allow upfront validation. The input should be provided as a CSV file in which each row describes one sample and its sequencing file paths. Additional properties not mentioned here are ignored by the validation step.

Properties and Validation Rules

🔹 Required properties

| Property | Type | Rules / Description |
| --- | --- | --- |
| sample_id | string | Unique sample ID with no spaces (^\S+$). Serves as an identifier. |
| forward_read | string | File path to forward sequencing read. Must be non-empty string matching FASTQ gzipped pattern. File must exist. |

🔹 Optional property

| Property | Type | Rules / Description |
| --- | --- | --- |
| reverse_read | string | File path to reverse sequencing read. Same constraints as forward_read. Required if specified. |
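The rules above can be checked before launching a run. The Python sketch below is an approximation for illustration and not the pipeline's own schema validation: the regular expression used for "FASTQ gzipped pattern" is an assumption, and the function names `validate_row` and `validate_samplesheet` are hypothetical.

```python
import csv
import re
from pathlib import Path

SAMPLE_ID_RE = re.compile(r"^\S+$")
# Assumed pattern for a gzipped FASTQ path; the pipeline's schema may differ.
FASTQ_GZ_RE = re.compile(r".+\.f(ast)?q\.gz")


def validate_row(row, check_exists=True):
    """Return a list of human-readable problems for one samplesheet row."""
    problems = []
    sample_id = row.get("sample_id") or ""
    if not SAMPLE_ID_RE.fullmatch(sample_id):
        problems.append("sample_id must be non-empty and contain no whitespace")

    for column, required in (("forward_read", True), ("reverse_read", False)):
        value = (row.get(column) or "").strip()
        if not value:
            if required:
                problems.append(f"{column} is required")
            continue
        if not FASTQ_GZ_RE.fullmatch(value):
            problems.append(f"{column} does not look like a gzipped FASTQ path")
        elif check_exists and not Path(value).is_file():
            problems.append(f"{column} does not exist: {value}")
    return problems


def validate_samplesheet(csv_path, check_exists=True):
    """Map sample_id (or row number) to its problems; an empty dict means valid."""
    errors = {}
    seen = set()
    with open(csv_path, newline="") as handle:
        for number, row in enumerate(csv.DictReader(handle), start=1):
            problems = validate_row(row, check_exists)
            sample_id = row.get("sample_id") or f"row {number}"
            if sample_id in seen:
                problems.append("duplicate sample_id")
            seen.add(sample_id)
            if problems:
                errors[sample_id] = problems
    return errors
```

Setting `check_exists=False` is useful when the samplesheet is prepared on a machine that does not hold the sequencing files.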

🔹 Pattern‑based columns

You can define extra variables using special prefixes:

  • CONTRAST_... → grouping/category labels used in differential comparisons
    Example: CONTRAST_Treatment with values Drug / Placebo

These prefixes are used to generate an automated OmicFlow report with alpha, beta diversity and compositional plots. For more information see OmicFlow.
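As an illustration of how such a column might look in practice, the Python sketch below parses a small, made-up samplesheet and groups sample IDs by their contrast label. The file names, sample IDs and the helper `contrast_groups` are hypothetical; only the column names and the Drug / Placebo values come from the description above.

```python
import csv
import io

# Hypothetical samplesheet content, for illustration only.
EXAMPLE = """\
sample_id,forward_read,reverse_read,CONTRAST_Treatment
S1,S1_R1.fastq.gz,S1_R2.fastq.gz,Drug
S2,S2_R1.fastq.gz,S2_R2.fastq.gz,Placebo
S3,S3_R1.fastq.gz,S3_R2.fastq.gz,Drug
"""


def contrast_groups(handle, prefix="CONTRAST_"):
    """Return {contrast column: {label: [sample_id, ...]}}."""
    groups = {}
    for row in csv.DictReader(handle):
        for column, label in row.items():
            # Only columns carrying the prefix and a non-empty label are grouped.
            if column and column.startswith(prefix) and label:
                groups.setdefault(column, {}).setdefault(label, []).append(row["sample_id"])
    return groups


print(contrast_groups(io.StringIO(EXAMPLE)))
# {'CONTRAST_Treatment': {'Drug': ['S1', 'S3'], 'Placebo': ['S2']}}
```

A quick check like this can reveal misspelled labels (for example "drug" versus "Drug") that would otherwise end up as separate groups in a report.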

Support

If you are having issues, please create an issue.

Version History

v1.2.0 (earliest) Created 20th Jan 2026 at 14:39 by Alem Gusinac

added low memory process demands for CI


Frozen v1.2.0 1c87f3b
Citation
Gusinac, A., Ederveen, T., Boekhorst, J., & Boleij, A. (2026). metaTAXONx: metataxonomics pipeline for 16S marker-gene sequencing data. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.2054.1
Created: 20th Jan 2026 at 14:39

Last updated: 23rd Jan 2026 at 14:11
