metaTAXONx: metataxonomics pipeline for 16S marker-gene sequencing data

Introduction metaTAXONx

This pipeline is a best-practice suite for the pre-processing, denoising, classification and annotation of Illumina short-read sequencing data obtained by 16S rRNA marker-gene sequencing. The pipeline contains nf-core modules and other local modules that follow a similar format. It can be run with either Docker or Singularity containers.

Pipeline summary

The pipeline can perform taxonomic annotation on either single-end or paired-end reads. Which subworkflows are run is controlled via the --bypass_ flags; run the pipeline with --help for a full overview.

Reads are preprocessed by removing primers or adapters with cutadapt and by merging paired-end reads with either FLASH or PEAR. Read quality is assessed with FastQC before and after each step, and the results are collected in a MultiQC report. Single-end reads are denoised with DADA2, either in batches or in parallel, via the run-dada2-batch module.
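To illustrate the idea behind paired-end read merging, the sketch below overlaps a forward read with the reverse complement of its mate and accepts the longest overlap whose mismatch ratio stays under a limit. It is a simplified, hypothetical illustration and not the algorithm used by FLASH or PEAR, which also take base qualities into account; the function names and the `min_overlap` and `max_mismatch_ratio` values are assumptions.

```python
"""Simplified illustration of paired-end read merging (not FLASH or PEAR)."""
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 10,
               max_mismatch_ratio: float = 0.25) -> Optional[str]:
    """Merge a read pair, or return None if no acceptable overlap exists.

    The reverse read is reverse-complemented so both reads are on the same
    strand; overlap lengths are then tried from longest to shortest.
    """
    rc = reverse_complement(reverse)
    max_len = min(len(forward), len(rc))
    for overlap in range(max_len, min_overlap - 1, -1):
        f_tail = forward[len(forward) - overlap:]
        r_head = rc[:overlap]
        mismatches = sum(a != b for a, b in zip(f_tail, r_head))
        if mismatches / overlap <= max_mismatch_ratio:
            # Overlapping bases are taken from the forward read (no quality-aware consensus).
            return forward + rc[overlap:]
    return None
```

For example, `merge_pair("AAAAAAAAAACGTACGATCG", reverse_complement("CGTACGATCGTTTTTTTTTT"))` finds the 10-base overlap `CGTACGATCG` and returns the 30-base merged sequence.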

Taxonomy assignment

By default, the amplicon sequence variants (ASVs) generated by DADA2 during denoising are classified by VSEARCH alignment against the SILVA-138 SSU database (Ref NR 99, i.e. non-redundant at 99% identity), a reference database of 16S rRNA gene sequences for taxonomic classification. This step uses the classify-consensus-vsearch method of the QIIME2 feature-classifier plugin. The results can be visualised as a comprehensive report via OmicFlow or in a human-readable format via BiotaViz.
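The consensus step of a consensus classifier can be sketched as follows: the taxonomy strings of a query's accepted database hits are compared rank by rank, and the assignment is cut off at the deepest rank where the most common label is still shared by a minimum fraction of the hits. This is a simplified sketch, not the QIIME2 implementation; the function name `consensus_taxonomy`, the semicolon-separated taxonomy format and the 0.51 threshold are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple


def consensus_taxonomy(hits: List[str], min_consensus: float = 0.51) -> Tuple[str, float]:
    """Return the deepest consensus taxonomy and the fraction of hits supporting it.

    hits: taxonomy strings of the accepted hits, ranks separated by ';'.
    At each rank, the most common label among hits that agree on all higher
    ranks must be carried by at least `min_consensus` of all hits.
    """
    if not hits:
        return "Unassigned", 0.0
    split_hits = [[rank.strip() for rank in h.split(";")] for h in hits]
    consensus: List[str] = []
    support = 1.0
    depth = max(len(h) for h in split_hits)
    for level in range(depth):
        # Only hits that still match the consensus at higher ranks may vote.
        labels = [h[level] for h in split_hits
                  if len(h) > level and h[:level] == consensus]
        if not labels:
            break
        label, count = Counter(labels).most_common(1)[0]
        fraction = count / len(hits)
        if fraction < min_consensus:
            break  # ranks below this point are left unresolved
        consensus.append(label)
        support = fraction
    if not consensus:
        return "Unassigned", 0.0
    return ";".join(consensus), support
```

With two of three hits assigned to `c__Bacilli` and one to `c__Clostridia`, the class-level label is kept (2/3 exceeds 0.51); with an even split between two classes, the assignment stops at phylum.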

[!NOTE] Classifiers need to be built with scikit-learn version 1.4.0.
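Because a custom classifier has to be built with scikit-learn 1.4.0, it can help to confirm the version in the environment used for training before supplying the classifier to the pipeline. A minimal sketch using only the Python standard library; the function name `sklearn_version_matches` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def sklearn_version_matches(required: str = "1.4.0") -> bool:
    """Return True if the installed scikit-learn version equals `required`."""
    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        # scikit-learn is not installed in this environment.
        return False
    return installed == required


if __name__ == "__main__":
    if sklearn_version_matches():
        print("scikit-learn 1.4.0 found")
    else:
        print("scikit-learn 1.4.0 not found")
```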

A specific pre-built classifier, taken from the QIIME2 pre-built classifiers, can be selected via --classifier_name. A custom classifier can be supplied via the --classifier_custom flag; a thorough guide on creating your own classifier can be found on the QIIME2 forum.

Diversity analysis

The pipeline uses QIIME2 to construct a phylogenetic tree with FastTree and to perform diversity analyses, such as alpha and beta diversity. The pipeline uses rarefied data to ensure a consistent sampling depth across samples.
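Rarefaction subsamples each sample's reads without replacement down to a common depth, so that diversity metrics are not driven by differences in sequencing effort. The sketch below illustrates this together with two common metrics: Shannon entropy for alpha diversity and Bray-Curtis dissimilarity for beta diversity. It is a hypothetical standard-library illustration of the concepts, not the QIIME2 implementation; the function names, the base-2 logarithm and the fixed random seed are assumptions.

```python
import math
import random
from typing import Dict, List


def rarefy(counts: Dict[str, int], depth: int, seed: int = 42) -> Dict[str, int]:
    """Subsample reads without replacement to `depth`; raise if the sample is too shallow."""
    total = sum(counts.values())
    if total < depth:
        raise ValueError(f"sample has {total} reads, fewer than depth {depth}")
    # Expand counts into a pool of individual reads labelled by feature (ASV) ID.
    pool: List[str] = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    rarefied: Dict[str, int] = {}
    for feature in rng.sample(pool, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def shannon(counts: Dict[str, int]) -> float:
    """Shannon entropy (base 2) of a feature-count profile."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)


def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two count profiles (0 = identical, 1 = no overlap)."""
    features = set(a) | set(b)
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in features)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total
```

In this sketch a sample below the chosen depth raises an error; rarefaction tools typically drop such samples instead, so the depth is a trade-off between retaining samples and retaining reads.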

Installation

[!NOTE] Make sure you have the latest Nextflow version installed!

Clone the repository in a directory of your choice:

git clone https://github.com/CMG-GUTS/metataxonx.git

The pipeline is containerised, so it can be run with Docker or Singularity images. When using the docker profile, no further action is needed, except that a Docker registry must be set up on your local system (see Docker). When Singularity is used, images are automatically cached within the project directory.

Usage

As of the latest version, metaTAXONx accepts either a samplesheet (CSV) or a path to the input files. Samplesheets are preferred.

nextflow run main.nf --input  -work-dir work -profile singularity
nextflow run main.nf --input <'*_{1,R1,2,R2}.{fq,fq.gz,fastq,fastq.gz}'> -work-dir work -profile singularity
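When a glob is given instead of a samplesheet, the matching files have to be grouped into per-sample read pairs. The sketch below shows one way files following the `*_{1,R1,2,R2}.{fq,fq.gz,fastq,fastq.gz}` pattern could be paired by their shared prefix. The regular expression and the function name `pair_reads` are assumptions for illustration, not the pipeline's code; Nextflow itself provides `Channel.fromFilePairs` for this kind of grouping.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical regex mirroring the glob: <sample>_{1,R1,2,R2}.{fq,fq.gz,fastq,fastq.gz}
READ_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>R?[12])\.(?:fq|fastq)(?:\.gz)?$")


def pair_reads(filenames: List[str]) -> Dict[str, Tuple[str, ...]]:
    """Group read files by sample name: forward (1/R1) first, then reverse (2/R2)."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        match = READ_RE.match(name)
        if match is None:
            continue  # file does not follow the expected naming pattern
        mate = "forward" if match["mate"].endswith("1") else "reverse"
        groups[match["sample"]][mate] = name
    pairs: Dict[str, Tuple[str, ...]] = {}
    for sample, mates in groups.items():
        # Single-end samples end up with only a forward read.
        pairs[sample] = tuple(mates[m] for m in ("forward", "reverse") if m in mates)
    return pairs
```

For example, `S1_R1.fastq.gz` and `S1_R2.fastq.gz` are grouped under `S1`, while a lone `S2_1.fq` becomes a single-end sample `S2`.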

You can also run the test data set in the tests/data folder. These data are also available via -profile test and only work with the 'standard' classifier name.

[!NOTE] The test data should be run with the --bypass_trim flag, which is set to true by default in conf/test.config.

📋 Sample Metadata File Specification

metaTAXONx expects your sample input data to follow a simple but strict structure, which ensures compatibility and allows upfront validation. The input should be provided as a CSV file in which each row represents one sample and specifies its sequencing file paths. Additional properties not mentioned here are ignored by the validation step.

Properties and Validation Rules

🔹 Required properties

Property	Type	Rules / Description
`sample_id`	string	Unique sample ID with no spaces (`^\S+$`). Serves as an identifier.
`forward_read`	string	File path to the forward sequencing read. Must be a non-empty string matching a gzipped FASTQ file pattern. The file must exist.

🔹 Optional property

Property	Type	Rules / Description
`reverse_read`	string	File path to the reverse sequencing read. When specified, the same constraints as for `forward_read` apply.
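The validation rules above can be sketched in Python. This is a hypothetical illustration of the rules as described here, not the pipeline's own validation step, which runs within Nextflow. The gzipped-FASTQ regular expression (modelled on the style of nf-core samplesheet schemas), the duplicate-ID check derived from the "unique" requirement, and the function name `validate_samplesheet` are assumptions.

```python
import csv
import os
import re
from typing import List, Set

SAMPLE_ID_RE = re.compile(r"^\S+$")  # no whitespace, as in the table above
# Assumed gzipped-FASTQ pattern (.fq.gz or .fastq.gz).
FASTQ_GZ_RE = re.compile(r"^\S+\.f(?:ast)?q\.gz$")
REQUIRED_COLUMNS = {"sample_id", "forward_read"}


def _check_read(value: str, column: str, row: int, errors: List[str]) -> None:
    """Record an error if a read path is empty, not a gzipped FASTQ name, or missing."""
    if not value:
        errors.append(f"row {row}: {column} is empty")
    elif not FASTQ_GZ_RE.match(value):
        errors.append(f"row {row}: {column} '{value}' is not a gzipped FASTQ path")
    elif not os.path.isfile(value):  # relative paths resolve against the working directory
        errors.append(f"row {row}: {column} '{value}' does not exist")


def validate_samplesheet(path: str) -> List[str]:
    """Return all validation errors; an empty list means the samplesheet is valid."""
    errors: List[str] = []
    seen: Set[str] = set()
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing required column(s): {', '.join(sorted(missing))}"]
        # Row numbers start at 2 because row 1 is the header; extra columns are ignored.
        for row, record in enumerate(reader, start=2):
            sample_id = record.get("sample_id") or ""
            if not SAMPLE_ID_RE.match(sample_id):
                errors.append(f"row {row}: sample_id '{sample_id}' is empty or contains whitespace")
            elif sample_id in seen:
                errors.append(f"row {row}: duplicate sample_id '{sample_id}'")
            seen.add(sample_id)
            _check_read(record.get("forward_read") or "", "forward_read", row, errors)
            # reverse_read is optional: only validate it when a value is given.
            reverse = record.get("reverse_read") or ""
            if reverse:
                _check_read(reverse, "reverse_read", row, errors)
    return errors
```

Collecting every error instead of stopping at the first one lets a user fix a samplesheet in a single pass.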

🔹 Pattern‑based columns

You can define extra variables using special prefixes:

CONTRAST_... → grouping/category labels used in differential comparisons
Example: CONTRAST_Treatment with values Drug / Placebo.

These prefixes are used to generate an automated OmicFlow report with alpha diversity, beta diversity and compositional plots. For more information, see OmicFlow.
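To show how the CONTRAST_ prefix separates grouping variables from the other columns, the sketch below reads a samplesheet and collects, for each CONTRAST_ column, the sample IDs belonging to each group. The function name `contrast_groups` and the example rows are hypothetical, and how OmicFlow consumes these columns is not modelled here.

```python
import csv
import io
from typing import Dict, List, TextIO

CONTRAST_PREFIX = "CONTRAST_"


def contrast_groups(handle: TextIO) -> Dict[str, Dict[str, List[str]]]:
    """Map each CONTRAST_ column to {group label: [sample IDs]}."""
    reader = csv.DictReader(handle)
    columns = [c for c in (reader.fieldnames or []) if c.startswith(CONTRAST_PREFIX)]
    groups: Dict[str, Dict[str, List[str]]] = {c: {} for c in columns}
    for record in reader:
        for column in columns:
            label = record.get(column) or ""
            if label:  # skip samples without a label for this contrast
                groups[column].setdefault(label, []).append(record["sample_id"])
    return groups


example = io.StringIO(
    "sample_id,forward_read,reverse_read,CONTRAST_Treatment\n"
    "S1,S1_R1.fastq.gz,S1_R2.fastq.gz,Drug\n"
    "S2,S2_R1.fastq.gz,S2_R2.fastq.gz,Placebo\n"
    "S3,S3_R1.fastq.gz,S3_R2.fastq.gz,Drug\n"
)
print(contrast_groups(example))
# → {'CONTRAST_Treatment': {'Drug': ['S1', 'S3'], 'Placebo': ['S2']}}
```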

Support

If you are having issues, please create an issue on the GitHub repository.

Version History

v1.2.0 (earliest) Created 20th Jan 2026 at 14:39 by Alem Gusinac