Introduction metaTAXONx
This pipeline is a best-practice suite for the pre-processing, denoising, classification and annotation of Illumina short-read sequencing data obtained by 16S rRNA marker-gene sequencing. The pipeline combines nf-core modules with local modules written in a similar format. It can be run via either Docker or Singularity containers.
Pipeline summary
The pipeline can perform different taxonomic annotations on either single-end or paired-end reads. The different subworkflows can be selected via --bypass_ flags; a full overview is shown by running --help.
The pipeline preprocesses the reads by removing primers or adapters with cutadapt and by merging paired-end reads with either FLASH or PEAR. Before and after each step, read quality is assessed with FastQC, and a MultiQC report is created as output. The denoising of single-end reads is performed via DADA2 in batches or in parallel with the module run-dada2-batch.
Taxonomy assignment
The ASVs (generated by DADA2 after denoising) are classified by default with VSEARCH alignment against the SILVA-138 SSU database (Ref NR 99, i.e. non-redundant at 99% identity), a reference database of 16S rRNA gene sequences for taxonomic classification. This is part of the "classify-consensus-vsearch" QIIME2 feature classifier module. The results can be visualised as a comprehensive report via OmicFlow or in a human-readable format via BiotaViz.
[!NOTE] Classifiers need to be built with scikit-learn version 1.4.0.
Specific pre-built classifiers, obtained from the QIIME2 pre-built classifiers, can be selected via --classifier_name. A custom classifier can be supplied via the --classifier_custom flag; a thorough guide on how to create your own classifier can be found on this QIIME2 forum.
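The consensus idea behind this kind of classification can be sketched in a few lines. The function below is a simplified illustration, not QIIME2's implementation: `consensus_taxonomy` and its `min_consensus` parameter are hypothetical names, the 0.51 default is an illustrative assumption, and the lineages are assumed to be semicolon-separated strings from the top reference hits of one ASV.

```python
from collections import Counter


def consensus_taxonomy(hit_lineages, min_consensus=0.51):
    """Return the deepest lineage shared by at least `min_consensus` of the hits.

    Simplified sketch: walks down the ranks and stops at the first rank
    where the most common label falls below the consensus threshold.
    """
    if not hit_lineages:
        return "Unassigned"
    split = [[rank.strip() for rank in lineage.split(";")] for lineage in hit_lineages]
    consensus = []
    candidates = split
    depth = 0
    while True:
        # Labels at the current rank among hits that still agree with the consensus so far.
        labels = [lin[depth] for lin in candidates if len(lin) > depth]
        if not labels:
            break
        label, count = Counter(labels).most_common(1)[0]
        # The fraction is taken over all hits, not only the remaining candidates.
        if count / len(split) < min_consensus:
            break
        consensus.append(label)
        candidates = [lin for lin in candidates if len(lin) > depth and lin[depth] == label]
        depth += 1
    return "; ".join(consensus) if consensus else "Unassigned"


hits = [
    "d__Bacteria; p__Firmicutes; c__Bacilli",
    "d__Bacteria; p__Firmicutes; c__Clostridia",
    "d__Bacteria; p__Firmicutes; c__Bacilli",
    "d__Bacteria; p__Bacteroidota; c__Bacteroidia",
]
print(consensus_taxonomy(hits))  # d__Bacteria; p__Firmicutes
```

In this toy example the class rank is dropped because only 2 of 4 hits agree on it, which is below the threshold.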
Diversity analysis
The pipeline uses QIIME2 to construct a phylogenetic tree via FastTree and to run diversity analyses, such as alpha and beta diversity. These analyses use rarefied data to ensure a consistent sampling depth across samples.
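To illustrate what rarefaction does, here is a minimal standard-library sketch. It is not the QIIME2 code path: `rarefy` and `shannon` are hypothetical helper names, the counts are invented toy values, and the base-2 logarithm is a choice made for this example.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample a {feature: count} table to exactly `depth` reads without replacement.

    Returns None when the sample has fewer reads than `depth`,
    which corresponds to dropping that sample.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand the table into one entry per read, then draw `depth` of them.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for feature in drawn:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


def shannon(counts):
    """Shannon diversity (base 2) of a {feature: count} table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)


sample = {"ASV_1": 50, "ASV_2": 30, "ASV_3": 20}  # toy counts, not real data
even_depth = rarefy(sample, depth=40)
print(sum(even_depth.values()))  # 40
```

After rarefying every sample to the same depth, alpha diversity values such as `shannon(even_depth)` are compared on an equal footing.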
Installation
[!NOTE] Make sure you have installed the latest nextflow version!
Clone the repository in a directory of your choice:
git clone https://github.com/CMG-GUTS/metataxonx.git
The pipeline is containerised, meaning it can be run via Docker or Singularity images. No further action is needed when using the docker profile, except that a Docker registry needs to be set up on your local system, see docker. If Singularity is used, images are automatically cached within the project directory.
Usage
Since the latest version, metaTAXONx works with either a samplesheet (CSV) or a path to the input files. Samplesheets are preferred.
nextflow run main.nf --input <samplesheet.csv> -work-dir work -profile singularity
nextflow run main.nf --input <'*_{1,R1,2,R2}.{fq,fq.gz,fastq,fastq.gz}'> -work-dir work -profile singularity
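Since samplesheets are the preferred input, a small helper can generate one from a folder of reads. The sketch below is not part of the pipeline: `build_samplesheet` and `READ_PATTERN` are hypothetical names, and the file-name convention is assumed to mirror the glob in the second command above. The column names follow the metadata specification in this README, which additionally expects gzipped FASTQ files.

```python
import csv
import re
from pathlib import Path

# Assumption: file names look like <sample>_<1|R1|2|R2>.<fq|fastq>[.gz].
READ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>R?[12])\.(fq|fastq)(\.gz)?$")


def build_samplesheet(fastq_dir, out_csv):
    """Pair forward/reverse files by sample name and write a CSV samplesheet."""
    samples = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that are not reads
        column = "forward_read" if match["read"].endswith("1") else "reverse_read"
        samples.setdefault(match["sample"], {})[column] = str(path.resolve())

    rows = []
    for sample_id, reads in sorted(samples.items()):
        if "forward_read" not in reads:
            continue  # a forward read is required
        rows.append({
            "sample_id": sample_id,
            "forward_read": reads["forward_read"],
            "reverse_read": reads.get("reverse_read", ""),
        })

    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sample_id", "forward_read", "reverse_read"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `build_samplesheet("reads/", "samplesheet.csv")` would write one row per sample; single-end samples get an empty reverse_read cell.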
You can also run the test data set in the tests/data folder. The test data are also available via -profile test and only work with the 'standard' classifier name.
[!NOTE] Test data should be run with the flag --bypass_trim, which defaults to true in conf/test.config.
📋 Sample Metadata File Specification
metaTAXONx expects your sample input data to follow a simple but strict structure to ensure compatibility and allow upfront validation. The input should be provided as a CSV file in which each row describes one sample with its sequencing file paths. Additional properties not mentioned here will be ignored by the validation step.
Properties and Validation Rules
🔹 Required properties
| Property | Type | Rules / Description |
|---|---|---|
| sample_id | string | Unique sample ID with no spaces (^\S+$). Serves as an identifier. |
| forward_read | string | File path to forward sequencing read. Must be non-empty string matching FASTQ gzipped pattern. File must exist. |
🔹 Optional property
| Property | Type | Rules / Description |
|---|---|---|
| reverse_read | string | File path to reverse sequencing read. Same constraints as forward_read. Required if specified. |
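The rules in the two tables can be checked before launching a run. The following is a rough sketch, not the pipeline's own validation step: `validate_samplesheet` and `check_read` are hypothetical names, and the `FASTQ_GZ` regular expression is an assumption, since the exact pattern used by the pipeline is not given here.

```python
import csv
import re
from pathlib import Path

SAMPLE_ID = re.compile(r"^\S+$")
# Assumption: the pipeline's real FASTQ pattern is not shown in this README.
FASTQ_GZ = re.compile(r"^\S+\.f(ast)?q\.gz$")


def check_read(path_text, column, line_no, check_exists):
    """Return a list of problems found for one read-path cell."""
    problems = []
    if not FASTQ_GZ.match(path_text):
        problems.append(f"line {line_no}: {column} is not a gzipped FASTQ path")
    elif check_exists and not Path(path_text).is_file():
        problems.append(f"line {line_no}: {column} file does not exist")
    return problems


def validate_samplesheet(csv_path, check_exists=True):
    """Collect rule violations; an empty list means the sheet passed."""
    errors, seen = [], set()
    with open(csv_path, newline="") as handle:
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(csv.DictReader(handle), start=2):
            sample_id = row.get("sample_id") or ""
            if not SAMPLE_ID.match(sample_id):
                errors.append(f"line {line_no}: sample_id is empty or contains whitespace")
            elif sample_id in seen:
                errors.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
            seen.add(sample_id)
            errors += check_read(row.get("forward_read") or "", "forward_read", line_no, check_exists)
            reverse = row.get("reverse_read") or ""
            if reverse:  # optional column: only checked when a value is given
                errors += check_read(reverse, "reverse_read", line_no, check_exists)
    return errors
```

Running `validate_samplesheet("samplesheet.csv")` returns human-readable messages that can be fixed before the pipeline's own validation rejects the file.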
🔹 Pattern‑based columns
You can define extra variables using special prefixes:
CONTRAST_... → grouping/category labels used in differential comparisons
Example: CONTRAST_Treatment with values Drug / Placebo
These prefixes are used to generate an automated OmicFlow report with alpha diversity, beta diversity and compositional plots. For more information see OmicFlow.
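As an illustration of how such prefixed columns can be read, the sketch below groups sample IDs by the levels of every CONTRAST_ column in a samplesheet. It is only an example of the naming convention: `contrast_groups` is a hypothetical helper and does not reflect how OmicFlow consumes these columns internally.

```python
import csv
from collections import defaultdict

PREFIX = "CONTRAST_"


def contrast_groups(csv_path):
    """Map each CONTRAST_ column to {level: [sample_id, ...]}."""
    groups = defaultdict(lambda: defaultdict(list))
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            for column, value in row.items():
                # Skip malformed extra cells (None key) and empty values.
                if column and column.startswith(PREFIX) and value:
                    groups[column][value].append(row["sample_id"])
    return {column: dict(levels) for column, levels in groups.items()}
```

For a sheet with a CONTRAST_Treatment column, the result would have the shape `{"CONTRAST_Treatment": {"Drug": [...], "Placebo": [...]}}`, which makes it easy to check that every group contains the expected samples.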
Support
If you are having issues, please create an issue
Version History
v1.2.0 (earliest) Created 20th Jan 2026 at 14:39 by Alem Gusinac
added low memory process demands for CI
Frozen
v1.2.0
1c87f3b
Created: 20th Jan 2026 at 14:39
Last updated: 23rd Jan 2026 at 14:11
https://orcid.org/0000-0003-0068-1275