Workflow Type: Nextflow
Stable


Introduction: metaBIOMx

The metagenomics microbiomics pipeline is a best-practice suite for the decontamination and annotation of sequencing data obtained via short-read shotgun sequencing. The pipeline contains nf-core modules and local modules that follow the same format. It can be run via both Docker and Singularity containers.

Pipeline summary

The pipeline can perform taxonomic annotation on either reads (single or paired) or contigs. The subworkflows to run are selected via --bypass_ flags; a full overview is shown by running --help. By default, when a database path is provided, the pipeline checks whether the right databases are present in the right formats. If they are not, compatible databases are downloaded automatically.

For both subworkflows, the pipeline performs read trimming via Trimmomatic and/or AdapterRemoval, followed by removal of human reads via KneadData. Quality is assessed with FastQC before and after each step, and a MultiQC report is created as output. Taxonomic annotation then proceeds as follows:

Read annotation

  • Paired reads are interleaved using BBTools.
  • MetaPhlAn3 and HUMAnN3 are used for taxonomy and functional profiling.
  • Taxonomy profiles are merged into a single BIOM file using biom-format.

Contig annotation

  • Read assembly is performed via SPAdes.
  • Quality assessment of contigs is done via BUSCO.
  • Taxonomy profiles are created using CAT.
  • Read abundance estimation is performed on the contigs using Bowtie2 and BCFtools.
  • Contigs are selected if a read can be aligned against a contig and a BIOM file is generated using biom-format.
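The contig selection rule in the last step can be illustrated with a small Python sketch. It is not the pipeline's own code: the input layout (a dictionary of per-sample read counts per contig) and the function name `abundance_table` are assumptions made for illustration. It only shows the idea of keeping contigs that have at least one aligned read and arranging the counts as a contig-by-sample matrix, which is the shape a BIOM table takes.

```python
def abundance_table(counts_per_sample):
    """Build a contig-by-sample count matrix, dropping contigs with no aligned reads.

    counts_per_sample: {sample_id: {contig_id: aligned_read_count}} (hypothetical layout).
    """
    samples = sorted(counts_per_sample)
    # A contig is kept if at least one read aligned to it in at least one sample.
    contigs = sorted({
        contig
        for counts in counts_per_sample.values()
        for contig, n_reads in counts.items()
        if n_reads > 0
    })
    # Rows are contigs, columns are samples; absent combinations count as zero.
    matrix = [
        [counts_per_sample[sample].get(contig, 0) for sample in samples]
        for contig in contigs
    ]
    return contigs, samples, matrix
```

A contig that has zero aligned reads in every sample never appears as a row, while a contig seen in only one sample gets zeros for the others.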

Installation

[!NOTE] Make sure you have installed the latest Nextflow version!

Clone the repository in a directory of your choice:

git clone https://github.com/CMG-GUTS/metabiomx.git

The pipeline is containerised, meaning it can be run via Docker or Singularity images. No further action is needed when using the docker profile, except that a Docker registry needs to be set up on your local system, see docker. If Singularity is used, images are automatically cached within the project directory.

Usage

Since the latest version, metaBIOMx accepts either a samplesheet (CSV) or a path to the input files. Samplesheets are preferred.

nextflow run main.nf --input  -work-dir work -profile singularity
nextflow run main.nf --input <'*_{1,R1,2,R2}.{fq,fq.gz,fastq,fastq.gz}'> -work-dir work -profile singularity
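Since samplesheets are preferred, it can be handy to derive one from files that follow the naming scheme in the glob pattern above. The following Python sketch shows one way to pair reads into samplesheet-style rows. The function name `pair_reads` and the regular expression are assumptions based on that glob, not part of metaBIOMx, and the column names follow the sample metadata specification of this README.

```python
import re
from pathlib import PurePath

# Assumed naming scheme, derived from the glob above: <sample>_<1|R1|2|R2>.<ext>
READ_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>R?[12])\.(?:fq|fastq)(?:\.gz)?$")


def pair_reads(paths):
    """Group FASTQ paths by sample and return samplesheet-style rows."""
    samples = {}
    for path in paths:
        match = READ_RE.match(PurePath(path).name)
        if match is None:
            continue  # not a read file under the assumed naming scheme
        column = "forward_read" if match["mate"].endswith("1") else "reverse_read"
        samples.setdefault(match["sample"], {})[column] = path

    rows = []
    for sample_id in sorted(samples):
        reads = samples[sample_id]
        if "forward_read" not in reads:
            continue  # a lone reverse read cannot form a valid row
        rows.append({
            "sample_id": sample_id,
            "forward_read": reads["forward_read"],
            "reverse_read": reads.get("reverse_read", ""),
        })
    return rows
```

The rows can then be written out with `csv.DictWriter`. Note that the samplesheet validation expects gzipped FASTQ paths, so uncompressed files matched here would still need to be compressed first.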

You can also try running the test data set in the tests/data folder. It is also available via -profile test, but in that case the databases are downloaded to the tests/db folder by default, so you may want to specify a folder of your own choice.

[!NOTE] Test data should be run with the flags --bypass_trim and --bypass_decon, which are set to true by default in conf/test.config.

📋 Sample Metadata File Specification

metaBIOMx expects your sample input data to follow a simple but strict structure to ensure compatibility and allow upfront validation. The input should be provided as a CSV file in which each row describes one sample with its sequencing file paths. Additional properties not mentioned here are ignored by the validation step.

Properties and Validation Rules

🔹 Required properties

| Property | Type | Rules / Description |
|---|---|---|
| sample_id | string | Unique sample ID with no spaces (^\S+$). Serves as an identifier. |
| forward_read | string | File path to forward sequencing read. Must be a non-empty string matching the gzipped FASTQ pattern. File must exist. |

🔹 Optional property

| Property | Type | Rules / Description |
|---|---|---|
| reverse_read | string | File path to reverse sequencing read. Same constraints as forward_read. Required if specified. |
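The rules above can be checked before launching the pipeline. The Python sketch below is a rough pre-flight check, not the pipeline's own validation. The `sample_id` pattern is the one quoted above, but the gzipped FASTQ pattern in `FASTQ_GZ_RE` is an assumption because the exact schema pattern is not shown here, and the "file must exist" rule is deliberately left out so the example works on plain text.

```python
import csv
import io
import re

SAMPLE_ID_RE = re.compile(r"^\S+$")           # pattern quoted in the specification
FASTQ_GZ_RE = re.compile(r"\.f(ast)?q\.gz$")  # assumption: real schema pattern may differ


def validate_samplesheet(text):
    """Return a list of human-readable problems found in samplesheet CSV text."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = row.get("sample_id") or ""
        if not SAMPLE_ID_RE.match(sample_id):
            problems.append(f"line {line_no}: invalid sample_id {sample_id!r}")
        elif sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        seen.add(sample_id)

        forward = row.get("forward_read") or ""
        if not FASTQ_GZ_RE.search(forward):
            problems.append(f"line {line_no}: forward_read is not a gzipped FASTQ path")

        # reverse_read is optional, but must follow the same pattern when given.
        reverse = row.get("reverse_read") or ""
        if reverse and not FASTQ_GZ_RE.search(reverse):
            problems.append(f"line {line_no}: reverse_read is not a gzipped FASTQ path")
    return problems
```

An empty list means the sheet passed these checks. Columns that are not listed in the specification are simply ignored, in line with the validation behaviour described above.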

🔹 Pattern‑based columns

You can define extra variables using special prefixes:

  • CONTRAST_... → grouping/category labels used in differential comparisons.
    Example: CONTRAST_Treatment with values Drug / Placebo.

These prefixed columns are used to generate an automated OmicFlow report with alpha diversity, beta diversity, and compositional plots. For more information, see OmicFlow.
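To see which groups a samplesheet defines, the CONTRAST_ columns can be listed with a few lines of Python. This is only an illustrative sketch: the function name `contrast_groups` is hypothetical, and how OmicFlow itself consumes these columns is not described here.

```python
import csv
import io


def contrast_groups(text, prefix="CONTRAST_"):
    """Map each CONTRAST_ column to {label: [sample_id, ...]}."""
    reader = csv.DictReader(io.StringIO(text))
    columns = [name for name in (reader.fieldnames or []) if name.startswith(prefix)]
    groups = {name: {} for name in columns}
    for row in reader:
        for name in columns:
            label = row[name]
            if label:  # samples without a label are left out of that contrast
                groups[name].setdefault(label, []).append(row["sample_id"])
    return groups
```

For a sheet with a CONTRAST_Treatment column holding Drug and Placebo, the result contains one entry for that column with the sample IDs listed under each label.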

Example cases

🔹 Read annotation

# Optional flags: --bypass_trim, --bypass_decon (add each on its own line, without a leading #)
nextflow run main.nf \
    --input  \
    --bypass_contig_annotation \
    -work-dir work \
    -profile singularity

🔹 Contig annotation

# Optional flags: --bypass_trim, --bypass_decon (add each on its own line, without a leading #)
nextflow run main.nf \
    --input  \
    --bypass_read_annotation \
    -work-dir work \
    -profile singularity

In case you only have assemblies and wish to perform contig annotation:

nextflow run main.nf \
    --input  \
    --bypass_assembly \
    --bypass_read_annotation \
    -work-dir work \
    -profile singularity
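The three example cases differ only in their --bypass_ flags, which the following Python sketch makes explicit. It uses only the flags shown in this README. The function name `build_command`, the mode names, and the default values are assumptions for illustration, and the resulting list would still have to be passed to something like `subprocess.run`.

```python
# Flags per example case, copied from the commands above.
MODE_FLAGS = {
    "reads": ["--bypass_contig_annotation"],
    "contigs": ["--bypass_read_annotation"],
    "assemblies": ["--bypass_assembly", "--bypass_read_annotation"],
}


def build_command(input_path, mode, trim=True, decon=True,
                  work_dir="work", profile="singularity"):
    """Assemble the nextflow command line for one of the example cases."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    command = ["nextflow", "run", "main.nf", "--input", input_path]
    if not trim:
        command.append("--bypass_trim")
    if not decon:
        command.append("--bypass_decon")
    command += MODE_FLAGS[mode]
    command += ["-work-dir", work_dir, "-profile", profile]
    return command
```

Building the command as a list avoids the shell pitfall of placing comment lines between backslash-continued lines, which would end the command early.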

Automatic database setup

The pipeline requires a set of databases that are used by the different tools within this workflow. The user must specify the location where the databases will be downloaded. It is also possible to download the databases manually. The configure subworkflow automatically evaluates the database format and checks that compatible files are present.

nextflow run main.nf \
    --bowtie_db path/to/db/bowtie2 \
    --metaphlan_db path/to/db/metaphlan \
    --humann_db path/to/db/humann \
    --catpack_db path/to/db/catpack \
    --busco_db path/to/db/busco_downloads \
    -work-dir  \
    -profile 
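Before a run, it can help to know which database directories are still missing or empty and would therefore be downloaded. The Python sketch below is a rough pre-flight check under that assumption. It only tests whether each directory exists and contains something, whereas the actual format check is done by the configure subworkflow. The parameter names are taken from the command above, and the function name `missing_databases` is hypothetical.

```python
from pathlib import Path

# Parameter names as used in the command above (without the leading dashes).
DB_PARAMS = ("bowtie_db", "metaphlan_db", "humann_db", "catpack_db", "busco_db")


def missing_databases(paths):
    """Return the parameter names whose directory is unset, absent, or empty."""
    missing = []
    for name in DB_PARAMS:
        value = paths.get(name)
        directory = Path(value) if value else None
        if directory is None or not directory.is_dir() or not any(directory.iterdir()):
            missing.append(name)
    return missing
```

Calling `missing_databases({"bowtie_db": "path/to/db/bowtie2"})` returns every parameter whose directory is not yet populated, in the order listed in `DB_PARAMS`.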

Support

If you are having issues, please create an issue.

Version History

v1.2.0 (latest) Created 21st Jan 2026 at 11:22 by Alem Gusinac

updated changelog


Frozen v1.2.0 8be75a3

v1.1.1 Created 2nd Dec 2025 at 13:27 by Alem Gusinac

Made nextflow.config more default. Fixed docker issue with OmicFlow


Frozen v1.1.1 daf71b2

v1.1.0 Created 7th Oct 2025 at 09:54 by Alem Gusinac

updated '--help'


Frozen v1.1.0 5943455

main @ f33ea66 Created 7th Jul 2025 at 15:17 by Alem Gusinac

This is the latest version that passed the nf-test.

NOTE: I just noticed that the nextflow.config contains a typo, 'inpu' should be 'input'! I am so sorry for this ~ the github is up to date.


Frozen main f33ea66

v1.0.0-alpha Created 7th Jul 2025 at 14:00 by Alem Gusinac

Please do not use this version, somehow the release did not sync with the latest updates.


Frozen v1.0.0-alpha 00f3c9c
Creators and Submitter
Creators
  • Alem Gusinac
  • Thomas Ederveen
  • Jos Boekhorst
  • Annemarie Boleij
Submitter
Citation
Gusinac, A., Ederveen, T., Boekhorst, J., & Boleij, A. (2026). metaBIOMx: Metagenomics pipeline for Microbial shot-gun sequencing data. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1787.8
Activity

Created: 3rd Jul 2025 at 15:46

Last updated: 21st Jan 2026 at 11:22
