Publications

Abstract

A key limiting factor in organising and using information from physical specimens curated in natural science collections is making that information computable: institutional digitization tends to focus more on imaging the specimens themselves than on efficiently capturing computable data about them. Label data are today transcribed manually, at high cost and low throughput, putting the task beyond the reach of many collection-holding institutions at current funding levels. We show how computer vision, optical character recognition, handwriting recognition, named entity recognition and language translation technologies can be implemented in canonical workflow component libraries with findable, accessible, interoperable, and reusable (FAIR) characteristics. These libraries are being developed in a cloud-based workflow platform, the 'Specimen Data Refinery' (SDR), which is founded on the Galaxy workflow engine, Common Workflow Language, Research Object Crates (RO-Crate) and WorkflowHub technologies. The SDR can be applied to specimens' labels and other artefacts, offering the prospect of greatly accelerated and more accurate data capture in computable form. Two kinds of FAIR Digital Objects (FDO) are created by packaging the outputs of SDR workflows and workflow components as digital objects with metadata, a persistent identifier, and a specific type definition. The first kind of FDO comprises computable Digital Specimen (DS) objects that can be consumed and produced by workflows and other applications. A single DS is the input data structure submitted to a workflow; it is modified by each workflow component in turn to produce a refined DS at the end. The Specimen Data Refinery provides a library of such components that can be used individually or in series. To work together, each library component describes the fields it requires from the DS and the fields it will in turn populate or enrich. The second kind of FDO, RO-Crates, gather and archive the diverse set of digital and real-world resources, configurations, and actions (the provenance) contributing to a unit of research work, allowing that work to be faithfully recorded and reproduced. Here we describe the Specimen Data Refinery and its motivating requirements, focusing on what is essential in the creation of canonical workflow component libraries and on its conformance with the requirements of the emerging FDO Core Specification being developed by the FDO Forum.

Authors: Alex Hardisty, Paul Brack, Carole Goble, Laurence Livermore, Ben Scott, Quentin Groom, Stuart Owen, Stian Soiland-Reyes

Date Published: 7th Mar 2022

Publication Type: Journal Article
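
The abstract above describes workflow components that each declare which Digital Specimen (DS) fields they require and which they populate, so that components can be chained. A minimal Python sketch of that contract, using hypothetical field names and a simplified DS structure rather than the actual openDS/FDO types:

    from dataclasses import dataclass, field

    @dataclass
    class DigitalSpecimen:
        """Simplified stand-in for a Digital Specimen FDO: a persistent
        identifier plus a dictionary of data fields."""
        pid: str
        fields: dict = field(default_factory=dict)

    class WorkflowComponent:
        """A component declares the DS fields it requires and the fields
        it populates, so that components can be chained safely."""
        requires = set()
        populates = set()

        def run(self, ds):
            missing = self.requires - ds.fields.keys()
            if missing:
                raise ValueError(f"DS {ds.pid} lacks required fields: {missing}")
            return self.process(ds)

        def process(self, ds):
            raise NotImplementedError

    class OCRComponent(WorkflowComponent):
        """Hypothetical OCR step: consumes a label image reference and
        populates the raw label text (field names are illustrative)."""
        requires = {"label_image"}
        populates = {"label_text"}

        def process(self, ds):
            ds.fields["label_text"] = f"<OCR text of {ds.fields['label_image']}>"
            return ds

    # Components applied in series, each refining the same DS:
    ds = DigitalSpecimen(pid="pid/example-001", fields={"label_image": "img001.tif"})
    for step in [OCRComponent()]:
        ds = step.run(ds)
    print(ds.fields)

Declaring the required and populated fields up front lets a workflow engine validate a chain of components before running it.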

Abstract

Indicators of habitat condition are essential for tracking conservation progress, but measuring biotic, abiotic and landscape characteristics at fine resolution over large spatial extents remains challenging. In this viewpoint article, we provide a comprehensive synthesis of the challenges and solutions for consistently measuring and monitoring habitat condition with remote sensing using airborne Light Detection and Ranging (LiDAR) and affordable Unmanned Aerial Vehicles (UAVs) over multiple sites and transnational or continental extents. Key challenges include variability in sensor characteristics and survey designs, non-transparent pre-processing workflows, heterogeneous and complex data, issues with the robustness of metrics and indices, limited model generalizability and transferability across sites, and difficulties in handling big data, such as managing large volumes and utilizing parallel or distributed computing. We suggest that a collaborative cloud virtual research environment (VRE) for habitat condition research and monitoring could provide solutions, including tools for data discovery, access, and data standardization, as well as geospatial processing workflows for airborne LiDAR and UAV data. A VRE would also improve data management, metadata standardization, workflow reproducibility, and transferability of structure-from-motion algorithms and machine learning models such as random forests and convolutional neural networks. Along with best practices for data collection and adopting FAIR (findability, accessibility, interoperability, reusability) principles and open science practices, a VRE could enable more consistent and transparent data processing and metric retrieval, e.g., for Natura 2000 habitats. Ultimately, these improvements would support the development of more reliable habitat condition indicators, helping prevent habitat degradation and promoting the sustainable use of natural resources.

Authors: W. Daniel Kissling, Yifang Shi, Jinhu Wang, Agata Walicka, Charles George, Jesper E. Moeslund, France Gerard

Date Published: 1st Dec 2024

Publication Type: Journal Article
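
The abstract above concerns consistent retrieval of habitat-structure metrics from airborne LiDAR across sites. As an illustration only (not the authors' pipeline), a short numpy sketch of two commonly used vegetation-structure metrics computed from height-normalized point heights; the canopy threshold is an assumed value:

    import numpy as np

    def vegetation_metrics(heights_m, canopy_threshold=0.5):
        """Two widely used LiDAR vegetation-structure metrics from
        height-normalized point heights (metres above ground). The
        0.5 m canopy threshold is an assumed, illustrative value."""
        heights_m = np.asarray(heights_m, dtype=float)
        above = heights_m > canopy_threshold
        return {
            # 95th height percentile: a robust canopy-height proxy
            "p95_height_m": float(np.percentile(heights_m, 95)),
            # canopy cover: fraction of returns above the threshold
            "canopy_cover": float(above.mean()),
        }

    # Example with synthetic point heights:
    rng = np.random.default_rng(0)
    print(vegetation_metrics(np.abs(rng.normal(3.0, 2.5, size=10_000))))

Standardizing such metric definitions across surveys is exactly the kind of consistency a shared virtual research environment would enforce.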

Abstract

VADR database of Porcine circovirus (PCV), covering PCV 1, 2, 3 and 4 complete genomes available on the NCBI nt database as of 2024/04/02 (YYYY/MM/DD) and computed with VADR 1.6.4. VADR is an annotation program for viruses based on models built from known viruses. This database can be used, for example, to analyse PCV variants with VADR, displaying the VADR and vardict-java results with the vvv2_display tool.

Author: Fabrice Touzain

Date Published: 2025

Publication Type: Dataset
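
The dataset above is a VADR model directory for PCV. A hedged sketch of driving VADR's v-annotate.pl with a custom model directory from Python; all paths and the model key are placeholders, and the option list should be checked against the VADR documentation:

    import subprocess

    # All names below are placeholders to adapt to the downloaded dataset.
    model_dir = "vadr-models-pcv"      # unpacked PCV model directory
    model_key = "pcv"                  # model key within that directory
    input_fasta = "pcv_samples.fasta"  # genomes to annotate

    # v-annotate.pl annotates each sequence against the chosen models;
    # check `v-annotate.pl -h` for the authoritative option list.
    subprocess.run(
        ["v-annotate.pl", "--mdir", model_dir, "--mkey", model_key,
         input_fasta, "pcv_annotation_out"],
        check=True,
    )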

Abstract

Background: Next-generation sequencing (NGS) analysis of viral samples generates results dispersed across multiple files (genome assembly, variant calling, and functional annotations), making integrated interpretation challenging. Variant calling often yields numerous low-frequency or non-significant variants, yet only a small fraction are biologically relevant. Virologists must manually sift through extensive data to identify meaningful mutations, a time-consuming and error-prone process. To address these practical challenges, we developed vvv2_display, a dedicated summarization and visualization tool, integrated within comprehensive Galaxy workflows. Results: vvv2_display streamlines variant interpretation by consolidating key results into two concise and interoperable outputs. The first output is a PNG image showing alignment coverage depth and genomic annotations, with significant variants displayed along the genome as symbols whose height reflects frequency and whose shape indicates the affected protein. At a glance, this enables virologists to identify all deviations from a reference viral genome. Each significant variant is assigned a unique identifier that directly links to the second output: a tab-separated (TSV) text file listing only high-confidence variants, with frequencies, flanking nucleotides, and impacted genes and proteins. This cross-referenced design supports rapid, accurate, and intuitive data exploration. Availability: vvv2_display is open source, available on GitHub and installable via Mamba.

Authors: Alexandre Flageul, Edouard Hirchaud, Céline Courtillon, Flora Carnet, Paul Brown, Béatrice Grasland, Fabrice Touzain

Date Published: 17th Oct 2025

Publication Type: Journal Article
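
The abstract above describes a TSV output listing only high-confidence variants with frequencies and impacted genes and proteins. A minimal sketch of filtering such a table with Python's csv module; the column names are illustrative placeholders, not the tool's actual header:

    import csv

    def high_confidence_variants(tsv_path, min_frequency=0.05):
        """Read a variant table such as the TSV emitted by vvv2_display
        and keep variants at or above a frequency threshold. The column
        names used here ('variant_id', 'frequency', 'protein') are
        illustrative placeholders, not the tool's actual header."""
        with open(tsv_path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            return [row for row in reader
                    if float(row["frequency"]) >= min_frequency]

    # Example: print each retained variant and its affected protein
    for row in high_confidence_variants("variants.tsv"):
        print(row["variant_id"], row["protein"])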

Abstract

Explains the vvv2_display command-line program and its inputs/outputs, available as a Bioconda package. Describes Galaxy workflows that use this program.

Authors: Alexandre Flageul, Edouard Hirchaud, Céline Courtillon, Flora Carnet, Paul Brown, Béatrice Grasland, Fabrice Touzain

Date Published: 2025

Publication Type: Other

Abstract

The rising popularity of computational workflows is driven by the need for repetitive and scalable data processing, sharing of processing know-how, and transparent methods. As both combined records of analysis and descriptions of processing steps, workflows should be reproducible, reusable, adaptable, and available. Workflow sharing presents opportunities to reduce unnecessary reinvention, promote reuse, increase access to best practice analyses for non-experts, and increase productivity. In reality, workflows are scattered and difficult to find, in part due to the diversity of available workflow engines and ecosystems, and because workflow sharing is not yet part of research practice. WorkflowHub provides a unified registry for all computational workflows that links to community repositories, and supports both the workflow lifecycle and making workflows findable, accessible, interoperable, and reusable (FAIR). By interoperating with diverse platforms, services, and external registries, WorkflowHub adds value by supporting workflow sharing, explicitly assigning credit, enhancing FAIRness, and promoting workflows as scholarly artefacts. The registry has a global reach, with hundreds of research organisations involved, and more than 800 workflows registered.

Authors: Ove Johan Ragnar Gustafsson, Sean R. Wilkinson, Finn Bacall, Stian Soiland-Reyes, Simone Leo, Luca Pireddu, Stuart Owen, Nick Juty, José M. Fernández, Tom Brown, Hervé Ménager, Björn Grüning, Salvador Capella-Gutierrez, Frederik Coppens, Carole Goble

Date Published: 1st Dec 2025

Publication Type: Journal Article
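
WorkflowHub, described above, is built on the SEEK platform, which serves a JSON API. A minimal sketch that lists registered workflows, assuming the public /workflows endpoint returns a JSON:API payload with records under 'data'; consult the WorkflowHub API documentation for the authoritative schema:

    import json
    import urllib.request

    # Assumed endpoint shape: SEEK-based services return JSON:API
    # payloads when asked for application/json.
    req = urllib.request.Request(
        "https://workflowhub.eu/workflows",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)

    # JSON:API convention: records under 'data', titles under 'attributes'.
    for record in payload.get("data", [])[:10]:
        print(record["id"], record.get("attributes", {}).get("title"))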
