Publications

Abstract

A key limiting factor in organising and using information from physical specimens curated in natural science collections is making that information computable: institutional digitization tends to focus more on imaging the specimens themselves than on efficiently capturing computable data about them. Label data are today transcribed manually, at high cost and low throughput, putting the task beyond the reach of many collection-holding institutions at current funding levels. We show how computer vision, optical character recognition, handwriting recognition, named entity recognition and language translation technologies can be implemented in canonical workflow component libraries with findable, accessible, interoperable, and reusable (FAIR) characteristics. These libraries are being developed in a cloud-based workflow platform, the 'Specimen Data Refinery' (SDR), which is founded on the Galaxy workflow engine, Common Workflow Language, Research Object Crates (RO-Crate) and WorkflowHub technologies. The SDR can be applied to specimens' labels and other artefacts, offering the prospect of greatly accelerated and more accurate data capture in computable form. Two kinds of FAIR Digital Objects (FDO) are created by packaging the outputs of SDR workflows and workflow components as digital objects with metadata, a persistent identifier, and a specific type definition. The first kind of FDO comprises computable Digital Specimen (DS) objects that can be consumed and produced by workflows and other applications. A single DS is the input data structure submitted to a workflow; it is modified by each workflow component in turn to produce a refined DS at the end. The Specimen Data Refinery provides a library of such components that can be used individually or in series. To work together, each library component describes the fields it requires from the DS and the fields it will in turn populate or enrich. The second kind of FDO, RO-Crates, gather and archive the diverse set of digital and real-world resources, configurations, and actions (the provenance) contributing to a unit of research work, allowing that work to be faithfully recorded and reproduced. Here we describe the Specimen Data Refinery and its motivating requirements, focusing on what is essential in the creation of canonical workflow component libraries and on its conformance with the requirements of the emerging FDO Core Specification being developed by the FDO Forum.

Authors: Alex Hardisty, Paul Brack, Carole Goble, Laurence Livermore, Ben Scott, Quentin Groom, Stuart Owen, Stian Soiland-Reyes

Date Published: 7th Mar 2022

Publication Type: Journal Article
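
The abstract above describes workflow components that each declare which Digital Specimen (DS) fields they require and which they populate, so that components can be chained. A minimal Python sketch of that contract, using hypothetical field names and a simplified DS structure rather than the actual openDS/FDO types:

    from dataclasses import dataclass, field

    @dataclass
    class DigitalSpecimen:
        """Simplified stand-in for a Digital Specimen FDO: a persistent
        identifier plus a dictionary of data fields."""
        pid: str
        fields: dict = field(default_factory=dict)

    class WorkflowComponent:
        """A component declares the DS fields it requires and the fields
        it populates, so that components can be chained safely."""
        requires = set()
        populates = set()

        def run(self, ds):
            missing = self.requires - ds.fields.keys()
            if missing:
                raise ValueError(f"DS {ds.pid} lacks required fields: {missing}")
            return self.process(ds)

        def process(self, ds):
            raise NotImplementedError

    class OCRComponent(WorkflowComponent):
        """Hypothetical OCR step: consumes a label image reference and
        populates the raw label text (field names are illustrative)."""
        requires = {"label_image"}
        populates = {"label_text"}

        def process(self, ds):
            ds.fields["label_text"] = f"<OCR text of {ds.fields['label_image']}>"
            return ds

    # Components applied in series, each refining the same DS:
    ds = DigitalSpecimen(pid="pid/example-001", fields={"label_image": "img001.tif"})
    for step in [OCRComponent()]:
        ds = step.run(ds)
    print(ds.fields)

Declaring the required and populated fields up front lets a workflow engine validate a chain of components before running it.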

Abstract

Indicators of habitat condition are essential for tracking conservation progress, but measuring biotic, abiotic and landscape characteristics at fine resolution over large spatial extents remains challenging. In this viewpoint article, we provide a comprehensive synthesis of the challenges and solutions for consistently measuring and monitoring habitat condition with remote sensing using airborne Light Detection and Ranging (LiDAR) and affordable Unmanned Aerial Vehicles (UAVs) over multiple sites and transnational or continental extents. Key challenges include variability in sensor characteristics and survey designs, non-transparent pre-processing workflows, heterogeneous and complex data, issues with the robustness of metrics and indices, limited model generalizability and transferability across sites, and difficulties in handling big data, such as managing large volumes and utilizing parallel or distributed computing. We suggest that a collaborative cloud virtual research environment (VRE) for habitat condition research and monitoring could provide solutions, including tools for data discovery, access, and data standardization, as well as geospatial processing workflows for airborne LiDAR and UAV data. A VRE would also improve data management, metadata standardization, workflow reproducibility, and transferability of structure-from-motion algorithms and machine learning models such as random forests and convolutional neural networks. Along with best practices for data collection and adopting FAIR (findability, accessibility, interoperability, reusability) principles and open science practices, a VRE could enable more consistent and transparent data processing and metric retrieval, e.g., for Natura 2000 habitats. Ultimately, these improvements would support the development of more reliable habitat condition indicators, helping prevent habitat degradation and promoting the sustainable use of natural resources.

Authors: W. Daniel Kissling, Yifang Shi, Jinhu Wang, Agata Walicka, Charles George, Jesper E. Moeslund, France Gerard

Date Published: 1st Dec 2024

Publication Type: Journal Article
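
The abstract above concerns consistent retrieval of habitat-structure metrics from airborne LiDAR across sites. As an illustration only (not the authors' pipeline), a short numpy sketch of two commonly used vegetation-structure metrics computed from height-normalized point heights; the canopy threshold is an assumed value:

    import numpy as np

    def vegetation_metrics(heights_m, canopy_threshold=0.5):
        """Two widely used LiDAR vegetation-structure metrics from
        height-normalized point heights (metres above ground). The
        0.5 m canopy threshold is an assumed, illustrative value."""
        heights_m = np.asarray(heights_m, dtype=float)
        above = heights_m > canopy_threshold
        return {
            # 95th height percentile: a robust canopy-height proxy
            "p95_height_m": float(np.percentile(heights_m, 95)),
            # canopy cover: fraction of returns above the threshold
            "canopy_cover": float(above.mean()),
        }

    # Example with synthetic point heights:
    rng = np.random.default_rng(0)
    print(vegetation_metrics(np.abs(rng.normal(3.0, 2.5, size=10_000))))

Standardizing such metric definitions across surveys is exactly the kind of consistency a shared virtual research environment would enforce.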

Abstract

VADR database of Porcine circovirus (PCV), covering PCV 1, 2, 3 and 4 complete genomes available on the NCBI nt database as of 2024/04/02 (YYYY/MM/DD) and computed with VADR 1.6.4. VADR is an annotation program for viruses based on models built from known viruses. This database can be used, for example, to analyse PCV variants with VADR, displaying the VADR and vardict-java results with the vvv2_display tool.

Author: Fabrice Touzain

Date Published: 2025

Publication Type: Dataset
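
The dataset above is a VADR model directory for PCV. A hedged sketch of driving VADR's v-annotate.pl with a custom model directory from Python; all paths and the model key are placeholders, and the option list should be checked against the VADR documentation:

    import subprocess

    # All names below are placeholders to adapt to the downloaded dataset.
    model_dir = "vadr-models-pcv"      # unpacked PCV model directory
    model_key = "pcv"                  # model key within that directory
    input_fasta = "pcv_samples.fasta"  # genomes to annotate

    # v-annotate.pl annotates each sequence against the chosen models;
    # check `v-annotate.pl -h` for the authoritative option list.
    subprocess.run(
        ["v-annotate.pl", "--mdir", model_dir, "--mkey", model_key,
         input_fasta, "pcv_annotation_out"],
        check=True,
    )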

Abstract

Background: Next-generation sequencing (NGS) analysis of viral samples generates results dispersed across multiple files (genome assembly, variant calling, and functional annotations), making integrated interpretation challenging. Variant calling often yields numerous low-frequency or non-significant variants, yet only a small fraction are biologically relevant. Virologists must manually sift through extensive data to identify meaningful mutations, a time-consuming and error-prone process. To address these practical challenges, we developed vvv2_display, a dedicated summarization and visualization tool, integrated within comprehensive Galaxy workflows. Results: vvv2_display streamlines variant interpretation by consolidating key results into two concise and interoperable outputs. The first output is a PNG image showing alignment coverage depth and genomic annotations, with significant variants displayed along the genome as symbols whose height reflects frequency and whose shape indicates the affected protein. At a glance, this enables virologists to identify all deviations from a reference viral genome. Each significant variant is assigned a unique identifier that directly links to the second output: a tab-separated (TSV) text file listing only high-confidence variants, with frequencies, flanking nucleotides, and impacted genes and proteins. This cross-referenced design supports rapid, accurate, and intuitive data exploration. Availability: vvv2_display is open source, available on GitHub and installable via Mamba.

Authors: Alexandre Flageul, Edouard Hirchaud, Céline Courtillon, Flora Carnet, Paul Brown, Béatrice Grasland, Fabrice Touzain

Date Published: 17th Oct 2025

Publication Type: Journal Article
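
The abstract above describes a TSV output listing only high-confidence variants with frequencies and impacted genes and proteins. A minimal sketch of filtering such a table with Python's csv module; the column names are illustrative placeholders, not the tool's actual header:

    import csv

    def high_confidence_variants(tsv_path, min_frequency=0.05):
        """Read a variant table such as the TSV emitted by vvv2_display
        and keep variants at or above a frequency threshold. The column
        names used here ('variant_id', 'frequency', 'protein') are
        illustrative placeholders, not the tool's actual header."""
        with open(tsv_path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            return [row for row in reader
                    if float(row["frequency"]) >= min_frequency]

    # Example: print each retained variant and its affected protein
    for row in high_confidence_variants("variants.tsv"):
        print(row["variant_id"], row["protein"])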

Abstract

Explains the vvv2_display command-line program and its inputs/outputs, available as a Bioconda package. Describes Galaxy workflows that use this program.

Authors: Alexandre Flageul, Edouard Hirchaud, Céline Courtillon, Flora Carnet, Paul Brown, Béatrice Grasland, Fabrice Touzain

Date Published: 2025

Publication Type: Other

Abstract

The rising popularity of computational workflows is driven by the need for repetitive and scalable data processing, sharing of processing know-how, and transparent methods. As both combined records of analysis and descriptions of processing steps, workflows should be reproducible, reusable, adaptable, and available. Workflow sharing presents opportunities to reduce unnecessary reinvention, promote reuse, increase access to best practice analyses for non-experts, and increase productivity. In reality, workflows are scattered and difficult to find, in part due to the diversity of available workflow engines and ecosystems, and because workflow sharing is not yet part of research practice. WorkflowHub provides a unified registry for all computational workflows that links to community repositories, and supports both the workflow lifecycle and making workflows findable, accessible, interoperable, and reusable (FAIR). By interoperating with diverse platforms, services, and external registries, WorkflowHub adds value by supporting workflow sharing, explicitly assigning credit, enhancing FAIRness, and promoting workflows as scholarly artefacts. The registry has a global reach, with hundreds of research organisations involved, and more than 800 workflows registered.

Authors: Ove Johan Ragnar Gustafsson, Sean R. Wilkinson, Finn Bacall, Stian Soiland-Reyes, Simone Leo, Luca Pireddu, Stuart Owen, Nick Juty, José M. Fernández, Tom Brown, Hervé Ménager, Björn Grüning, Salvador Capella-Gutierrez, Frederik Coppens, Carole Goble

Date Published: 1st Dec 2025

Publication Type: Journal Article
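
WorkflowHub, described above, is built on the SEEK platform, which serves a JSON API. A minimal sketch that lists registered workflows, assuming the public /workflows endpoint returns a JSON:API payload with records under 'data'; consult the WorkflowHub API documentation for the authoritative schema:

    import json
    import urllib.request

    # Assumed endpoint shape: SEEK-based services return JSON:API
    # payloads when asked for application/json.
    req = urllib.request.Request(
        "https://workflowhub.eu/workflows",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)

    # JSON:API convention: records under 'data', titles under 'attributes'.
    for record in payload.get("data", [])[:10]:
        print(record["id"], record.get("attributes", {}).get("title"))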
