Publications

Abstract

Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.

Authors: Simone Leo, Michael R. Crusoe, Laura Rodríguez-Navas, Raül Sirvent, Alexander Kanitz, Paul De Geest, Rudolf Wittner, Luca Pireddu, Daniel Garijo, José M. Fernández, Iacopo Colonnelli, Matej Gallo, Tazro Ohta, Hirotaka Suetake, Salvador Capella-Gutierrez, Renske de Wit, Bruno P. Kinoshita, Stian Soiland-Reyes

Date Published: 10th Sep 2024

Publication Type: Journal Article

Abstract

Background The COVID-19 pandemic had negative impacts in almost every country in the world. These impacts were observed mainly in the public health sphere, with a rapid rise and spread of the disease and failed attempts to restrain it while there was no treatment. However, in developing countries, the impacts were also severe in other respects, such as the intensification of social inequality, poverty and food insecurity. In Brazil specifically, miscommunication among the layers of government led the control measures into complete chaos in a country of continental dimensions. Brazil made an effort to register granular data about case reports and their outcomes. Although this data is available and can be consumed freely, there are issues concerning its integrity and inconsistencies between the real number of cases and the number of notifications in this dataset. Results We designed and implemented four types of analysis to explore the Brazilian public dataset of Severe Acute Respiratory Syndrome notifications (srag dataset) and the Google dataset of community mobility change (mobility dataset). These analyses provide a diagnosis of data integration issues, strategies to integrate the data, and experiments in surveillance analysis. The first type of analysis describes and explores the data contained in both datasets, starting by assessing data quality with respect to missing data and then summarizing the patterns found in these datasets. The second type is a statistical experiment to estimate the number of cases from mobility patterns organized in periods of time. As the third type of analysis, we developed an algorithm that helps in understanding the disease waves by detecting them and comparing the time periods across cities. Lastly, we built time series datasets of deaths, overall cases and residential mobility change over regular time periods and used them as features to group cities with similar behavior.
Conclusion The exploratory data analysis showed the under-representation of COVID-19 cases in many small cities in Brazil, which were either absent from the srag dataset or had far fewer cases than real projections. We also assessed the availability of data for Brazilian cities in the mobility dataset in each state, finding that not all states were represented and that the best coverage occurred in Rio de Janeiro state. We compared the capacity of combinations of mobility change across place categories to estimate the number of cases, measuring the errors and identifying the best mobility components that could affect the cases. In order to target specific strategies at groups of cities, we compared strategies for clustering cities with similar outcome behavior over time, highlighting the divergence in handling the disease.

Authors: Yasmmin Côrtes Martins, Ronaldo Francisco da Silva

Date Published: 27th Sep 2023

Publication Type: Journal Article

Abstract

Motivation Identifying the most important mutations that lead to structural and functional changes in highly transmissible virus variants is essential to understanding their impacts and the possible chances of vaccine and antibody escape. Strategies to rapidly associate mutations with functional and conformational properties are needed to analyze mutations in proteins and their impacts on antibodies and human binding proteins. Results Comparative analysis showed the main structural characteristics of the essential mutations found for each variant of concern in relation to the reference proteins. The paper presents a series of methodologies to track and associate conformational changes and the impacts promoted by the mutations.

Authors: Yasmmin Martins, Ronaldo Francisco da Silva

Date Published: 22nd Jun 2023

Publication Type: Journal Article

Abstract

vadr database of Porcine Circovirus (handles the PCV 1, 2, 3 and 4 complete genomes available on 2024/04/02 (YYYY/MM/DD) in the NCBI nt database; computed with vadr 1.6.4). vadr is an annotation program for viruses, based on models computed on known viruses. This database can be used, for example, to analyse PCV variants from vadr and vardict-java results using the vvv2_display tool.

Author: Fabrice Touzain

Date Published: 2025

Publication Type: Dataset

Abstract

Background: Next-generation sequencing (NGS) analysis of viral samples generates results dispersed across multiple files (genome assembly, variant calling, and functional annotations), making integrated interpretation challenging. Variant calling often yields numerous low-frequency or non-significant variants, yet only a small fraction are biologically relevant. Virologists must manually sift through extensive data to identify meaningful mutations, a time-consuming and error-prone process. To address these practical challenges, we developed vvv2_display, a dedicated summarization and visualization tool, integrated within comprehensive Galaxy workflows. Results: vvv2_display streamlines variant interpretation by consolidating key results into two concise and interoperable outputs. The first output is a PNG image showing alignment coverage depth and genomic annotations, with significant variants displayed along the genome as symbols whose height reflects frequency and whose shape indicates the affected protein. At a glance, this enables virologists to identify all deviations from a reference viral genome. Each significant variant is assigned a unique identifier that directly links to the second output: a tab-separated (TSV) text file listing only high-confidence variants, with frequencies, flanking nucleotides, and impacted genes and proteins. This cross-referenced design supports rapid, accurate, and intuitive data exploration. Availability: vvv2_display is open source, available on GitHub and installable via Mamba.

Authors: Alexandre Flageul, Edouard Hirchaud, Céline Courtillon, Flora Carnet, Paul Brown, Béatrice Grasland, Fabrice Touzain

Date Published: 17th Oct 2025

Publication Type: Journal Article

Abstract

Explains the vvv2_display command-line program, available as a Bioconda package, and its inputs/outputs. Describes workflows using this program in Galaxy.

Authors: Alexandre Flageul, Edouard Hirchaud, Céline Courtillon, Flora Carnet, Paul Brown, Béatrice Grasland, Fabrice Touzain

Date Published: 2025

Publication Type: Other
