Publications

66 Publications visible to you, out of a total of 66

Abstract

The WorkflowHub Knowledge Graph has been improved and its generation made more robust. When this work was last reported, a complete knowledge graph had been generated but several criticisms were made. The previous graph was verbose and hard for a human to read or navigate, had unresolvable URIs as root data entities, contained many duplicate entries, and contained sparse metadata from only a single source. Work has successfully been undertaken to address all of these points. The graph now uses partially resolvable, more human-readable URIs for root data entities. Steps have been added to the generation software to add metadata from additional sources (enrichment) and to remove duplicate entries (consolidation). Several areas of the codebase have been refactored and improved to help ensure repeatability and longevity. The new knowledge graph still has areas that could be improved. Partially resolvable URIs should be migrated to fully resolvable alternatives. Further enrichment processes should be added, which would afford greater de-duplication.

Authors: Eli Chadwick, Oliver Woolland, Volodymyr Savchenko, Finn Bacall, Alexander Hambley, José María Fernández González, Armin Dadras, Stian Soiland-Reyes

Date Published: 1st Aug 2025

Publication Type: Report

Abstract

Not specified

Authors: João Vitor F. Cavalcante, Iara Dantas de Souza, Diego A. A. Morais, Rodrigo J. S. Dalmolin

Date Published: 27th Aug 2024

Publication Type: Conference Paper

Abstract

Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.

Authors: Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, Daniel Schober

Date Published: 2020

Publication Type: Journal Article

Abstract

Not specified

Authors: Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, Daniel Schober

Date Published: 2020

Publication Type: Journal Article

Abstract

This deliverable provides the final project summary of EuroScienceGateway (ESG), a Horizon Europe and EOSC initiative (Grant Agreement 101057388, Sept 2022–Aug 2025) coordinated by Albert-Ludwigs-Universität Freiburg. It summarizes ESG’s main achievements, impacts, FAIR data management, sustainability and exploitation plans, and dissemination outcomes. Technically, ESG delivered a production-grade, federated research gateway built on Galaxy and an expanded Pulsar Network, enabling scalable, data-intensive analysis across heterogeneous European compute and storage. Key innovations include Bring-Your-Own-Compute/Storage (BYOC/BYOS), a smart meta-scheduler (TPV Broker), the Galaxy Job Radar dashboard, and streamlined deployment/admin tooling, which together improve throughput, data locality, and operational transparency. The project operationalized FAIR principles for computational workflows by packaging and publishing Workflow RO-Crates with persistent identifiers via WorkflowHub, advancing EOSC interoperability. Federated AAI (e.g., EGI Check-in, LS Login, IAM4NFDI) supports secure access across institutions. ESG contributed >20 workflows, >40 tutorials, and >10 peer-reviewed publications, and collaborated with 20+ initiatives. Six national Galaxy instances and 10+ Pulsar endpoints were launched; the European Galaxy instance achieved ISO/IEC 27001 certification. Community impact was substantial: registered users on the European Galaxy portal grew from ~30,000 to >130,000, with monthly actives doubling to >6,000, underpinned by >20 online/onsite workshops and large-scale training through the Galaxy Training Network and Training-Infrastructure-as-a-Service (TIaaS). Sustainability is ensured through distributed governance, national/institutional hosting of Galaxy/Pulsar services, continued curation of workflows and training materials, and alignment with EOSC service models and funding pathways.
The report closes with exploitation routes for beneficiaries and stakeholders and a record of dissemination and outreach activities across the European research ecosystem.

Authors: Armin Dadras, Oana Kaiser, Björn Grüning, Sebastian Luna-Valero, Enol Fernandez-del-Castillo

Date Published: 20th Aug 2025

Publication Type: Report

Abstract

This article presents a performance evaluation of a phylogenetic network framework on the Santos Dumont supercomputer. The work reinforces the benefits of parallelizing the framework using approaches based on High-Throughput Computing (HTC) and High-Performance Computing (HPC). The results of the parallel execution of the proposed framework show that this type of bioinformatics experiment is well suited to HPC environments, even though not all of the framework's component tasks and programs were designed to take advantage of scalability in HPC environments or of parallelism techniques at different levels. A comparative analysis of running the five pipelines sequentially (as originally designed and used by bioinformaticians) gave an estimated time of 81.67 minutes. Running the same experiment through the framework executes the five pipelines in parallel with better task management, giving a total execution time of 38.73 minutes. This improvement of approximately 2.11 times in execution time suggests that using an optimized framework reduces computational time, improves resource allocation, and shortens the waiting time for allocation.

Authors: Rafael Terra, Kary Ocaña, Carla Osthoff, Lucas Cruz, Philippe Navaux, Diego Carvalho

Date Published: 19th Oct 2022

Publication Type: Conference Paper
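
The speedup quoted in the abstract above can be checked with simple arithmetic. The following is a minimal sketch, not code from the paper: it uses only the two times reported in the abstract (81.67 minutes sequential, 38.73 minutes through the framework), and the function and constant names are hypothetical, chosen for illustration.

```python
def speedup(sequential_minutes: float, parallel_minutes: float) -> float:
    """Return how many times faster the parallel run is than the sequential one."""
    if parallel_minutes <= 0:
        raise ValueError("parallel_minutes must be positive")
    return sequential_minutes / parallel_minutes


# Times reported in the abstract, in minutes (names are illustrative).
SEQUENTIAL_MINUTES = 81.67
PARALLEL_MINUTES = 38.73

ratio = speedup(SEQUENTIAL_MINUTES, PARALLEL_MINUTES)
print(f"speedup: {ratio:.2f}x")  # prints "speedup: 2.11x"
```

The ratio agrees with the "approximately 2.11 times" figure given in the abstract.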

Abstract

In recent years, the development of technologies such as next-generation sequencing and high-performance computing has allowed the execution of highly complex and computationally intensive Bioinformatics experiments. Different Bioinformatics fields need high-performance computing platforms to take advantage of parallelism and task distribution through specialized scientific workflow management systems. One such field is phylogeny, which expresses the evolutionary relationships between genes and organisms, establishing which of them are most closely related. Phylogeny is used in several areas, such as species classification, the discovery of kinship between individuals, the identification of pathogen origins, and even conservation biology. One way of representing these phylogenetic relationships is with phylogenetic networks. However, constructing these networks relies on computationally intensive algorithms that require constant manipulation of different input data. This work aims to develop a framework for the construction of explicit phylogenetic networks, modeling a scientific workflow that combines different network construction methods with the required treatment of input data. The framework was developed to run multiple flows of the workflow in an automated, parallel, and distributed manner in a single execution, and to be executable in high-performance computing environments, which is a challenging task because the tools used were not developed with these environments in mind. To orchestrate the workflow tasks, the scalable parallel programming library Parsl was used, which allowed the execution of the workflow’s tasks to be optimized and resources to be managed better.
Two versions of the framework were developed, called Single Partition and Multi Partition, which differ in how resources are used. In the tests performed, execution time improved by a factor of about five compared with the sequential execution of a flow without the optimizations. The framework was validated using public data of Dengue virus genomes, which were processed, annotated, and executed in the framework using the Santos Dumont supercomputer. The construction of the genomes’ explicit phylogenetic networks indicates that the framework is a functional, efficient, and easy-to-use tool.

Authors: Rafael Terra, Kary Ocaña, Carla Osthoff, Diego Carvalho

Date Published: 18th Feb 2022

Publication Type: Master's Thesis

Abstract

This report provides an in-depth analysis of the sustainability of the Galaxy platform, a globally recognized open-source system for data analysis, workflow management, and scientific collaboration. Developed under the EuroScienceGateway project and supported by the European Union’s Horizon Europe program (Grant Agreement No. 101057388), the report evaluates Galaxy through the lenses of desirability, feasibility, and viability using a robust analytical framework derived from design thinking and open-source community health metrics (CHAOSS). The report presents empirical data on Galaxy's rapid growth in user adoption, job execution volume, infrastructure robustness, contributor engagement, community governance, and scientific impact. It highlights Galaxy’s ability to democratize access to advanced computational tools, support reproducible science, and maintain long-term sustainability through a distributed community and institutional support. This document is a valuable resource for funders, policymakers, and stakeholders in the open science and digital research infrastructure community, illustrating why Galaxy represents a low-risk, high-reward investment in the future of data-driven research.

Author: Smitesh Jain

Date Published: 17th Jul 2025

Publication Type: Report

Abstract

The evolutionary processes and dispersal of Dengue genomes in Brazil are relevant to understanding the impact of emerging arboviruses and to their endemic-epidemic and social surveillance. Phylogenetic trees and networks make it possible to display evolutionary and reticulate events in viruses, which arise from high diversity, high mutation rates, and frequent homologous recombination. We present a parallel and distributed scientific workflow for phylogenetic networks, designed to work with the diversity of tools and resources used in computational biology experiments and coupled to high-performance computing environments. We report an improvement in execution time of approximately 5 times compared with sequential execution in analyses of dengue genomes, including the identification of recombination events.

Authors: Rafael Terra, Micaella Coelho, Lucas Cruz, Marco Garcia-Zapata, Luiz Gadelha, Carla Osthoff, Diego Carvalho, Kary Ocaña

Date Published: 18th Jul 2021

Publication Type: Conference Paper

Abstract

Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using the Metabolome Annotation Workflow (MAW) as a case study. MAW is specified using the Common Workflow Language (CWL), allowing for the subsequent execution of the workflow on different workflow engines. MAW is registered on WorkflowHub using a CWL description, with the DOI https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.510.2. During the submission process on WorkflowHub, the CWL description is used for packaging MAW using the Workflow RO-Crate profile, which includes metadata in Bioschemas. Researchers can use the instructions presented in this snapshot as a base template to adopt FAIR practices for their bioinformatics or cheminformatics workflows while incorporating necessary amendments specific to their research area.

Authors: Mahnoor Zulfiqar, Michael R. Crusoe, Birgitta König-Ries, Christoph Steinbeck, Kristian Peters, Luiz Gadelha

Date Published: 21st Dec 2023

Publication Type: Journal Article
