Publications

What is a Publication?
26 Publications visible to you, out of a total of 26

Abstract (Expand)

Processos evolutivos e dispersão de genomas de Dengue no Brasil são relevantes na direção do impacto e vigilância endemo-epidêmico e social de arboviroses emergentes. Árvores e redes filogenéticas permitem exibir eventos evolutivos e reticulados em vírus originados pela alta diversidade e taxa de mutação de recombinação homóloga frequente. Apresentamos um workflow científico paralelo e distribuído para redes filogenéticas desenhado para trabalhar com a diversidade de ferramentas e recursos em experimentos da biologia computacional e acoplados a ambientes de computação de alto desempenho. Apresentamos uma melhoria no tempo de execução de aproximadamente 5 vezes em comparação com a execução sequencial em análises de genomas de dengue e com identificação de eventos de recombinação.

Authors: Rafael Terra, Micaella Coelho, Lucas Cruz, Marco Garcia-Zapata, Luiz Gadelha, Carla Osthoff, Diego Carvalho, Kary Ocaña

Date Published: 18th Jul 2021

Publication Type: Conference Paper

Abstract (Expand)

Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using Metabolome Annotation Workflow (MAW) as a case study. MAW is specified using the Common Workflow Language (CWL), allowing for the subsequent execution of the workflow on different workflow engines. MAW is registered using CWL description on WorkflowHub with the DOI https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.510.2. During the submission process on WorkflowHub, a CWL description is used for packaging MAW using the Workflow RO-Crate profile, which includes metadata in Bioschemas. Researchers can use the instructions presented in this snapshot as a base template to adopt FAIR practices for their bioinformatics or cheminformatics workflows while incorporating necessary amendments specific to their research area.

Authors: Mahnoor Zulfiqar, Michael R. Crusoe, Birgitta König-Ries, Christoph Steinbeck, Kristian Peters, Luiz Gadelha

Date Published: 21st Dec 2023

Publication Type: Journal Article

Abstract (Expand)

Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using the Metabolome Annotation Workflow (MAW) as a case study. MAW is specified using the Common Workflow Language (CWL), allowing for the subsequent execution of the workflow on different workflow engines. MAW is registered using a CWL description on WorkflowHub. During the submission process on WorkflowHub, a CWL description is used for packaging MAW using the Workflow RO-Crate profile, which includes metadata in Bioschemas. Researchers can use this narrative discussion as a guideline to commence using FAIR practices for their bioinformatics or cheminformatics workflows while incorporating necessary amendments specific to their research area.

Authors: Mahnoor Zulfiqar, Michael R. Crusoe, Birgitta König-Ries, Christoph Steinbeck, Kristian Peters, Luiz Gadelha

Date Published: 1st Feb 2024

Publication Type: Journal Article

Abstract (Expand)

Predicting the physical or functional associations through protein-protein interactions (PPIs) represents an integral approach for inferring novel protein functions and discovering new drug targets during repositioning analysis. Recent advances in high-throughput data generation and multi-omics techniques have enabled large-scale PPI predictions, thus promoting several computational methods based on different levels of biological evidence. However, integrating multiple results and strategies to optimize, extract interaction features automatically and scale up the entire PPI prediction process is still challenging. Most procedures do not offer an in-silico validation process to evaluate the predicted PPIs. In this context, this paper presents the PredPrIn scientific workflow that enables PPI prediction based on multiple lines of evidence, including the structure, sequence, and functional annotation categories, by combining boosting and stacking machine learning techniques. We also present a pipeline (PPIVPro) for the validation process based on cellular co-localization filtering and a focused search of PPI evidence on scientific publications. Thus, our combined approach provides means to extensive scale training or prediction of new PPIs and a strategy to evaluate the prediction quality. PredPrIn and PPIVPro are publicly available at https://github.com/YasCoMa/predprin and https://github.com/YasCoMa/ppi_validation_process.

Authors: Yasmmin Côrtes Martins, Artur Ziviani, Marisa Fabiana Nicolás, Ana Tereza Ribeiro de Vasconcelos

Date Published: 6th Sep 2021

Publication Type: Journal Article

Abstract (Expand)

A widely used standard for portable multilingual data analysis pipelines would enable considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and to the environment. Published research that used multiple computer languages for their analysis pipelines would include a complete and reusable description of that analysis that is runnable on a diverse set of computing environments. Researchers would be able to easier collaborate and reuse these pipelines, adding or exchanging components regardless of programming language used; collaborations with and within the industry would be easier; approval of new medical interventions that rely on such pipelines would be faster. Time will be saved and environmental impact would also be reduced, as these descriptions contain enough information for advanced optimization without user intervention. Workflows are widely used in data analysis pipelines, enabling innovation and decision-making for the modern society. In many domains the analysis components are numerous and written in multiple different computer languages by third parties. However, lacking a standard for reusable and portable multilingual workflows, then reusing published multilingual workflows, collaborating on open problems, and optimizing their execution would be severely hampered. Moreover, only a standard for multilingual data analysis pipelines that was widely used would enable considerable benefits to research-industry collaboration, regulatory cost control, and to preserving the environment. Prior to the start of the CWL project, there was no standard for describing multilingual analysis pipelines in a portable and reusable manner. Even today / currently, although there exist hundreds of single-vendor and other single-source systems that run workflows, none is a general, community-driven, and consensus-built standard. Preprint, submitted to Communications of the ACM (CACM).

Authors: Michael R. Crusoe, Sanne Abeln, Alexandru Iosup, Peter Amstutz, John Chilton, Nebojša Tijanić, Hervé Ménager, Stian Soiland-Reyes, Carole Goble

Date Published: 14th May 2021

Publication Type: Preprint

Abstract (Expand)

Abstract In silico variant interpretation pipelines have become an integral part of genetics research and genome diagnostics. However, challenges remain for automated variant interpretation and candidateant interpretation and candidate shortlisting. Their reliability is affected by variability in input data caused due the use of differing sequencing platforms, erroneous nomenclature and changing experimental conditions. Similarly, differences in predictive algorithms can result in discordant results. Finally, scalability is essential to accommodate large amounts of input data, such as in whole genome sequencing (WGS). To accelerate causal variant detection and innovation in genome diagnostics and research, we developed the MOLGENIS Variant Interpretation Pipeline (VIP). VIP is a flexible open-source computational pipeline that generates interactive reports of variants in whole exome sequencing (WES) and WGS data for expert interpretation. VIP can process short- and long-read data from different platforms and offers tools for increased sensitivity: a configurable decision-tree, filters based on human phenotype ontology (HPO) and gene inheritance that can be used to pinpoint disease-causing variants or finetune a query for specific variants. Here, alongside presenting VIP, we provide a step-by-step protocol for how to use VIP to annotate, classify and filter genetic variants of patients with a rare disease that has a suspected genetic cause. Finally, we demonstrate how VIP performs using 25,664 previously classified variants from the data sharing initiative of the Vereniging van Klinisch Genetische Laboratoriumdiagnostiek (VKGL), a cohort of 18 diagnosed patients from routine diagnostics and a cohort of 41 patients with a rare disease (RD) who were not diagnosed in routine diagnostics but were diagnosed using novel omics approaches within the EU-wide project to solve rare diseases (EU-Solve-RD). VIP requires bioinformatic knowledge to configure, but once configured, any diagnostic professional can perform an analysis within 5 hours.

Authors: W.T.K. Maassen, L.F. Johansson, B. Charbon, D. Hendriksen, S. van den Hoek, M.K. Slofstra, R. Mulder, M.T. Meems-Veldhuis, R. Sietsma, H.H. Lemmink, C.C. van Diemen, M.E. van Gijn, M.A. Swertz, K.J. van der Velde

Date Published: 15th Apr 2024

Publication Type: Preprint

Abstract (Expand)

Background There is an availability of omics and often multi-omics cancer datasets on public databases such as Gene Expression Omnibus (GEO), International Cancer Genome Consortium and The Cancer Genome Atlas Program. Most of these databases provide at least the gene expression data for the samples contained in the project. Multi-omics has been an advantageous strategy to leverage personalized medicine, but few works explore strategies to extract knowledge relying only on gene expression level for decisions on tasks such as disease outcome prediction and drug response simulation. The models and information acquired on projects based only on expression data could provide decision making background for future projects that have other level of omics data such as DNA methylation or miRNAs. Results We extended previous methodologies to predict disease outcome from the combination of protein interaction networks and gene expression profiling by proposing an automated pipeline to perform the graph feature encoding and further patient networks outcome classification derived from RNA-Seq. We integrated biological networks from protein interactions and gene expression profiling to assess patient specificity combining the treatment/control ratio with the patient normalized counts of the deferentially expressed genes. We also tackled the disease outcome prediction from the gene set enrichment perspective, combining gene expression with pathway gene sets information as features source for this task. We also explored the drug response outcome perspective of the cancer disease still evaluating the relationship among gene expression profiling with single sample gene set enrichment analysis (ssGSEA), proposing a workflow to perform drug response screening according to the patient enriched pathways. Conclusion We showed the importance of the patient network modeling for the clinical task of disease outcome prediction using graph kernel matrices strategy and showed how ssGSEA improved the prediction only using transcriptomic data combined with pathway scores. We also demonstrated a detailed screening analysis showing the impact of pathway-based gene sets and normalization types for the drug response simulation. We deployed two fully automatized Screening workflows following the FAIR principles for the disease outcome prediction and drug response simulation tasks.

Author: Yasmmin Martins

Date Published: 28th Sep 2023

Publication Type: Journal Article

Abstract (Expand)

The Linking Open Data (LOD) cloud is a global data space for publishing and linking structured data on the Web. The idea is to facilitate the integration, exchange, and processing of data. The LOD cloud already includes a lot of datasets that are related to the biological area. Nevertheless, most of the datasets about protein interactions do not use metadata standards. This means that they do not follow the LOD requirements and, consequently, hamper data integration. This problem has impacts on the information retrieval, specially with respect to datasets provenance and reuse in further prediction experiments. This paper proposes an ontology to describe and unite the four main kinds of data in a single prediction experiment environment: (i) information about the experiment itself; (ii) description and reference to the datasets used in an experiment; (iii) information about each protein involved in the candidate pairs. They correspond to the biological information that describes them and normally involves integration with other datasets; and, finally, (iv) information about the prediction scores organized by evidence and the final prediction. Additionally, we also present some case studies that illustrate the relevance of our proposal, by showing how queries can retrieve useful information.

Authors: Yasmmin Cortes Martins, Maria Cláudia Cavalcanti, Luis Willian Pacheco Arge, Artur Ziviani, Ana Tereza Ribeiro de Vasconcelos

Date Published: 2019

Publication Type: Journal Article

Abstract

Not specified

Authors: Anna-Lena Lamprecht, Magnus Palmblad, Jon Ison, Veit Schwämmle, Mohammad Sadnan Al Manir, Ilkay Altintas, Christopher J. O. Baker, Ammar Ben Hadj Amor, Salvador Capella-Gutierrez, Paulos Charonyktakis, Michael R. Crusoe, Yolanda Gil, Carole Goble, Timothy J. Griffin, Paul Groth, Hans Ienasescu, Pratik Jagtap, Matúš Kalaš, Vedran Kasalica, Alireza Khanteymoori, Tobias Kuhn, Hailiang Mei, Hervé Ménager, Steffen Möller, Robin A. Richardson, Vincent Robert, Stian Soiland-Reyes, Robert Stevens, Szoke Szaniszlo, Suzan Verberne, Aswin Verhoeven, Katherine Wolstencroft

Date Published: 2021

Publication Type: Journal Article

Abstract (Expand)

Semantic web standards have shown importance in the last 20 years in promoting data formalization and interlinking between the existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged in recent years for the biological area, such as the broadly used Gene Ontology that contains metadata to annotate gene function and subcellular location. Another important subject in the biological area is protein–protein interactions (PPIs) which have applications like protein function inference. Current PPI databases have heterogeneous exportation methods that challenge their integration and analysis. Presently, several initiatives of ontologies covering some concepts of the PPI domain are available to promote interoperability across datasets. However, the efforts to stimulate guidelines for automatic semantic data integration and analysis for PPIs in these datasets are limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict and validate new potential host–pathogen datasets by transitivity analysis. PPIntegrator contains a data preparation module to organize data from three reference databases and a triplification and data fusion module to describe the provenance information and results. This work provides an overview of the PPIntegrator system applied to integrate and compare host–pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrated some critical queries to analyze this kind of data and highlight the importance and usage of the semantic data generated by our system.

Authors: Yasmmin Côrtes Martins, Artur Ziviani, Maiana de Oliveira Cerqueira e Costa, Maria Cláudia Reis Cavalcanti, Marisa Fabiana Nicolás, Ana Tereza Ribeiro de Vasconcelos

Date Published: 2023

Publication Type: Journal Article

Powered by
(v.1.17.3)
Copyright © 2008 - 2026 The University of Manchester and HITS gGmbH