This workflow extracts protein-coding sequences from whole genome sequencing (WGS) data obtained from the European Nucleotide Archive (ENA). It automates the preprocessing, annotation, and selection of relevant protein sequences using tools such as Prokka, FASTA-to-Tabular, and pattern-based selection. The resulting dataset supports downstream analyses including comparative genomics, phylogenetics, and functional annotation.