(Note: This text is adapted from SciLifeLab Communications; read the story at SciLifeLab News.)
One of the most common limitations when performing genomic studies on endangered species is that they have seldom been widely studied, unlike “model species” that are easy to maintain and breed in a laboratory setting, and so they often lack a reference genome. “A reference genome is like the picture of a puzzle you look at to be able to reconstruct in which place to put each piece. If you don’t have the reference picture, reconstructing the puzzle is impossible. It’s the same with the reference genomes (picture) and the sequencing data (puzzle pieces)”, says last author David Díez-del-Molino from the Swedish Museum of Natural History.
To tackle this problem, a group of bioinformatics experts from the SciLifeLab National Bioinformatics Infrastructure (NBIS), together with researchers from the Swedish Museum of Natural History and Stockholm University at the Centre for Paleogenetics, developed a bioinformatics pipeline named GenErode. Their report is now published in BMC Bioinformatics. “GenErode is a bioinformatics pipeline designed to process and analyze ancient, historical, and modern genomic data from endangered species in order to produce comparable estimates of genomic erosion indices”, he continues.
These are estimates of different genomic patterns, such as genomic diversity, inbreeding, and the number of damaging mutations in each genome. Having these estimates can be hugely relevant for conservation experts, since they impact the opportunities of threatened species to adapt to present or future environmental changes.
With minimal command-line usage, the pipeline can then map the sequencing data to the reference genome to reconstruct the genomes from the two samples, performing quality filtering and quality control on the way. It also prepares the data for downstream analyses, which include multiple reports and comparable estimates of the genomic erosion indices per sample. “Depending on the storage and computational power available to the user, as well as the size of the genome of the target species, GenErode can be run on dozens of samples at a time. Importantly, GenErode does a great job at keeping reports and logs of settings used in past runs, which should help with reproducibility”, he says.
There are multiple pipelines available that can be used to process genomic data from modern samples, as well as pipelines for ancient DNA data. “To our knowledge, this is the only pipeline aimed specifically at processing and analyzing genomic data from endangered species. The uniqueness of GenErode comes from the fact that you can process both kinds of data, historical and modern, in the same pipeline. Since the aim is to make the downstream analyses comparable between samples from different periods, our pipeline processes the data accordingly”, says David Díez-del-Molino.
Today, there are multiple large-scale projects, such as the Darwin Tree of Life (DTOL), the Vertebrate Genomes Project (VGP), and the European Reference Genome Atlas (ERGA), that are set to generate reference genomes for all eukaryotes. “This has the potential to transform genomics research for endangered species, and we predict that an increasing number of genomic projects on endangered species will make use of these resources. I think bioinformatics pipelines such as GenErode can be at the center of such a movement”, David says.
GenErode: a bioinformatics pipeline to investigate genome erosion in endangered and extinct species (Kutschera et al., 2022), DOI: 10.1186/s12859-022-04757-0