Transcript- and annotation-guided genome assembly of the European starling
Description
The European starling, Sturnus vulgaris, is an ecologically significant, globally invasive avian species that is also suffering from a major decline in its native range. Here, we present the genome assembly and long-read transcriptome of an Australian-sourced European starling (S. vulgaris vAU), and a second North American genome (S. vulgaris vNA), as complementary reference genomes for population genetic and evolutionary characterisation. S. vulgaris vAU combined 10x Genomics linked-reads, low-coverage Nanopore sequencing, and PacBio Iso-Seq full-length transcript scaffolding to generate a 1050 Mb assembly on 1,628 scaffolds (72.5 Mb scaffold N50). Species-specific transcript mapping and gene annotation revealed high structural and functional completeness (94.6% BUSCO completeness). Further scaffolding against the high-quality zebra finch (Taeniopygia guttata) genome assigned 98.6% of the assembly to 32 putative nuclear chromosome scaffolds. Rapid, recent advances in sequencing technologies and bioinformatics software have highlighted the need for evidence-based assessment of assembly decisions on a case-by-case basis. Using S. vulgaris vAU, we demonstrate how the multifunctional use of PacBio Iso-Seq transcript data and complementary homology-based annotation of sequential assembly steps (assessed using a new tool, SAAGA) can be used to assess, inform, and validate assembly workflow decisions. We also highlight some counter-intuitive behaviour in traditional BUSCO metrics, and present BUSCOMP, a complementary tool for assembly comparison designed to be robust to differences in assembly size and base-calling quality. Finally, we present a second starling assembly, S. vulgaris vNA, to facilitate comparative analysis and global genomic research on this ecologically important species.
Publication Date
1-1-2022
Publisher
DRYAD
DOI
10.5061/dryad.02v6wwq5z
Funder
Australian Research Council, Human Sciences Frontier Programme, Roslin Institute Strategic Grant, UNSW Scientia Fellowship
Language
en
Document Type
Data Set
Recommended Citation
Cheng, Yuanyuan; Clayton, David; Rollins, Lee; Ball, Gregory; De Meyer, Tim; Stuart, Katarina; Burt, Dave; Bateson, Melissa; Brandley, Matthew; Meddle, Simone; Cassey, Phillip; Sherwin, William; Werner, Scott; Edwards, Richard; Buchanan, Katherine; Hofmeister, Natalie; Warren, Wes (2022), "Transcript- and annotation-guided genome assembly of the European starling", DRYAD, doi: 10.5061/dryad.02v6wwq5z
https://doi.org/10.5061/dryad.02v6wwq5z
Identifier
10.5061/dryad.02v6wwq5z
Embargo Date
1-1-2022
Version
5