Genetic information from phased SNP array data can improve assemblies of whole genome sequences

S. Vanderzande, C. Peace
Whole genome sequence (WGS) assemblies for horticultural crops are a valuable resource to improve our understanding of key horticultural traits. The last decade has seen a rise in the availability, quality and use of WGSs, yet issues with aligning contigs and resolving haplotypes remain. SNP arrays have also become commonly available, and large data sets of high-quality, phased SNP genotypic data have been generated. These data sets contain information on linkages among SNPs, allele presence in germplasm individuals, and the germplasm origins of alleles, and could therefore be a valuable additional resource to improve WGS assemblies, but they have not yet been fully exploited. To evaluate the quality of the haplotype-resolved WGS of ‘Gala’ and to demonstrate how SNP array data can contribute to WGS assemblies, phased ‘Gala’ SNP array data were compared to the ‘Gala’ WGSs. Genomic positions for the 8K SNP array SNPs were determined for each of the reported haplomes of ‘Gala’. Then, SNP genotypes of the ‘Gala’ SNP array data were compared with those of the ‘Gala’ WGS, and parental origin was assigned to SNP alleles in each haplome. Each ‘Gala’ haplome was expected to contain exclusively maternal or exclusively paternal haplotypes, yet every haplome homolog of each chromosome was composed of both. Multiple SNP genotype differences were also observed, in which either one of the expected parental alleles was missing from a haplome or an additional allele was present. These results indicate that some ‘Gala’ WGS contigs had been misassembled and that maternal and paternal haplotypes had not originally been resolved at the chromosome level. We propose that available high-quality phased SNP array data, pedigree records, and extended shared haplotypes among individuals, arising from application of inheritance-based genetics principles, be employed to improve WGS assemblies.
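The comparison described in the abstract (assigning parental origin to the SNP alleles found in each haplome, then checking whether a haplome homolog mixes maternal and paternal alleles) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout (`chrom`, `maternal`, `paternal`, `hapA`, `hapB`), the function names and the toy alleles are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def assign_origin(haplome_allele, maternal, paternal):
    """Label one haplome allele using the phased SNP array genotype.

    maternal / paternal are the phased array alleles (hypothetical inputs).
    """
    if haplome_allele is None:
        return "missing"          # SNP could not be located in this haplome
    if maternal == paternal:      # homozygous SNP: cannot tell parents apart
        return "uninformative" if haplome_allele == maternal else "mismatch"
    if haplome_allele == maternal:
        return "maternal"
    if haplome_allele == paternal:
        return "paternal"
    return "mismatch"             # allele not expected from either parent

def summarize(snps, haplomes=("hapA", "hapB")):
    """Count origin labels per (chromosome, haplome)."""
    counts = {}
    for snp in snps:
        for hap in haplomes:
            label = assign_origin(snp[hap], snp["maternal"], snp["paternal"])
            counts.setdefault((snp["chrom"], hap), Counter())[label] += 1
    return counts

def is_mixed(counter):
    """True if a haplome homolog carries both maternal and paternal alleles."""
    return counter["maternal"] > 0 and counter["paternal"] > 0

# Toy, made-up records: the second SNP is swapped between the two haplomes.
toy_snps = [
    {"chrom": 1, "maternal": "A", "paternal": "G", "hapA": "A", "hapB": "G"},
    {"chrom": 1, "maternal": "C", "paternal": "T", "hapA": "T", "hapB": "C"},
    {"chrom": 1, "maternal": "G", "paternal": "G", "hapA": "G", "hapB": "G"},
    {"chrom": 1, "maternal": "A", "paternal": "C", "hapA": "A", "hapB": None},
]

if __name__ == "__main__":
    for key, counter in sorted(summarize(toy_snps).items()):
        print(key, dict(counter), "mixed" if is_mixed(counter) else "consistent")
```

A haplome homolog flagged as mixed here corresponds to the situation the abstract reports, where a single haplome contains both maternal and paternal haplotypes; "mismatch" labels correspond to an unexpected or additional allele. A real analysis would also need to handle strand orientation and SNPs that map to multiple positions, which this sketch ignores.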
Vanderzande, S. and Peace, C. (2023). Genetic information from phased SNP array data can improve assemblies of whole genome sequences. Acta Hortic. 1362, 81-88
DOI: 10.17660/ActaHortic.2023.1362.12
haplotype, shared segments, pan-genome, Malus × domestica, apple

Acta Horticulturae