Genetic information from phased SNP array data can improve assemblies of whole genome sequences
Whole genome sequence (WGS) assemblies for horticultural crops are a valuable resource for improving our understanding of key horticultural traits. The last decade has seen a rise in the availability, quality, and use of WGSs, yet issues with aligning contigs and resolving haplotypes remain. SNP arrays have also become commonly available, and large data sets of high-quality, phased SNP genotypic data have been generated. These data sets contain information on linkages among SNPs, allele presence in germplasm individuals, and the germplasm origins of alleles. They could therefore be a valuable additional resource for improving WGS assemblies, but they are not yet fully exploited. To evaluate the quality of the haplotype-resolved WGS of Gala and to demonstrate how SNP array data can contribute to WGS assemblies, phased Gala SNP array data were compared to the Gala WGSs. Genomic positions of the 8K SNP array SNPs were determined for each of the reported haplomes of Gala. Then, SNP genotypes from the Gala SNP array data were compared with those of the Gala WGS, and parental origin was assigned to the SNP alleles in each haplome. Each Gala haplome was expected to contain exclusively maternal or exclusively paternal haplotypes, yet all haplome homologs of each chromosome were composed of both. Multiple SNP genotype differences were also observed, with either one of the expected parental alleles missing from a haplome or an additional allele present in it. These results indicate that some Gala WGS contigs had been misassembled and that maternal and paternal haplotypes had not originally been resolved at the chromosome level. We propose that available high-quality phased SNP array data, pedigree records, and extended shared haplotypes among individuals, arising from application of inheritance-based genetics principles, be employed to improve WGS assemblies.
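The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the SNP names, alleles, function names, and category labels are all hypothetical. It assumes that the phased array data give one maternal and one paternal allele per SNP and that a single allele has been read from each haplome at each SNP position.

```python
from collections import Counter

def assign_origin(phased, haplome_alleles):
    """Label each SNP allele found in one WGS haplome by parental origin.

    phased: {snp_id: (maternal_allele, paternal_allele)} from phased array data.
    haplome_alleles: {snp_id: allele} read from one haplome of the assembly.
    All identifiers and labels here are illustrative assumptions.
    """
    calls = {}
    for snp, (maternal, paternal) in phased.items():
        observed = haplome_alleles.get(snp)
        if observed is None:
            calls[snp] = "missing"        # SNP not located in this haplome
        elif maternal == paternal:
            # Homozygous SNP: cannot distinguish the parents
            calls[snp] = "uninformative" if observed == maternal else "mismatch"
        elif observed == maternal:
            calls[snp] = "maternal"
        elif observed == paternal:
            calls[snp] = "paternal"
        else:
            calls[snp] = "mismatch"       # allele absent from the array genotype
    return calls

def is_mixed(calls):
    """True if a haplome carries both maternal and paternal alleles."""
    origins = set(calls.values())
    return "maternal" in origins and "paternal" in origins

# Hypothetical phased array genotypes: (maternal allele, paternal allele)
phased = {
    "snp1": ("A", "G"),
    "snp2": ("C", "C"),
    "snp3": ("T", "C"),
    "snp4": ("G", "A"),
    "snp5": ("A", "T"),
}
# Hypothetical alleles read from one haplome of the assembly
haplome_a = {"snp1": "A", "snp2": "C", "snp3": "C", "snp4": "T"}

calls_a = assign_origin(phased, haplome_a)
summary = Counter(calls_a.values())
print(calls_a)
print(summary, "mixed origin:", is_mixed(calls_a))
```

In this toy input the haplome carries a maternal allele at one SNP and a paternal allele at another, so it is flagged as mixed. Under the assumptions above, a run of SNPs along a chromosome that switches origin in this way would point to a possible phase switch or misassembled contig.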
Vanderzande, S. and Peace, C. (2023). Genetic information from phased SNP array data can improve assemblies of whole genome sequences. Acta Hortic. 1362, 81-88
haplotype, shared segments, pan-genome, Malus × domestica, apple