Classifying single nucleotide polymorphisms in humans


Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in the human population and are key to personalized medicine. New tests are presented to distinguish pathogenic (malign) SNPs, i.e., those likely to contribute to or cause a disease, from nonpathogenic (benign) SNPs, regardless of whether they occur in coding (exon) or noncoding (intron) regions of the human genome. The tests are based on the nearest neighbor (NN) model of Gibbs free energy landscapes of DNA hybridization and on deep structural properties of DNA revealed by an approximating metric (the h-distance) in DNA spaces of oligonucleotides of a common size. The quality assessments show that the newly defined PNPG test can classify a SNP with an accuracy of about 73% for the required parameters. Among the machine learning models, a feed-forward neural network performs best, with a fivefold cross-validation accuracy of at least 73%. These results may provide valuable tools for the SNP classification problem, where tools are lacking, by assessing the likelihood that an unclassified SNP is disease-causing. The tests highlight the significance of hybridization chemistry in SNPs and can be applied to further the effectiveness of research in genomics and metabolomics.

Publication Title

Molecular Genetics and Genomics