Ancestry Analysis in Population-Scale Genomic Data

Authors

  • Haseeb Zahid Group M, Karachi, Pakistan.

Keywords:

Principal Component Analysis, Polymorphisms, SNPs, Ancestry analysis

Abstract

Ethnic groups differ in traits such as height, eye color, skin tone, susceptibility to certain illnesses, and response to specific medications, yet the genetic basis of these differences remains underexplored. The Human Genome Diversity Project has amassed extensive genotypic data from Asian populations. Principal Component Analysis (PCA) can reveal differences among populations, but it does not show how individual Single Nucleotide Polymorphisms (SNPs) vary between them. Alternative statistical methods such as the mutual information algorithm are therefore useful for identifying SNPs associated with specific ethnicities and for quantifying SNP differences within the Pakistani population. This study aims to identify SNP variation among ethnic groups in Pakistan. Using the mutual information algorithm, we statistically compare each SNP across the ethnicities in our sample. We then construct a classifier that predicts an individual's ethnicity from their genetic data, likely using techniques such as feature engineering or dimensionality reduction. We assess the classifier on a separate test dataset, where it correctly predicts the ethnicity of 40% of individuals.
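
The per-SNP comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the estimator or the data layout, so the following assumes a simple plug-in (empirical frequency) estimate of mutual information between one SNP's genotype codes (0, 1, or 2 copies of the minor allele) and population labels. The function names, SNP identifiers, population labels, and toy genotypes are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def mutual_information(genotypes, labels):
    """Plug-in estimate of I(genotype; label) in bits for a single SNP.

    genotypes: genotype codes per individual (assumed 0/1/2 encoding).
    labels: population label per individual, in the same order.
    """
    n = len(genotypes)
    if n == 0 or n != len(labels):
        raise ValueError("genotypes and labels must be non-empty and equal in length")

    joint = Counter(zip(genotypes, labels))   # counts of (genotype, label) pairs
    geno_counts = Counter(genotypes)          # marginal genotype counts
    label_counts = Counter(labels)            # marginal label counts

    mi = 0.0
    for (g, pop), c in joint.items():
        # p(g, pop) * log2( p(g, pop) / (p(g) * p(pop)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (geno_counts[g] * label_counts[pop]))
    return mi


def rank_snps(genotypes_by_snp, labels):
    """Return (snp_id, score) pairs sorted from most to least informative."""
    scores = {snp: mutual_information(g, labels) for snp, g in genotypes_by_snp.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): six individuals from two made-up populations.
labels = ["pop_A", "pop_A", "pop_A", "pop_B", "pop_B", "pop_B"]
genotypes_by_snp = {
    "snp_1": [0, 0, 0, 2, 2, 2],  # genotype fully determines the population
    "snp_2": [0, 1, 2, 0, 1, 2],  # genotype is independent of the population
}

ranking = rank_snps(genotypes_by_snp, labels)
for snp_id, score in ranking:
    print(f"{snp_id}: {score:.3f} bits")
```

In this toy case the first SNP separates the two populations perfectly and scores one bit, while the second carries no information about population and scores zero. With real genotype data, plug-in estimates are biased upward for small samples, so scores would normally be compared against a permutation baseline before selecting SNPs as classifier features.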

 

Published

2024-04-08

How to Cite

Haseeb Zahid. (2024). Ancestry Analysis in Population-Scale Genomic Data. PAKISTAN JOURNAL OF BIOCHEMISTRY AND MOLECULAR BIOLOGY, 56(3), 95–109. Retrieved from https://www.pjbmb.com/index.php/pjbmb/article/view/100