Electronic Theses and Dissertations





Document Type


Degree Name

Master of Science



Committee Chair

Mohammed Yeasin

Committee Member

Ramin Homayouni

Committee Member

Vasile Rus


The proliferation of "-omics" (such as, Genomics, Proteomics) and "-ology" (such as, System Biology, Cell Biology, Pharmacology) have spawned new frontiers of research in drug discovery and personalized medicine. A vast amount (21 million) of published research results are archived in the PubMed and are continually growing in size. To improve the accessibility and utility of such a large number of literatures, it is critical to develop a suit of semantic sensitive technology that is capable of discovering knowledge and can also infer possible new relationships based on statistical co-occurrences of meaningful terms or concepts. In this context, this thesis presents a unified framework to mine a large number of literatures through the integration of latent semantic analysis (LSA) and ontology mapping. In particular, a parameter optimized, robust, scalable, and distributed LSA (DiLSA) technique was designed and implemented on a carefully selected 7.4 million PubMed records related to pharmacology. The DiLSA model was integrated with MeSH to make the model effective and efficient for a specific domain. An optimized multi-gram dictionary was customized by mapping the MeSH to build the DiLSA model. A fully integrated web-based application, called PharmNet, was developed to bridge the gap between biological knowledge and clinical practices. Preliminary analysis using the PharmNet shows an improved performance over global LSA model. A limited expert evaluation was performed to validate the retrieved results and network with biological literatures. A thorough performance evaluation and validation of results is in progress.


Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & dissertation (ETD) Repository.