Electronic Theses and Dissertations





Date of Award


Document Type


Degree Name

Doctor of Philosophy


Computer Science

Committee Chair

Vasile Rus

Committee Member

Lan Wang

Committee Member

Vinhthuy Phan

Committee Member

David Lin


The American healthcare system does not utilize a national patient identifier to locate medical information about an individual. Instead, it relies on demographic searches, which are imprecise due to natural changes in attributes over time and common typographical variance. To clean up the erroneous duplicate records introduced by this process, many systems utilize simple string similarity techniques and the Fellegi-Sunter Probabilistic Theory of Record Linkage. Our work focuses on improving accuracy in patient record matching by leveraging modern information retrieval (IR) and natural language processing (NLP) techniques. First, we empirically demonstrate the importance of incorporating rich semantic parsing techniques and dependence relationships in the Fellegi-Sunter framework. Second, we explore grapheme-to-phoneme (G2P) translation using supervised machine learning methods. This approach allows us to build phonetic encoders that are optimized to increase recall in multicultural personal name queries. Lastly, we propose a method of generating synthetic patient demographic records using statistical profiles from real data. The lack of high-quality public datasets for benchmarking hinders innovation in demographic matching. Previous synthetic data generators produce datasets that are measurably different from real data in ways that oversimplify the matching problem. We suggest a simulation-based method using probabilistic graphical models and statistical disclosure control techniques. To quantify our results, we propose a number of measures to evaluate the data quality and complexity of semi-structured demographic attributes.
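
For readers unfamiliar with the Fellegi-Sunter framework named in the abstract, the following is a minimal sketch of its classical scoring rule, not the dissertation's own method. It assumes the textbook conditional-independence form, in which each field contributes a log likelihood ratio to a total match weight. The field names and the m- and u-probabilities are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical probabilities, for illustration only (not taken from the dissertation):
#   m = P(field agrees | the two records belong to the same patient)
#   u = P(field agrees | the two records belong to different patients)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_date": (0.97, 0.003),
    "zip_code": (0.80, 0.05),
}


def match_weight(agreement):
    """Return the total Fellegi-Sunter match weight for one record pair.

    `agreement` maps a field name to True (the field agrees) or False
    (it disagrees). Fields are treated as conditionally independent,
    so the per-field log2 likelihood ratios simply add up.
    """
    total = 0.0
    for field, agrees in agreement.items():
        m, u = FIELD_PROBS[field]
        if agrees:
            total += math.log2(m / u)  # agreement weight, positive when m > u
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight, negative when m > u
    return total


# Example pair: the name and birth date agree, the ZIP code does not.
pair = {"last_name": True, "first_name": True, "birth_date": True, "zip_code": False}
weight = match_weight(pair)
```

A pair whose total weight exceeds an upper threshold would be declared a match, one below a lower threshold a non-match, and anything in between sent for manual review. The independence assumption in this sketch is the simplification that the abstract's point about dependence relationships addresses.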


Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertation (ETD) Repository.