Doctor of Philosophy
The American healthcare system does not use a national patient identifier to locate medical information about an individual. Instead, healthcare providers must rely on demographic searches, which are imprecise because attributes change naturally over time and typographical variance is common. To clean up the erroneous duplicate records that these searches introduce, many systems use simple string similarity techniques and the Fellegi-Sunter Probabilistic Theory of Record Linkage. Our work focuses on improving accuracy in patient record matching by leveraging modern information retrieval (IR) and natural language processing (NLP) techniques. First, we empirically demonstrate the importance of incorporating rich semantic parsing techniques and dependence relationships in the Fellegi-Sunter framework. Second, we explore grapheme-to-phoneme (G2P) translation using supervised machine learning methods. This approach allows us to build phonetic encoders that are optimized to increase recall in multicultural personal name queries. Lastly, we propose a method of generating synthetic patient demographic records using statistical profiles from real data. The lack of high-quality public datasets for benchmarking hinders innovation in demographic matching, and previous synthetic data generators produce datasets that are measurably different from real data in ways that over-simplify the matching problem. We suggest a simulation-based method using probabilistic graphical models and statistical disclosure control techniques. To quantify our results, we propose a number of measures to evaluate the data quality and complexity of semi-structured demographic attributes.
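For readers unfamiliar with the Fellegi-Sunter framework named above, the following is a minimal sketch of its basic scoring idea under the classical field-independence assumption. It is not the dissertation's implementation. The field names, the m and u probabilities, and the two thresholds are hypothetical values chosen only for illustration, and exact string equality stands in for the richer comparisons the abstract describes.

```python
import math

# Hypothetical per-field parameters (assumed values, not taken from the dissertation):
#   m = P(field agrees | records refer to the same patient)
#   u = P(field agrees | records refer to different patients)
FIELD_PARAMS = {
    "first_name": (0.90, 0.01),
    "last_name": (0.95, 0.005),
    "birth_date": (0.98, 0.001),
}


def field_weight(agrees, m, u):
    """Log-likelihood-ratio weight for one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the per-field weights, treating fields as conditionally independent."""
    total = 0.0
    for field, (m, u) in params.items():
        a = rec_a[field].strip().lower()
        b = rec_b[field].strip().lower()
        # Exact equality is a simplification; an empty value counts as disagreement.
        agrees = a != "" and a == b
        total += field_weight(agrees, m, u)
    return total


def classify(score, upper=10.0, lower=0.0):
    """Three-way decision using two hypothetical thresholds."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    a = {"first_name": "Maria", "last_name": "Lopez", "birth_date": "1980-02-14"}
    b = {"first_name": "Mariah", "last_name": "Lopes", "birth_date": "1980-02-14"}
    score = match_score(a, b)
    print(round(score, 2), classify(score))
```

In this sketch, a pair that agrees only on birth date falls between the two thresholds and is flagged for review. That is the kind of case where exact comparison loses information that phonetic or semantic comparison might recover.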
Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertations (ETD) Repository.
Ash, Stephen Michael, "Improving Accuracy of Patient Demographic Matching and Identity Resolution" (2017). Electronic Theses and Dissertations. 1586.