Electronic Theses and Dissertations Archive

Author

Date

2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Computer Science

Committee Chair

Deepak Venugopal

Committee Member

Madhusudhanan Balasubramanian

Committee Member

Vasile Rus

Committee Member

Xiaofei Zhang

Abstract

Multimodal AI systems integrate computer vision, natural language processing, and knowledge representation. While deep learning has made immense advances in tasks such as Visual Captioning (VC) and Visual Question Answering (VQA), it is hard to decipher the knowledge encoded within these models in order to verify, evaluate, and explain their behavior. In this dissertation, we propose to i) develop a probabilistic framework to evaluate uncertainty in captioning models using Markov Logic Networks (MLNs), a well-known statistical relational model; ii) disentangle knowledge gained in fine-tuning from preexisting knowledge encoded in pre-trained captioning models using a Neuro-Symbolic extension of MLNs called Hybrid Markov Logic Networks; and iii) understand the sensitivity and limitations of Vision Large Language Models (VLMs) in VQA when questions are modified in ways that make them cognitively more demanding to process. In summary, our dissertation advances the understanding and evaluation of multimodal AI systems.

Comments

Data is provided by the student

Library Comment

Dissertation or thesis originally submitted to ProQuest/Clarivate.

Notes

Open Access
