Evaluating Captioning Models using Markov Logic Networks
Abstract
Multimodal problems such as caption generation advance AI as a whole, since they require the integration of several key domains such as computer vision, NLP, and knowledge representation. In this paper, we develop a new approach to evaluating captioning models by verifying them using Markov Logic Networks (MLNs). Specifically, we compile an MLN from training data and perform probabilistic inference to estimate the uncertainty in a generated caption. To reify the caption, we leverage advances in Natural Language Inference (NLI) models and convert the caption into a query for the MLN. Further, we add visual context to the MLN distribution using an attention-based Multiple Instance Learning model and evaluate a caption based on this augmented distribution. We perform experiments using MSCOCO on several state-of-the-art benchmarks and show that our approach can evaluate captioning models just as effectively as methods that require human-generated captions.
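As a rough illustration of what querying an MLN for a caption involves, the sketch below scores possible worlds with weighted formulas and computes a conditional probability by exact enumeration. It is not the authors' implementation: the atoms (`dog`, `frisbee`, `park`), the formulas, the weights, and the function names are all hypothetical, and exact enumeration stands in for whatever inference procedure the paper uses.

```python
import itertools
import math

# Hypothetical ground atoms: whether each concept appears in the scene/caption.
ATOMS = ["dog", "frisbee", "park"]

# Hypothetical weighted formulas as (weight, predicate over a world).
# In an MLN, a world's unnormalized probability is exp(sum of the weights
# of the formulas it satisfies).
FORMULAS = [
    (1.5, lambda w: (not w["dog"]) or w["frisbee"]),   # dog => frisbee
    (1.0, lambda w: (not w["frisbee"]) or w["park"]),  # frisbee => park
]


def score(world):
    """Unnormalized probability of a world (a dict mapping atom -> bool)."""
    return math.exp(sum(weight for weight, holds in FORMULAS if holds(world)))


def worlds():
    """Enumerate every truth assignment to the atoms."""
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def conditional_probability(query, evidence):
    """P(query | evidence) by summing scores over worlds consistent with each."""
    numerator = 0.0
    denominator = 0.0
    for world in worlds():
        if all(world[atom] == value for atom, value in evidence.items()):
            s = score(world)
            denominator += s
            if all(world[atom] == value for atom, value in query.items()):
                numerator += s
    return numerator / denominator
```

For example, `conditional_probability({"park": True}, {"dog": True})` returns the probability that a caption mentioning a park is consistent with a scene known to contain a dog under these toy weights. Enumeration is exponential in the number of atoms, so it is only practical for very small models.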
Publication Title
Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
Recommended Citation
Shah, M., Sarkhel, S., & Venugopal, D. (2022). Evaluating Captioning Models using Markov Logic Networks. Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022, 127-134. https://doi.org/10.1109/BigData55660.2022.10020793