Electronic Theses and Dissertations Archive
Date
2026
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Civil Engineering
Committee Chair
Stephanie Ivey
Committee Member
Aaron Robinson
Committee Member
Anzhelika Antipova
Committee Member
Armstrong Aboah
Committee Member
Martin Lipinski
Abstract
Distracted driving remains a critical road safety challenge: existing detection methods often rely on single-modality inputs that lack the contextual nuance to distinguish hazardous behavior from normal driving. This research addresses those limitations through a progressive three-study framework that advances context-aware detection from systematic evidence synthesis to multimodal foundation model adaptation. Study One systematically reviewed 77 peer-reviewed papers (2019–2025) spanning visual, sensor-based, and emerging modalities. The results confirmed the dominance of visual-only approaches and identified critical gaps in environmental robustness, cognitive distraction detection, and contextual grounding, establishing the empirical case for multimodal integration. Study Two evaluated the effect of adding road-facing context to driver-facing video across three spatiotemporal architectures. The benefits proved highly architecture-dependent: SlowOnly-R50 gained 4.9% in accuracy while SlowFast-R50 declined by 7.2%, indicating that effective context integration requires fusion-aware design rather than naive input concatenation. Study Three reframed detection as a representational alignment problem, using ImageBind to synchronize vision, audio, and inertial measurement unit (IMU) signals within a shared embedding space. Applying parameter-efficient Low-Rank Adaptation (LoRA) and supervised contrastive learning across twelve fusion configurations, the study identified a clear modality hierarchy led by vision. The optimized model achieved a Macro-F1 of 0.821, a 2.9 percentage point improvement over the frozen unimodal baseline. Gains were most pronounced in visually ambiguous scenarios where acoustic or inertial context supplied disambiguating evidence, confirming that adaptation improves cross-modal compatibility rather than individual encoder strength. Collectively, these findings demonstrate that effective distracted driving detection depends on representation quality rather than architectural complexity, and they provide an empirically grounded roadmap for modality selection, fusion design, and parameter-efficient adaptation in the next generation of intelligent driver monitoring systems.
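For readers orienting themselves to Study Three's method, the minimal sketch below illustrates the two mechanics the abstract names: parameter-efficient LoRA adaptation of frozen encoders and supervised contrastive alignment of multimodal embeddings in a shared space. It is not the dissertation's implementation; the embedding width (1024, matching ImageBind's), the adapter rank, the class count, and the random stand-in features are all illustrative assumptions.

    # Illustrative sketch only (not the dissertation's code): LoRA adapters on top
    # of frozen per-modality encoder outputs, trained with a supervised contrastive
    # loss so same-class embeddings align across modalities in a shared space.
    # Assumptions: 1024-d space (ImageBind's width), rank-8 adapters, 4 classes,
    # and random tensors standing in for real encoder features.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LoRALinear(nn.Module):
        """A frozen base projection plus a trainable low-rank (LoRA) update."""
        def __init__(self, dim, rank=8, alpha=16.0):
            super().__init__()
            self.base = nn.Linear(dim, dim, bias=False)
            self.base.weight.requires_grad_(False)          # pretrained weight stays frozen
            self.lora_a = nn.Linear(dim, rank, bias=False)  # trainable down-projection
            self.lora_b = nn.Linear(rank, dim, bias=False)  # trainable up-projection
            nn.init.zeros_(self.lora_b.weight)              # low-rank update starts at zero
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

    def supcon_loss(z, labels, temperature=0.07):
        """Supervised contrastive loss over L2-normalized embeddings."""
        z = F.normalize(z, dim=-1)
        sim = (z @ z.t()) / temperature
        n = z.size(0)
        self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, -1e9)              # exclude self-similarity
        pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        pos_counts = pos_mask.sum(1).clamp(min=1)           # guard against no-positive rows
        mean_log_pos = (log_prob * pos_mask.float()).sum(1) / pos_counts
        return -mean_log_pos[pos_mask.any(1)].mean()        # anchors with >=1 positive

    # Toy usage: one batch seen through two modalities (e.g. vision and IMU).
    # Sharing labels across modalities pulls same-class, cross-modal embeddings
    # together, which is the alignment effect the abstract describes.
    dim, batch = 1024, 8
    adapters = nn.ModuleDict({m: LoRALinear(dim) for m in ("vision", "imu")})
    feats = {m: torch.randn(batch, dim) for m in adapters}  # stand-in encoder outputs
    labels = torch.randint(0, 4, (batch,))
    z = torch.cat([adapters[m](feats[m]) for m in adapters])
    loss = supcon_loss(z, labels.repeat(2))
    loss.backward()                                         # gradients reach only LoRA weights

Because the base projections are frozen and the low-rank update is initialized to zero, training begins from the pretrained embedding space and only the small adapter matrices move, which is the parameter-efficiency property LoRA is used for here.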
Library Comment
Dissertation or thesis originally submitted to ProQuest/Clarivate.
Notes
Open Access.
Recommended Citation
Dontoh, Anthony, "Advancing Multimodal Distracted Driving Detection: From Systematic Synthesis to Representation Alignment in a Unified Embedding Space" (2026). Electronic Theses and Dissertations Archive. 3974.
https://digitalcommons.memphis.edu/etd/3974
Archival Statement
This item was created or digitized prior to April 24, 2027, or is a reproduction of legacy media created before that date. It is preserved in its original, unmodified state specifically for research, reference, or historical recordkeeping. This material is part of a digital archival collection and is not utilized for current University instruction, programs, or active public communication. In accordance with the ADA Title II Final Rule, the University Libraries provides accessible versions of archival materials upon request. To request an accommodation for this item, please submit an accessibility request form.
Comments
Data is provided by the student.