Electronic Theses and Dissertations

Date

2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Computer Science

Committee Chair

Vasile Rus

Committee Member

Deepak Venugopal

Committee Member

Scott Fleming

Committee Member

Xiaofei Zhang

Abstract

The field of computer science faces a significant challenge in meeting the growing demand for skilled graduates, despite increasing interest in CS education. A persistent gap between supply and demand is partly attributed to high attrition rates of 30-40% (or higher) in introductory CS courses (CS1 and CS2). While various interventions have been proposed, the scalable assessment of students' understanding of programming concepts remains a critical challenge, particularly in the context of code comprehension exercises. This dissertation addresses the challenge of automated assessment in code comprehension by developing and evaluating novel approaches for analyzing students' self-explanations of program code. We focus on self-explanation-based Intelligent Tutoring Systems (ITSs), which prompt students to explain code line by line based on their knowledge and understanding. A key challenge in such systems is the scalable evaluation of student responses, which traditionally requires extensive expert-generated gold-standard answers for comparison and feedback generation. This manual approach becomes increasingly unsustainable as the number of programming exercises grows, creating a bottleneck in providing timely and effective feedback to students.

To address these challenges, our research made several key contributions. First, we developed and published the SelfCode and SelfCode 2.0 corpora, containing 2,415 training and 604 test examples of student and expert line-by-line explanations for Java code. These datasets represent the first publicly available resources specifically designed for training and evaluating automated assessment methods in code comprehension contexts. Second, we explored multiple approaches for automated assessment, ranging from traditional machine learning to state-of-the-art transformers. A Support Vector Regression model combining features from SentenceBERT and DeepTutor achieved a Pearson correlation of 0.7088 with human judgments. Fine-tuned transformer models, particularly SciBERT, reached correlations approaching 0.70, demonstrating that domain-specific pre-training improves assessment accuracy. Third, we validated an approach using Large Language Models (LLMs) to automatically generate gold-standard explanations. A systematic evaluation of GPT-4 showed that appropriately prompted LLMs produce explanations matching human expert quality in readability and semantic content, offering a scalable solution for creating reference answers.

Building upon these contributions, we investigated instruction fine-tuning of open-source LLMs specifically for the assessment task. We evaluated four models: CodeGemma 7B, Mistral 7B, CodeLlama 7B, and Llama 3.1 8B. All models achieved significant correlations with human judgments (Pearson r = 0.652-0.681, p < 0.0001), with CodeGemma 7B performing best (r = 0.681). Critically, we found that the 7B code-specialized models outperformed the 8B general-purpose Llama 3.1, demonstrating that targeted pre-training on code provides more value than additional parameters. This finding has important practical implications: smaller, specialized models can run on consumer-grade GPUs, making sophisticated assessment accessible to institutions with limited resources.

This research develops a scalable and accurate approach to automated assessment in programming education that could help reduce attrition rates and increase the number of well-trained CS graduates. The datasets, methods, and findings contribute to building more effective Intelligent Tutoring Systems that provide immediate feedback while reducing instructor burden. The success of this work demonstrates that automated assessment of code comprehension through student self-explanations is achievable with reasonable accuracy using current technology, opening pathways for broader deployment of personalized learning support in computer science education.
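As context for the Pearson correlations reported above (e.g., r = 0.7088 for the SVR model, r = 0.652-0.681 for the fine-tuned LLMs), the sketch below shows how such a correlation compares model-predicted assessment scores against human expert ratings. The scores in the example are invented for illustration and do not come from the dissertation's datasets.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: automated assessment scores for five student
# self-explanations vs. human expert ratings on a 0-1 scale.
predicted = [0.90, 0.40, 0.75, 0.20, 0.60]
human = [0.85, 0.50, 0.70, 0.30, 0.55]

r = pearson(predicted, human)  # close to 1.0 means strong agreement
```

A correlation near 0.70 with human judgments, as reported for the best models in this work, indicates substantial but imperfect agreement with expert raters.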

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest/Clarivate.

Notes

Open Access


Archival Statement

This item was created or digitized prior to April 24, 2026, or is a reproduction of legacy media created before that date. It is preserved in its original, unmodified state specifically for research, reference, or historical recordkeeping. This material is part of a digital archival collection and is not utilized for current University instruction, programs, or active public communication. In accordance with the ADA Title II Final Rule, the University Libraries provides accessible versions of archival materials upon request. To request an accommodation for this item, please submit an accessibility request form.