Electronic Theses and Dissertations



Document Type


Degree Name

Doctor of Philosophy


Computer Science

Committee Chair

Vasile Rus

Committee Member

Scott Fleming

Committee Member

Deepak Venugopal

Committee Member

Xiaofei Zhang


Studies have shown that 30–40% of students fail or drop out of the introductory programming course. Furthermore, enrollment in CS programs has recently increased significantly, making it difficult to provide personalized attention. In this context, this dissertation works toward a Self-Explanation Based Intelligent Tutoring System (ITS) for Code Comprehension, which offers one-to-one tutoring to enhance learners' source code comprehension skills. The system uses self-explanation as a learning strategy. Although self-explanation has shown a positive effect in science domains such as biology and math, it has been studied minimally for code comprehension and warrants further investigation. The system also uses questions as hints to scaffold students toward a correct explanation of the code. Currently, such questions are authored manually and are therefore costly. This dissertation thus aims to examine the effectiveness of self-explanation for code comprehension and to explore approaches for automatically generating questions for code comprehension. To this end, we conducted two randomized trial experiments and developed two approaches. Our first study investigated the effect of merely prompting students to freely self-explain code; we found that it induced a 31% learning gain. The second study compared guided self-explanation with free self-explanation; guided self-explanation outperformed free self-explanation by 29% in inducing students' learning gain. Next, we developed two systems that automatically generate questions from each sentence of a code explanation. Our evaluation shows that the generated questions are linguistically well-formed, pedagogically sound, and indistinguishable from human-generated questions. Finally, we built the CodeQG dataset, specific to code comprehension, and trained a transformer on it to automatically generate questions from target concepts in a code explanation.
Our findings show that the model not only generated a wide variety of questions (BLEU: 89, ROUGE: 94, F1: 94.64), but also that its performance almost tripled when it was trained on CodeQG rather than SQuAD. In summary, this dissertation investigated and showed that self-explanation is effective for code comprehension, developed approaches for automatically generating questions for code comprehension, and constructed CodeQG, a large question-generation dataset specific to code comprehension.


Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest


Embargoed until 4/7/2024

Available for download on Sunday, April 07, 2024