Electronic Theses and Dissertations

Author

Rabin Banjade

Date

2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Computer Science

Committee Chair

Vasile Rus

Committee Member

Vasile Rus

Committee Member

Scott Fleming

Committee Member

Andrew M. Olney

Committee Member

Deepak Venugopal

Abstract

This dissertation addresses the difficulty of creating domain models for Intelligent Tutoring Systems (ITS) in computer programming. A significant challenge in designing an ITS for computer programming is determining the essential knowledge components a student must learn to become proficient. One possible solution is to engage programming experts, but this is impractical due to scalability and cost constraints. Existing approaches for automatically extracting domain models for computer programming focus on parsing code examples, and their scope is limited: they can extract some domain terms but do not adequately address pedagogical requirements. We address the challenges of automatically extracting domain models for computer programming using several approaches. We experimented with and evaluated statistical, graph-based, and transformer-based approaches to extract key concepts from textbooks. Extracting domain models through this over-generation and ranking process poses challenges such as concept generality and specificity. We propose solutions that alleviate these challenges using a modified maximal marginal relevance method. We also evaluated textbook indexes as a source of the domain model and demonstrate that using them for this purpose is not a straightforward process. To overcome the difficulties involved, we propose string-matching algorithms based on lexical and semantic similarity. We further introduce a fine-tuned version of the BERT model that enhances the relatedness of knowledge components, thereby improving the embedding-based extraction of domain terms. Additionally, we conduct experiments with Large Language Models (LLMs) to extract domain terms, providing insights into their efficacy and the challenges involved.
Furthermore, we use LLMs under various prompt settings to classify the relationships between knowledge components and propose approaches to extract a hierarchical domain model for computer programming. We highlight the challenges of using LLMs for such tasks. We conclude by identifying limitations of the proposed methods, areas for further improvement, and broader future directions for the successful adoption of our approaches in automatically extracting domain models for ITS development.

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest.

Notes

Embargoed until 07-16-2025

Available for download on Wednesday, July 16, 2025
