Electronic Theses and Dissertations

Author

Liang Zhang

Date

2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Electrical & Computer Engineering

Committee Chair

Mohammed Yeasin

Committee Member

Faruk Ahmed

Committee Member

Mohammadreza Davoodi

Committee Member

Philip I. Pavlik Jr.

Committee Member

Xiangen Hu

Abstract

Intelligent Tutoring Systems (ITSs) leverage computer-aided technologies to provide dynamically adaptive instruction tailored to individual learners. Recent advancements in Artificial Intelligence (AI) have further enhanced their capacity to deliver responsive, personalized learning experiences. As learners interact with ITSs by responding to questions, their performance data, such as correct and incorrect answers, become essential for assessing and predicting latent knowledge states through analysis and modeling. However, real-world learner data frequently suffer from sparsity due to incomplete interactions, unanswered questions, and missed attempts. Such missing information poses significant challenges, undermining the accuracy of learner modeling and the effectiveness of personalized instruction. To systematically address these challenges, this dissertation presents comprehensive studies focusing on data augmentation and imputation techniques designed to mitigate data sparsity issues in learner performance data from ITSs. Specifically, three interconnected studies are conducted: (1) a generative data augmentation framework to enrich sparse datasets while preserving essential learning attributes, (2) a tensor-based Generative Adversarial Imputation Network (GAIN) to robustly impute missing data by capturing multidimensional learner interactions, and (3) a global-to-local fine-tuning strategy that integrates Bayesian Knowledge Tracing (BKT)-derived learner similarity features to enhance the personalization, accuracy, and effectiveness of GAIN-based imputation. These studies systematically explore how advanced generative AI methods can address the sparsity of learner performance data from ITS environments. Initially, we introduce a generative data augmentation framework that employs a three-dimensional (3D) tensor representation to capture learner-question-attempt interactions, integrating tensor factorization for data imputation. 
Generative Artificial Intelligence (GenAI) models, specifically vanilla Generative Adversarial Networks (GANs) and a Generative Pre-trained Transformer (GPT-4o), are leveraged to generate synthetic data tailored to individual clusters of learner performance data across varying sample sizes. These generative techniques effectively enhance sparse datasets without distorting core learner characteristics, as validated by rigorous evaluations involving Earth Mover’s Distance (EMD) and bimodality coefficients. Experiments conducted using the AutoTutor Adult Reading Comprehension (ARC) dataset demonstrated the framework's efficacy in realistically expanding the learner performance data, thereby supporting more accurate learner modeling and analysis. Subsequently, a novel tensor-based GAIN is proposed, also utilizing a 3D tensor to represent learner performance data across learners, questions, and attempts. Within this 3D tensor structure, each learner's performance data are organized as a two-dimensional "learner image" matrix, mapping questions against attempts across the learner dimension. The GAIN model adopts a GAN-based architecture in which a generator imputes plausible values for missing data, guided by a discriminator that balances observed and predicted data, and further enhanced by a hint mechanism that iteratively refines imputations to accurately reflect authentic learner performance patterns. Experiments using real-world datasets, including AutoTutor Adult Reading Comprehension (ARC), ASSISTments, and MATHia, demonstrate significant improvements in imputation accuracy, robustness, and cognitive interpretability compared to traditional methods such as tensor factorization and GAN variants. 
Further validation via Bayesian Knowledge Tracing (BKT) confirms that critical learning parameters (e.g., initial learning ability, learning rate, guess rate, and slip rate) are preserved in the imputed data, thus maintaining cognitive fidelity relative to the original sparse data. Lastly, the dissertation culminates in the proposal of an innovative global-to-local fine-tuning strategy for the GAIN-based imputation model. This strategy integrates global pretraining with a similarity-based local fine-tuning process, leveraging BKT-derived cognitive parameters to personalize data imputation for individual learner subgroups. Experimental validation using MATHia datasets demonstrates that this refined approach achieves superior imputation accuracy and stability under diverse sparsity conditions ranging from 10% to 90% (with sparsity intentionally created through simulations derived from comprehensive datasets while retaining ground-truth data for validation). The global-to-local fine-tuning strategy effectively aligns imputed data with individualized cognitive trajectories, substantially enhancing the accuracy of imputations by embedding the semantic understanding of learner-specific characteristics into the GAIN-based imputation model. The contributions of this dissertation advance ITSs by introducing robust, innovative methodologies tailored to address data sparsity in learner performance data. These methodologies improve the precision of learner modeling and the effectiveness of adaptive instructional interventions by closely aligning instructional strategies with learners' cognitive and performance characteristics. Moreover, the scalable and reliable solutions developed herein provide comprehensive, contextually coherent datasets, augmenting the practical utility and effectiveness of ITS environments and thereby promoting educational equity and efficacy through AI-driven personalized learning experiences.
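
The validation setup described above (a learner-question-attempt tensor, sparsity simulated at a chosen rate, and ground truth retained for scoring) can be illustrated with a minimal sketch. This is not the dissertation's code: the function names `simulate_sparsity` and `rmse_on_hidden`, the tensor dimensions, the random binary data, and the mean-fill baseline are all assumptions made purely for illustration, and the mean-fill step merely stands in for an imputation model such as GAIN.

```python
import numpy as np

def simulate_sparsity(tensor, sparsity, seed=0):
    """Hide a given fraction of entries uniformly at random (hypothetical helper).

    Returns the sparse tensor (hidden entries set to NaN) and a boolean
    mask that is True where an entry is still observed.
    """
    rng = np.random.default_rng(seed)
    n_hidden = int(round(sparsity * tensor.size))
    hidden_idx = rng.choice(tensor.size, size=n_hidden, replace=False)
    mask = np.ones(tensor.size, dtype=bool)
    mask[hidden_idx] = False
    mask = mask.reshape(tensor.shape)
    sparse = np.where(mask, tensor, np.nan)
    return sparse, mask

def rmse_on_hidden(truth, imputed, mask):
    """Score an imputation only on the entries that were hidden."""
    hidden = ~mask
    return float(np.sqrt(np.mean((truth[hidden] - imputed[hidden]) ** 2)))

# Assumed toy data: 20 learners x 10 questions x 5 attempts, 1 = correct.
rng = np.random.default_rng(42)
truth = rng.integers(0, 2, size=(20, 10, 5)).astype(float)

# Hide 30% of the entries while keeping the full tensor as ground truth.
sparse, mask = simulate_sparsity(truth, sparsity=0.3)

# Placeholder imputation: fill hidden entries with the observed mean.
baseline = np.where(mask, sparse, np.nanmean(sparse))
error = rmse_on_hidden(truth, baseline, mask)

# Each learner's slice is a questions-by-attempts "learner image".
learner_image = sparse[0]
```

Repeating the last steps for sparsity values from 0.1 to 0.9 would give an error-versus-sparsity curve for any imputation method substituted for the mean-fill placeholder.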

Comments

Data is provided by the student

Library Comment

Dissertation or thesis originally submitted to ProQuest.

Notes

Open Access
