Electronic Theses and Dissertations

Author

Ryan Wickman

Date

2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Computer Science

Committee Chair

Xiaofei Zhang

Committee Member

Vasile Rus

Committee Member

Xiaolei Huang

Committee Member

Deepak Venugopal

Abstract

A prevalent limitation of optimizing over a single objective is that the search can be misguided, becoming trapped in a local optimum. Quality-Diversity (QD) algorithms overcome this limitation by seeking a population of high-quality and diverse solutions to a problem. Most conventional QD approaches, such as MAP-Elites, rely on a behavior archive where solutions are categorized into predefined niches. While promising, these approaches require assumptions about the set of behaviors and metrics for defining the distance between behaviors, and they often constrain the learned behaviors to a fixed set of bins. In this work, we begin by proposing an alternative to archive-based QD algorithms called Diverse Quality Species (DQS), which breaks solutions down into independently evolving species and employs unsupervised skill discovery to learn diverse, high-performing solutions without the need for an archive or predefined ranges of behaviors. We achieve this using gradient-based mutations that jointly maximize a mutual information objective and reward. We evaluate DQS on several simulated robot environments and demonstrate that it can learn a diverse set of solutions from varying species. Furthermore, we propose a novel unsupervised skill discovery algorithm called SSD, which trains a speciated population of skill-conditioned policies. SSD maximizes the mutual information between states and skills and between states and species-given skills. To achieve this, we employ a contrastive learning framework to minimize the conditional entropy between states and skill-species pairs, facilitating the learning of controllable latent behaviors. Moreover, we utilize a particle-based entropy estimator to maximize state entropy, thereby promoting state space exploration. Lastly, we combine DQS with SSD and show how the combination uncovers a range of innovative latent behaviors, surpassing the performance of prior methods.

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest.

Notes

Open Access
