Faculty Publications

Mining protein sequences for motifs

Giri Narasimhan, University of Memphis
Changsong Bu, Idax, Inc.
Yuan Gao, IBM Thomas J. Watson Research Center
Xuning Wang, Pfizer Inc.
Ning Xu, University of Memphis
Kalai Mathee, Florida International University

Abstract

We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is defined by the presence of a "good" combination of residues at appropriate positions within the motif. It compiles such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences, and the dictionary is then used to detect motifs in new protein sequences. The statistical significance of the detection results is ensured by determining the algorithm's parameters statistically. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif served as the model system for testing the program, which was then extended to detect Homeodomain motifs. For both motifs, the detection results compare favorably with those of existing programs. In addition, GYM provides other useful information about a given protein sequence.
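The abstract describes GYM only at a high level, so the following is a minimal, hypothetical sketch of the general idea rather than GYM's actual algorithm. It assumes that a "good combination" is a pair of (position, residue) constraints that co-occurs in at least `min_support` of the aligned training motifs, and that a window of a new sequence is reported when it matches at least `threshold` dictionary patterns. The names `build_pattern_dictionary`, `detect_motifs`, `min_support`, and `threshold` are illustrative assumptions; in the paper the parameters are determined statistically rather than fixed by hand.

```python
from collections import Counter
from itertools import combinations

# Hypothetical pattern representation: a pair of (position, residue)
# constraints inside a fixed-length motif window, e.g. ((0, 'A'), (3, 'G')).
Pattern = tuple[tuple[int, str], tuple[int, str]]


def build_pattern_dictionary(aligned: list[str], min_support: int) -> set[Pattern]:
    """Keep residue pairs that co-occur at fixed positions in at least
    `min_support` of the aligned training motifs (assumed criterion)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("training motifs must be aligned to equal length")
    counts: Counter[Pattern] = Counter()
    for seq in aligned:
        # Each pair of positions i < j yields one candidate pattern.
        for i, j in combinations(range(length), 2):
            counts[((i, seq[i]), (j, seq[j]))] += 1
    return {p for p, c in counts.items() if c >= min_support}


def window_score(window: str, dictionary: set[Pattern]) -> int:
    """Count how many dictionary patterns a single window satisfies."""
    return sum(1 for (i, a), (j, b) in dictionary if window[i] == a and window[j] == b)


def detect_motifs(
    sequence: str, dictionary: set[Pattern], motif_len: int, threshold: int
) -> list[tuple[int, int]]:
    """Slide a window over `sequence`; return (start, score) for every
    window whose score reaches the (hypothetical) fixed threshold."""
    hits = []
    for start in range(len(sequence) - motif_len + 1):
        score = window_score(sequence[start:start + motif_len], dictionary)
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy, made-up training motifs; not data from the paper.
    training = ["ACDG", "ACEG", "ACDG"]
    patterns = build_pattern_dictionary(training, min_support=3)
    print(detect_motifs("TTACWGTT", patterns, motif_len=4, threshold=3))
```

On this toy input only the three patterns linking positions 0, 1, and 3 (residues A, C, G) reach full support, so the only window reported is the one starting at index 2, printed as `[(2, 3)]`. Restricting patterns to residue pairs keeps the dictionary size quadratic in motif length; a real system could use larger combinations at a higher counting cost.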

Publication Title

Journal of Computational Biology

Recommended Citation

Narasimhan, G., Bu, C., Gao, Y., Wang, X., Xu, N., & Mathee, K. (2002). Mining protein sequences for motifs. Journal of Computational Biology, 9(5), 707-720. https://doi.org/10.1089/106652702761034145
