Motif discovery in upstream sequences of coordinately expressed genes


This paper presents a genetic mining approach to discovering highly conserved motifs among the upstream sequences of co-regulated genes. These motifs represent putative cis-regulatory elements that could play an important role in the coordinated expression of these genes. A structured genetic algorithm (St-GA) was used to evolve candidate motifs of variable length. Fitness values were assigned as a function of high-scoring alignments performed with NCBI BLAST. The St-GA performed favorably with respect to existing methods on simple (l,k) insertion problems, but was unable to overcome the (l,4) insertion problem that has proved elusive to other methods. Deterministic crowding was added to the St-GA to help cope with the multimodal nature of real-world genomic data. The genetic search was performed on a set of genes selected because their expression values are highly predictive of a subtype of pediatric acute lymphoblastic leukemia (ALL). Four high-scoring motifs were obtained that successfully matched subsequences of cis-elements found in the TRANSFAC database. The results demonstrate that the St-GA approach to motif finding has the potential to be a competitive method for this type of problem. © 2003 IEEE.
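As an illustration of the deterministic crowding step mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It makes several simplifying assumptions: motifs have a fixed length (the St-GA evolves variable-length motifs), similarity is measured by Hamming distance, and the fitness function is a toy stand-in that counts the best ungapped match in each sequence rather than scoring NCBI BLAST alignments. All function names and parameters (`fitness`, `deterministic_crowding_step`, the mutation `rate`) are hypothetical.

```python
import random

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def fitness(motif, sequences):
    """Toy fitness (assumption): for each sequence, take the best ungapped
    match count of the motif over all windows, then sum over sequences.
    The paper instead derives fitness from BLAST alignment scores."""
    width = len(motif)
    total = 0
    for seq in sequences:
        total += max(
            width - hamming(motif, seq[i:i + width])
            for i in range(len(seq) - width + 1)
        )
    return total


def crossover(p1, p2, rng):
    """Single-point crossover producing two children."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(motif, rng, rate=0.1):
    """Resample each base from the alphabet with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else base
        for base in motif
    )


def deterministic_crowding_step(population, sequences, rng):
    """One generation of deterministic crowding: each child competes only
    against the parent it most resembles, which helps preserve distinct
    niches (here, distinct candidate motifs) in a multimodal landscape."""
    pop = population[:]
    rng.shuffle(pop)
    next_pop = []
    for i in range(0, len(pop) - 1, 2):
        p1, p2 = pop[i], pop[i + 1]
        c1, c2 = crossover(p1, p2, rng)
        c1, c2 = mutate(c1, rng), mutate(c2, rng)
        # Pair each child with its most similar parent.
        if hamming(p1, c1) + hamming(p2, c2) > hamming(p1, c2) + hamming(p2, c1):
            c1, c2 = c2, c1
        for parent, child in ((p1, c1), (p2, c2)):
            # The child replaces the parent only if it is strictly fitter.
            if fitness(child, sequences) > fitness(parent, sequences):
                next_pop.append(child)
            else:
                next_pop.append(parent)
    if len(pop) % 2:  # an unpaired individual survives unchanged
        next_pop.append(pop[-1])
    return next_pop
```

Because a child only ever displaces the parent it was paired with, and only when it is fitter, the population size stays constant and the best fitness in the population never decreases from one generation to the next.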

Publication Title

2003 Congress on Evolutionary Computation, CEC 2003 - Proceedings