Distributed evolutionary approach to data clustering and modeling


In this article we describe a framework (DEGA-Gen) for applying distributed genetic algorithms to community detection in networks. The framework proposes efficient ways of encoding the network in the chromosomes, which greatly reduces memory use and computation and makes the framework scalable. Different objective functions may be used to divide the network into communities. The framework is implemented using Hadoop, an open-source implementation of the MapReduce paradigm. We validate the framework by developing a community detection algorithm that uses modularity as the measure of the quality of a division. The algorithm partitions the network into non-overlapping communities so that network modularity is maximized. We apply the algorithm to well-known datasets such as Zachary's Karate Club, the bottlenose dolphins network, the college football network, and the US political books network. The framework achieves comparable modularity while using much less memory to represent the network. Furthermore, the framework is scalable and can handle large graphs; it was tested on a larger youtube.com dataset.
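The abstract uses modularity as the fitness of a candidate division. The Python sketch below assumes each chromosome is a list holding one community label per node. This is a hypothetical encoding chosen for illustration; the paper's own memory-efficient encoding is not described here. The sketch computes Newman-Girvan modularity for an undirected, unweighted simple graph as Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c is the number of edges inside community c, and d_c is the total degree of its nodes. The hypothetical helper `best_chromosome` imitates the map step (score each chromosome independently) and the reduce step (keep the fittest) in a single process. It is not Hadoop code.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def modularity(edges: Sequence[Edge], labels: Sequence[int]) -> float:
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    labels[i] is the community of node i (a hypothetical one-gene-per-node
    chromosome, not necessarily the encoding used by DEGA-Gen).
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)       # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of L_c / m - (d_c / 2m)^2
    return sum(intra[c] / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


def best_chromosome(
    edges: Sequence[Edge], population: Sequence[List[int]]
) -> Tuple[float, List[int]]:
    """Hypothetical helper: return (fitness, chromosome) of the fittest individual."""
    # "Map": score every chromosome independently; this is the step a
    # MapReduce job could spread across workers.
    scored = map(lambda chrom: (modularity(edges, chrom), chrom), population)
    # "Reduce": keep the chromosome with the highest modularity.
    return max(scored, key=lambda pair: pair[0])
```

For two triangles joined by a single edge, `edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]`, labelling each triangle as its own community (`[0, 0, 0, 1, 1, 1]`) gives Q = 5/14, about 0.357. Putting every node in one community gives Q = 0.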

Publication Title

IEEE SSCI 2014 - 2014 IEEE Symposium Series on Computational Intelligence - CIDM 2014: 2014 IEEE Symposium on Computational Intelligence and Data Mining, Proceedings