Dissertation (Access Restricted)
Doctor of Philosophy
Many tools and techniques have been developed to analyze large collections of data. The growing use of cyber-enabled systems, such as the Internet of Things (IoT) and sensors, is generating massive amounts of data with diverse structures. Most new big data solutions are built on top of the Hadoop ecosystem, or at least use its distributed file system (HDFS). However, studies have shown that such systems handle modern data inefficiently. Although some research has overcome these problems for specific types of graph data, modern data come in more than one type. These inefficiencies lead to larger-scale problems, such as larger datacenter footprints and wasted resources, including network usage and power consumption, which in turn contribute to environmental problems. This dissertation proposes a data-aware packaging framework for the Hadoop ecosystem and its distributed file system. The framework allows Hadoop to manage the distribution and placement of data based on cluster analysis of the data itself. Unlike previous efforts, my approach handles a broader range of data types and optimizes a wider range of processes, as well as query time and resource usage.
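The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of cluster-driven data placement. It assumes each record can be represented as a numeric feature vector. It substitutes plain k-means for the evolutionary clustering technique named in the title. The node names (`dn1`, `dn2`), the helper functions `kmeans` and `place_clusters`, and the greedy balancing rule are all hypothetical. This is not the dissertation's method and does not use any real HDFS API.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on tuples of floats (a stand-in, NOT the evolutionary
    clustering from the dissertation). Returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def place_clusters(labels, nodes):
    """Hypothetical placement rule: keep each cluster together on one node,
    assigning the largest remaining cluster to the least-loaded node."""
    sizes = defaultdict(int)
    for lab in labels:
        sizes[lab] += 1
    load = {n: 0 for n in nodes}
    placement = {}
    for cluster, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        node = min(load, key=load.get)
        placement[cluster] = node
        load[node] += size
    return placement, load


# Usage: two well-separated groups of records, two hypothetical DataNodes.
records = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (10.0, 10.1), (10.2, 9.9), (9.8, 10.0)]
labels = kmeans(records, k=2)
placement, load = place_clusters(labels, ["dn1", "dn2"])
print(labels, placement, load)
```

Keeping each cluster on a single node means a query that touches only one kind of data may need to read from fewer nodes. The greedy size-based rule is one simple way to keep node loads roughly even. A real system would also have to respect HDFS block sizes and replication, which this sketch ignores.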
Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertations (ETD) Repository.
Hajeer, Mustafa Hussein, "Handling Big Data With A Data-Aware HDFS Using Evolutionary Clustering Technique" (2016). Electronic Theses and Dissertations. 2254.