Electronic Theses and Dissertations

Identifier

4821

Date

2016

Document Type

Dissertation (Access Restricted)

Degree Name

Doctor of Philosophy

Major

Computer Science

Committee Chair

Dipankar Dasgupta

Committee Member

Vasile Rus

Committee Member

Lan Wang

Committee Member

Zhuo Lu

Abstract

Many tools and techniques have been developed to analyze large collections of data. The increased use of cyber-enabled systems, such as the Internet of Things (IoT) and sensors, is generating a massive amount of data with different structures. Most new big data solutions are built on top of the Hadoop ecosystem, or at least use its distributed file system (HDFS). However, studies have shown that such systems handle modern data inefficiently. Although some research has overcome these problems for specific types of graph data, modern data come in more than one type. Such inefficiencies lead to larger-scale problems, such as greater datacenter space requirements and wasted resources like network usage and power consumption, which in turn lead to environmental problems. This dissertation proposes a data-aware packaging framework for the Hadoop ecosystem and its distributed file system. The framework allows Hadoop to manage the distribution and placement of data based on cluster analysis of the data itself. Unlike previous efforts, I was able to handle a broader range of data types, optimizing a wider range of processes as well as query time and resource usage.

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertation (ETD) Repository.
