Electronic Theses and Dissertations

Identifier

6007

Date

2017

Document Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Committee Chair

Dipankar Dasgupta

Committee Member

Deepak Venugopal

Committee Member

Fatih Sen

Abstract

Big data analytics is being used more widely every day for a variety of applications. These new methods of applying analytics bring innovative improvements to many fields. Processing Big data to obtain fast, secure, and accurate results is a challenging task. Hadoop and Spark are two technologies that handle large amounts of data in a distributed environment using parallel computing. Both use the MapReduce programming model to process large datasets. Hadoop's handling of iterative processing affects how efficiently the data is processed. Spark uses in-memory cluster computing and data storage to improve performance across different datasets. A series of experiments was conducted on both Hadoop and Spark with different datasets, and the results were compared to analyze the performance differences between the two frameworks. An additional experiment on financial data (NASDAQ TotalView-ITCH) was performed in the Hadoop environment to analyze stock data and its variations.
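To illustrate the MapReduce model that the abstract refers to, here is a minimal single-process sketch of a word count. It is not code from the thesis and does not use the Hadoop or Spark APIs. The function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are hypothetical and only mirror the three conceptual stages that both frameworks distribute across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (key, value) pair of (word, 1) for every whitespace-separated token.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key (done over the network in a real cluster).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(lines):
    # Hypothetical driver: run map over every input line, shuffle, then reduce per key.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["spark hadoop spark", "hadoop mapreduce"]
    print(map_reduce(docs))  # {'spark': 2, 'hadoop': 2, 'mapreduce': 1}
```

In Hadoop, the output of each stage is typically written to disk between jobs. Spark can instead keep intermediate datasets in memory, which is why repeated (iterative) passes over the same data tend to benefit from Spark's design.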

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertations (ETD) Repository.
