Electronic Theses and Dissertations
Identifier
6007
Date
2017
Document Type
Thesis
Degree Name
Master of Science
Major
Computer Science
Committee Chair
Dipankar Dasgupta
Committee Member
Deepak Venugopal
Committee Member
Fatih Sen
Abstract
Big data analytics is used more widely every day across a variety of applications, and these new ways of applying analytics bring innovative improvements to many fields. Processing big data to obtain fast, secure, and accurate results is a challenging task. Hadoop and Spark are two technologies that handle large amounts of data in a distributed environment using parallel computing. Both use the MapReduce technique to process large datasets. Hadoop's handling of iterative processing affects how efficiently the data is processed, whereas Spark uses in-memory cluster computing and data storage to improve performance across different datasets. A series of experiments was conducted on both Hadoop and Spark with different datasets. To analyze the performance variation between the two frameworks, a comparative analysis was performed on the results obtained from Hadoop and Spark. An experiment based on financial data (NASDAQ TotalView-ITCH) was also performed in the Hadoop environment to analyze stock data and its variations.
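For readers unfamiliar with the MapReduce technique named in the abstract, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce), using word counting as the example. It is illustrative only and is not taken from the thesis: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and the sketch uses only the Python standard library rather than the Hadoop or Spark APIs, which distribute these same phases across a cluster.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    lines = ["big data", "Big Data analytics"]
    print(word_count(lines))
```

In a distributed framework the map and reduce functions run in parallel on separate partitions of the data, and the shuffle moves intermediate pairs between machines; this sketch keeps everything in one process so the data flow is easy to follow.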
Library Comment
Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertations (ETD) Repository.
Recommended Citation
Murthy, Adithya K., "Big Data Analysis Using Hadoop and Spark" (2017). Electronic Theses and Dissertations. 1700.
https://digitalcommons.memphis.edu/etd/1700
Comments
Data is provided by the student.