Improving Map Reduce Performance in Heterogeneous Distributed System using HDFS Environment-A Review

Shraddha Thakkar, Prof. Sanjay Patel

doi:10.17762/ijritcc.v3i3.3934

Published: Mar 31, 2015

Abstract

Hadoop is a Java-based programming framework that supports storing and processing big data in a distributed computing environment. It uses HDFS to store data and Map Reduce to process that data. Map Reduce has become an important distributed processing model for large-scale, data-intensive applications such as data mining and web indexing, and it is widely used for short jobs that require low response time. The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. Unfortunately, neither this homogeneity assumption nor the assumption of data locality holds in virtualized data centers, and Hadoop's scheduler can cause severe performance degradation in heterogeneous environments. We observe that the Longest Approximate Time to End (LATE) scheduling algorithm is highly robust to heterogeneity. LATE can improve Hadoop response times by a factor of 2 in such clusters.
DOI: 10.17762/ijritcc2321-8169.150301
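
The idea behind LATE, as it is usually described, can be illustrated with a minimal sketch. It is not taken from the paper or from Hadoop's source code. It assumes each running task reports a progress score in [0, 1] and its elapsed running time, estimates the remaining time as (1 - progress) / progress rate, and selects, among the slowest-progressing tasks, the one expected to finish last as the candidate for speculative re-execution. The names `Task`, `time_left`, `pick_speculative`, and the `slow_fraction` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    progress: float  # progress score in [0, 1] (assumed to be reported by the task)
    elapsed: float   # seconds since the task started


def time_left(task: Task) -> float:
    """Estimate remaining seconds as (1 - progress) / progress_rate."""
    if task.progress <= 0.0 or task.elapsed <= 0.0:
        return float("inf")  # no progress yet, so no finite estimate
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def pick_speculative(tasks, slow_fraction=0.25):
    """Return the slow running task expected to finish last, or None."""
    running = [t for t in tasks if t.progress < 1.0 and t.elapsed > 0.0]
    if not running:
        return None
    # Rank by progress rate; only the slowest fraction are candidates.
    by_rate = sorted(running, key=lambda t: t.progress / t.elapsed)
    cutoff = max(1, int(len(by_rate) * slow_fraction))
    candidates = by_rate[:cutoff]
    # Among the candidates, speculate on the longest estimated time to end.
    return max(candidates, key=time_left)
```

Ranking by estimated time to end, rather than by raw progress, is what makes the heuristic tolerant of nodes that run at different speeds: a task that started late on a fast node is not mistaken for a straggler.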

How to Cite

Thakkar, S., & Patel, S. (2015). Improving Map Reduce Performance in Heterogeneous Distributed System using HDFS Environment-A Review. International Journal on Recent and Innovation Trends in Computing and Communication, 3(3), 903–910. https://doi.org/10.17762/ijritcc.v3i3.3934