High Performance Fault-Tolerant Hadoop Distributed File System


Yelakala Pragna

Abstract

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications. Huge amounts of data are generated daily from many sources, and maintaining such data is a challenging task. One proposed solution is Hadoop: building on the approach published by Google, Doug Cutting and his team developed an open-source project called Hadoop, a framework written in Java for running applications on large clusters of commodity hardware. HDFS is designed to be a scalable, fault-tolerant, distributed storage system and, like Hadoop in general, to be deployed on low-cost hardware. HDFS stores file system metadata and application data separately: metadata is kept on a dedicated server called the NameNode, while application data is stored on servers called DataNodes. The file system data is accessed via HDFS clients, which first contact the NameNode for data locations and then transfer data to (write) or from (read) the specified DataNodes. A file download request chooses only one of the replica servers to download from; the other replicated servers are not used, and as the file size increases the download time increases. In this paper we study three policies for selecting which replica to read: first, random, and load-based. The results show that downloads under the "first" policy run slower than under "random", and "random" runs slower than "load-based".
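The three replica-selection policies named in the abstract can be illustrated with a short sketch. This is a minimal illustration in Java, not the paper's implementation: the DataNodeReplica class and its pendingRequests field are hypothetical stand-ins for the DataNode state an HDFS client or NameNode would actually consult when choosing where to read a block from.

import java.util.Comparator;
import java.util.List;
import java.util.Random;

/**
 * Sketch of the three replica-selection policies: first, random, load-based.
 * DataNodeReplica and pendingRequests are assumptions made for this example.
 */
public class ReplicaSelection {

    /** A replica of a block held on a particular DataNode (hypothetical model). */
    static class DataNodeReplica {
        final String host;
        final int pendingRequests;   // assumed proxy for the DataNode's current load

        DataNodeReplica(String host, int pendingRequests) {
            this.host = host;
            this.pendingRequests = pendingRequests;
        }
    }

    /** "first" policy: always read from the first replica in the located-block list. */
    static DataNodeReplica selectFirst(List<DataNodeReplica> replicas) {
        return replicas.get(0);
    }

    /** "random" policy: pick a replica uniformly at random, spreading reads across servers. */
    static DataNodeReplica selectRandom(List<DataNodeReplica> replicas, Random rng) {
        return replicas.get(rng.nextInt(replicas.size()));
    }

    /** "load-based" policy: pick the replica whose DataNode has the fewest pending requests. */
    static DataNodeReplica selectLoadBased(List<DataNodeReplica> replicas) {
        return replicas.stream()
                .min(Comparator.comparingInt(r -> r.pendingRequests))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<DataNodeReplica> replicas = List.of(
                new DataNodeReplica("datanode-1", 7),
                new DataNodeReplica("datanode-2", 2),
                new DataNodeReplica("datanode-3", 5));

        System.out.println("first:      " + selectFirst(replicas).host);
        System.out.println("random:     " + selectRandom(replicas, new Random()).host);
        System.out.println("load-based: " + selectLoadBased(replicas).host);
    }
}

Under this sketch the load-based choice would read from datanode-2, the least-loaded replica, which matches the abstract's observation that the load-based policy gives the fastest downloads.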

Article Details

How to Cite
Yelakala Pragna. (2017). High Performance Fault-Tolerant Hadoop Distributed File System. International Journal on Recent and Innovation Trends in Computing and Communication, 5(5), 1137–1145. https://doi.org/10.17762/ijritcc.v5i5.670
Section
Articles