Technical Report Open Access

Effects of Design Factors of HDFS on I/O Performance

Han-Gyoo Kim¹
  • ¹ Hongik University, Seoul, Korea
Journal of Computer Science
Volume 14 No. 3, 2018, 304-309

DOI: https://doi.org/10.3844/jcssp.2018.304.309

Submitted On: 11 January 2018
Published On: 13 March 2018

How to Cite: Kim, H. (2018). Effects of Design Factors of HDFS on I/O Performance. Journal of Computer Science, 14(3), 304-309. https://doi.org/10.3844/jcssp.2018.304.309

Abstract

Four major design factors of HDFS (the block size, the number of data nodes, the number of client processes, and the replication factor) are investigated to determine their effects on the I/O performance of HDFS. Experiments were performed on a real physical HDFS infrastructure consisting of 64 Hadoop data nodes running on Intel i9-based blades. The block size is observed to be optimal at about 1 Gb (128 MB), which is the amount of data that the hard disk drive in most of today’s off-the-shelf computers can effectively read or write in 1 second. A sophisticated allocation strategy is required to determine the number of mappers and reducers as the number of data nodes increases, because the overall performance is influenced in a complicated manner by the number of raw data blocks of the job to be processed, the processing time of the blocks on each node, and the overhead of shuffling. Experiments show that Hadoop distributes the work well enough that increasing the number of clients does not have a significant impact on performance. There is little delay in copying replicas, even though the network is overloaded, because replication is done in a pipelined manner.
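
The block-size figure in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes binary prefixes (1 Gb = 2^30 bits, 1 MB = 2^20 bytes), and the 10 GB file size is a hypothetical value chosen to show how many blocks such a file would occupy.

```python
import math

BITS_PER_BYTE = 8
MIB = 1024 * 1024  # bytes in one binary megabyte (assumption: binary prefixes)


def gigabits_to_mib(gigabits: float) -> float:
    """Convert binary gigabits (2**30 bits each) to binary megabytes."""
    return gigabits * (2 ** 30) / BITS_PER_BYTE / MIB


def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of HDFS blocks needed to hold a file; the last block may be partial."""
    return math.ceil(file_size_bytes / block_size_bytes)


# 1 Gb expressed in MB, matching the block size named in the abstract.
block_size_mib = gigabits_to_mib(1)
print(block_size_mib)  # 128.0

# Hypothetical 10 GB file split into blocks of that size.
file_size = 10 * 1024 * MIB
print(block_count(file_size, int(block_size_mib) * MIB))  # 80
```

Under these assumptions, 1 Gb and 128 MB are the same quantity, and a file one byte larger than a block occupies two blocks.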

Keywords

  • Network Integrated Storage
  • Big Data
  • Cloud Storage
  • Scalable Storage
  • Huge Scale I/O