Effects of Design Factors of HDFS on I/O Performance
Journal of Computer Science
Four major design factors of HDFS, namely the block size, the number of data nodes, the number of client processes, and the replication factor, are investigated to determine their effects on the I/O performance of HDFS through experiments on a real physical HDFS infrastructure consisting of 64 Hadoop data nodes built on Intel i9 based blades. The block size is observed to be optimal at about 1 Gb (128 MB), which is the amount of data a hard disk drive can effectively read or write in one second on most of today's off-the-shelf computers. A sophisticated allocation strategy is required to determine the number of mappers and reducers as the number of data nodes increases, because the overall performance is influenced in a complicated manner by the number of raw data blocks of the job to be processed, the per-node processing time of those blocks, and the overhead of shuffling. Experiments show that Hadoop distributes the work well enough that increasing the number of clients has no significant impact on performance. There is little delay in copying replicas, because replication is performed in a pipelined manner, even when the network is heavily loaded.
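The abstract's rule of thumb, that the optimal block size is roughly the volume a disk can stream in one second, can be sketched as a small calculation. The helper below is purely illustrative (its name and the rounding-to-a-power-of-two choice are assumptions, not part of the paper); it shows how a measured disk throughput maps onto the common 128 MB HDFS default.

```python
def suggested_block_size(disk_throughput_mb_per_s, seconds=1.0):
    """Suggest an HDFS block size (in bytes) close to what the disk
    streams in the given time window, rounded down to a power of two.
    Illustrative only; not from the paper."""
    target = disk_throughput_mb_per_s * seconds * 1024 * 1024
    size = 1
    while size * 2 <= target:
        size *= 2
    return size

# A drive streaming ~128 MB/s suggests the common 128 MB (134217728-byte) default.
print(suggested_block_size(128))
```

Under this heuristic, faster storage (e.g. SSD arrays) would justify proportionally larger blocks, which is consistent with the paper's observation that the optimum tracks the device's one-second throughput.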
© 2018 Han-Gyoo Kim. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.