A Reformed K-Nearest Neighbors Algorithm for Big Data Sets

Vo Ngoc Phu; Vo Thi Ngoc Tran

doi:10.3844/jcssp.2018.1213.1225

Research Article Open Access

Vo Ngoc Phu¹ and Vo Thi Ngoc Tran²

¹ Nguyen Tat Thanh University, Vietnam
² Vietnam National University, Vietnam

Abstract

Data mining offers many algorithms, among which the K-Nearest Neighbors algorithm (K-NN) is well known to researchers. K-NN is very effective on small data sets; however, it takes a long time to run on big data sets. Today, data sets often contain millions of records, so it is difficult to apply K-NN to big data. In this research, we propose an improvement to K-NN that processes big data sets in a shorter execution time. The Reformed K-Nearest Neighbors algorithm (R-K-NN) can be applied to large data sets with millions or even billions of records. R-K-NN is tested on a data set with 500,000 records. The execution time of R-K-NN is much shorter than that of K-NN. In addition, R-K-NN is implemented in a parallel network system with Hadoop Map (M) and Hadoop Reduce (R).
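
To illustrate why K-NN is costly on large data sets and how a Map/Reduce split can help, here is a minimal sketch in plain Python. It is not the authors' R-K-NN, whose details are not given in the abstract. It shows only the generic idea that each partition can report its own k nearest records (a map-like step) and that merging those candidates yields the global k nearest (a reduce-like step). The function names `local_top_k`, `merge_and_vote`, and `knn_classify`, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def local_top_k(partition, query, k):
    """Map-like step: k nearest (distance, label) pairs inside one partition."""
    # Every record in the partition is scanned once, which is what makes
    # brute-force K-NN slow when a single machine holds all the data.
    scored = ((math.dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def merge_and_vote(partial_results, k):
    """Reduce-like step: merge local candidates, keep the global k nearest, vote."""
    candidates = [pair for partial in partial_results for pair in partial]
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def knn_classify(partitions, query, k):
    """Classify `query` by majority vote among its k nearest labeled records."""
    return merge_and_vote([local_top_k(p, query, k) for p in partitions], k)


# Hypothetical toy data: two partitions of (features, label) records.
partitions = [
    [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((8.0, 8.0), "b")],
    [((0.9, 1.1), "a"), ((7.5, 8.2), "b"), ((8.1, 7.9), "b")],
]
print(knn_classify(partitions, (1.0, 1.0), k=3))
```

The merge is exact because any record among the global k nearest must also be among the k nearest of its own partition. In a Hadoop setting the per-partition step would run in mappers and the merge in a reducer; here both run in one process purely to keep the sketch self-contained.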

Journal of Computer Science

Volume 14 No. 9, 2018, 1213-1225

DOI: https://doi.org/10.3844/jcssp.2018.1213.1225

Submitted On: 1 February 2018; Published On: 8 March 2018

How to Cite: Phu, V. N. & Ngoc Tran, V. T. (2018). A Reformed K-Nearest Neighbors Algorithm for Big Data Sets. Journal of Computer Science, 14(9), 1213-1225. https://doi.org/10.3844/jcssp.2018.1213.1225

Copyright: © 2018 Vo Ngoc Phu and Vo Thi Ngoc Tran. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

K-Nearest Neighbors Algorithm
K-NN
Parallel Network Environment
Distributed System
Data Mining
Association Rules
Cloudera
Hadoop Map
Hadoop Reduce