A Reformed K-Nearest Neighbors Algorithm for Big Data Sets
Vo Ngoc Phu and Vo Thi Ngoc Tran
Journal of Computer Science
A Data Mining Has Already Had Many Algorithms Which A K-Nearest Neighbors Algorithm, K-NN, Is A Famous Algorithm For Researchers. K-NN Is Very Effective On Small Data Sets, However It Takes A Lot Of Time To Run On Big Datasets. Today, Data Sets Often Have Millions Of Data Records, Hence, It Is Difficult To Implement K-NN On Big Data. In This Research, We Propose An Improvement To K-NN To Process Big Datasets In A Shortened Execution Time. The Reformed K-Nearest Neighbors Algorithm (R-K-NN) Can Be Implemented On Large Datasets With Millions Or Even Billions Of Data Records. R-K-NN Is Tested On A Data Set With 500,000 Records. The Execution Time Of R-K-NN Is Much Shorter Than That Of K-NN. In Addition, R-K-NN Is Implemented In A Parallel Network System With Hadoop Map (M) And Hadoop Reduce (R).
© 2018 Vo Ngoc Phu and Vo Thi Ngoc Tran. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.