Research Article Open Access

Evaluating the Efficiency of CPUs, GPUs and FPGAs on a Near-Duplicate Document Detection Via OpenCL

Ercan Canhasi 1
  • 1 Gjirafa, Inc. Rr. Rexhep Mala, 28A, Kosovo

Abstract

Discovering identical or near-identical items is important in many applications, such as Web crawling, because it drastically reduces text processing costs. Simhash is a widely used technique that assigns a bit-string identity to a text such that similar texts have similar identities. In this study, a real-time solution for simhash calculation in OpenCL is presented. We also show how it can be utilized on multi-CPU systems, GPUs and FPGAs. Our results indicate that the bottom-line computation realized on the FPGA through OpenCL provides significant power advantages.
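
To illustrate the idea of a bit-string identity, the following is a minimal Python sketch of a simhash fingerprint and the Hamming distance used to compare two fingerprints. It is not the OpenCL implementation evaluated in the article. The whitespace tokenizer, the unweighted tokens, the use of MD5 as the per-token hash, the 64-bit width, and the function names `simhash` and `hamming_distance` are all assumptions made for illustration.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Return a `bits`-wide simhash fingerprint of a text.

    Assumptions (not from the article): tokens are lowercase
    whitespace-separated words with equal weight, the per-token hash
    is the leading bytes of MD5, and `bits` is a multiple of 8 up to 128.
    """
    counts = [0] * bits
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 on bit positions it sets and -1 on the rest.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1

    # A fingerprint bit is set where the positive votes outweigh the negative.
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two documents would then be treated as near-duplicates when `hamming_distance(simhash(a), simhash(b))` falls below a chosen threshold. The threshold is application-dependent and is not specified here.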

Journal of Computer Science
Volume 14 No. 5, 2018, 699-704

DOI: https://doi.org/10.3844/jcssp.2018.699.704

Submitted On: 8 June 2017 Published On: 28 April 2018

How to Cite: Canhasi, E. (2018). Evaluating the Efficiency of CPUs, GPUs and FPGAs on a Near-Duplicate Document Detection Via OpenCL. Journal of Computer Science, 14(5), 699-704. https://doi.org/10.3844/jcssp.2018.699.704

Keywords

  • Simhash
  • OpenCL
  • CPU
  • GPU
  • FPGA
  • Xilinx
  • SDAccel