Review Article Open Access

A Wide Scale Classification of Class Imbalance Problem and its Solutions: A Systematic Literature Review

Gillala Rekha1, Amit Kumar Tyagi2 and V. Krishna Reddy1
  • 1 Koneru Lakshmaiah Education Foundation, India
  • 2 Vellore Institute of Technology, India
Journal of Computer Science
Volume 15 No. 7, 2019, 886-929

DOI: https://doi.org/10.3844/jcssp.2019.886.929

Submitted On: 18 November 2018 Published On: 3 July 2019

How to Cite: Rekha, G., Tyagi, A. K. & Reddy, V. K. (2019). A Wide Scale Classification of Class Imbalance Problem and its Solutions: A Systematic Literature Review. Journal of Computer Science, 15(7), 886-929. https://doi.org/10.3844/jcssp.2019.886.929

Abstract

Most real-world data today is imbalanced by nature. This is because efficient algorithms are lacking to put such data (e.g., data generated by billions of Internet-connected (IoT) devices) into a suitable format. Imbalanced data poses a great challenge to both data mining and machine learning algorithms. An imbalanced dataset consists of a majority class and a minority class, where the majority class greatly outnumbers the minority class. Most standard learning algorithms assume a balanced class distribution or equal misclassification costs. When these algorithms are applied to imbalanced data, accuracy is high for the majority class but low for the minority class, resulting in poor performance overall. To overcome this problem and improve the accuracy of the decision-making or prediction process, data mining and machine learning researchers have addressed imbalanced data using data-level, algorithmic-level and ensemble or hybrid methods. This article presents a systematic literature review and analyzes the results of more than 400 research papers published between 2002 and June 2017, resulting in a broad and detailed investigation of the literature in this area. An extension of this work will cover research articles published through December 2018 and is planned for publication in June 2019; those papers are not included here because of page limits. The systematic analysis focuses on the key role of data-intrinsic problems in classification, on handling imbalanced data and on the techniques used to overcome skewed class distributions. Furthermore, this article reveals patterns, trends and gaps in the existing literature and briefly discusses future research directions in this area.
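The point about accuracy on imbalanced data can be illustrated with a minimal sketch. It is not taken from the article: the 95/5 class split, the always-predict-majority "classifier" and the helper names `accuracy` and `recall` are all assumptions chosen for illustration.

```python
# Hypothetical toy data: 95 majority-class samples (label 0)
# and 5 minority-class samples (label 1).
y_true = [0] * 95 + [1] * 5

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of samples with the given true label that were predicted correctly."""
    preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in preds_for_label) / len(preds_for_label)


overall_accuracy = accuracy(y_true, y_pred)
majority_recall = recall(y_true, y_pred, label=0)
minority_recall = recall(y_true, y_pred, label=1)

print(f"overall accuracy: {overall_accuracy:.2f}")
print(f"majority recall:  {majority_recall:.2f}")
print(f"minority recall:  {minority_recall:.2f}")
```

In this toy setting the overall accuracy looks high even though no minority-class sample is detected, which is why per-class measures are often reported alongside accuracy for imbalanced problems.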

Keywords

  • Class Imbalance Problem
  • Data Mining
  • Machine Learning
  • Data-Level Methods
  • Algorithmic-Level Methods
  • Hybrid Methods