A Wide Scale Classification of Class Imbalance Problem and its Solutions: A Systematic Literature Review
- 1 Koneru Lakshmaiah Education Foundation, India
- 2 Vellore Institute of Technology, India
Copyright: © 2020 Gillala Rekha, Amit Kumar Tyagi and V. Krishna Reddy. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In today’s world, much real-world data is imbalanced by nature. This is partly because efficient algorithms are lacking for organizing this data, which is generated by billions of Internet-connected (IoT) devices, into a suitable format. Imbalanced data poses a great challenge to both data mining and machine learning algorithms. An imbalanced dataset consists of a majority class and a minority class, where the majority class dominates the minority class. Many standard learning algorithms assume a balanced class distribution or equal misclassification costs. When these algorithms are applied to imbalanced data, accuracy is high for the majority class but poor for the minority class, so overall performance is poor. To overcome this problem and improve the accuracy of the decision-making and prediction process, data mining and machine learning researchers have addressed imbalanced data using data-level, algorithmic-level and ensemble or hybrid methods. This article presents a systematic literature review and analyzes the results of more than 400 research papers published between 2002 and 2017 (up to June 2017), resulting in a broader and more elaborate investigation of the literature in this area of research. An extension of this work, covering research articles published up to December 2018, is planned for publication in June 2019; those articles are not included here because of space constraints. The systematic analysis of the research literature focuses on the key role of Data Intrinsic Problems in classification, on handling imbalanced data and on the techniques used to overcome the skewed distribution. Furthermore, this article reveals patterns, trends and gaps in the existing literature and briefly discusses next-generation research directions in this area.
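The claim that accuracy can look high while minority-class performance is poor can be illustrated with a small sketch. The toy data (95 majority and 5 minority samples), the always-predict-majority baseline and the helper names `accuracy`, `recall` and `random_oversample` are all assumptions made for illustration; they are not taken from the reviewed papers. Random oversampling is shown only as one simple instance of a data-level method.

```python
import random
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of all samples that were labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of samples of class `positive` that were found."""
    preds_for_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_pos) / len(preds_for_pos)

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until
    every class is as large as the largest one (a data-level method)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

# Hypothetical toy dataset: class 0 is the majority, class 1 the minority.
y = [0] * 95 + [1] * 5
X = [[i] for i in range(100)]

# A trivial baseline that always predicts the majority class.
y_pred = [0] * len(y)

acc = accuracy(y, y_pred)                    # high, driven by class 0
min_recall = recall(y, y_pred, positive=1)   # no minority sample is found

# Rebalance the training data before fitting a real classifier.
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)

print(f"accuracy={acc:.2f}, minority recall={min_recall:.2f}")
print(f"class counts after oversampling: {dict(balanced_counts)}")
```

In this sketch the baseline is right on every majority sample and wrong on every minority sample, which is why a per-class measure such as recall is more informative than overall accuracy on skewed data.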
- Class Imbalance Problem
- Data Mining
- Machine Learning
- Data-Level Methods
- Algorithmic-Level Methods
- Hybrid Methods