Improved Adaptive Semi-Unsupervised Weighted Oversampling using Sparsity Factor for Imbalanced Datasets

Haseeb Ali, Mohd Najib Mohd Salleh, Kashif Hussain

Research output: Contribution to journal, Article, peer-review

Abstract

With the surge in data volumes, problems in data analysis have become increasingly complicated. Imbalanced data is a fundamental problem for data mining and machine learning algorithms. It arises from the disparate nature of data, in which one class with a large number of instances forms the majority class, while another class with only a few instances forms the minority class. A classifier becomes biased towards the majority class and neglects the minority class, which may be the more important one; this results in costly misclassification of minority-class instances in real-world scenarios. The imbalanced data problem is largely overcome by re-sampling techniques, among which oversampling techniques have proven more effective than undersampling. This study proposes an Improved Adaptive Semi-Unsupervised Weighted Oversampling (IA-SUWO) technique with a sparsity factor, which efficiently addresses both between-class and within-class imbalance. Along with avoiding over-generalization and overfitting and removing noise from the data, the technique appropriately increases the number of synthetic instances in the minority sub-clusters. A comprehensive experimental setup is used to evaluate the performance of the proposed approach. The comparative analysis shows that IA-SUWO performs better than the existing baseline oversampling techniques.
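To make the idea of oversampling concrete, the following is a minimal sketch of generic interpolation-based oversampling (in the spirit of SMOTE), written with the Python standard library only. It is not the IA-SUWO procedure: the clustering, adaptive weighting, sparsity factor and noise removal described in the abstract are not modeled here. The function name `oversample_minority`, its parameters and the toy data are assumptions made purely for illustration.

```python
import random
from collections import Counter


def oversample_minority(X, y, minority_label, seed=0):
    """Balance a two-class dataset by interpolating between minority points.

    Illustrative only: a generic SMOTE-style interpolation, not IA-SUWO.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority instances to interpolate")

    majority_count = len(y) - len(minority)
    n_needed = max(majority_count - len(minority), 0)

    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        # Pick two distinct minority instances and a random point between them.
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        X_new.append(synthetic)
        y_new.append(minority_label)
    return X_new, y_new


# Toy data (hypothetical): six majority points near the origin, two minority points.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3],
     [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = oversample_minority(X, y, minority_label=1)
print(Counter(y_bal))
```

On this toy data both classes end up with six instances, and every synthetic point lies on the segment between the two original minority points. A sketch like this treats all minority instances alike, which is the limitation that cluster-aware, weighted approaches such as the one proposed in this study aim to address.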
Original language: English
Journal: International Journal of Advanced Computer Science and Applications
Volume: 10
Issue number: 11
Publication status: Published - 2019
