TY - JOUR
T1 - Improved Adaptive Semi-Unsupervised Weighted Oversampling using Sparsity Factor for Imbalanced Datasets
AU - Ali, Haseeb
AU - Salleh, Mohd Najib Mohd
AU - Hussain, Kashif
PY - 2019
Y1 - 2019
N2 - With the surge in data volumes, problems associated with data analysis have become increasingly complicated. Imbalanced data is a profound problem for data mining algorithms in the machine learning paradigm. It arises from the disparate nature of the data, in which one class with a large number of instances forms the majority class, while the other class with only a few instances is known as the minority class. The classifier model is biased towards the majority class and neglects the minority class, which may happen to be the most essential class, resulting in costly misclassification errors for the minority class in real-world scenarios. The imbalanced data problem is largely overcome by using re-sampling techniques, among which oversampling techniques have proven to be more effective than undersampling. This study proposes an Improved Adaptive Semi-Unsupervised Weighted Oversampling (IA-SUWO) technique with a sparsity factor, which efficiently solves the between-class and within-class imbalance problems. Along with avoiding over-generalization and overfitting and removing noise from the data, this technique appropriately increases the number of synthetic instances in the minority sub-clusters. A comprehensive experimental setup is used to evaluate the performance of the proposed approach. The comparative analysis reveals that IA-SUWO performs better than the existing baseline oversampling techniques.
AB - With the surge in data volumes, problems associated with data analysis have become increasingly complicated. Imbalanced data is a profound problem for data mining algorithms in the machine learning paradigm. It arises from the disparate nature of the data, in which one class with a large number of instances forms the majority class, while the other class with only a few instances is known as the minority class. The classifier model is biased towards the majority class and neglects the minority class, which may happen to be the most essential class, resulting in costly misclassification errors for the minority class in real-world scenarios. The imbalanced data problem is largely overcome by using re-sampling techniques, among which oversampling techniques have proven to be more effective than undersampling. This study proposes an Improved Adaptive Semi-Unsupervised Weighted Oversampling (IA-SUWO) technique with a sparsity factor, which efficiently solves the between-class and within-class imbalance problems. Along with avoiding over-generalization and overfitting and removing noise from the data, this technique appropriately increases the number of synthetic instances in the minority sub-clusters. A comprehensive experimental setup is used to evaluate the performance of the proposed approach. The comparative analysis reveals that IA-SUWO performs better than the existing baseline oversampling techniques.
U2 - 10.14569/IJACSA.2019.0101152
DO - 10.14569/IJACSA.2019.0101152
M3 - Article
SN - 2158-107X
VL - 10
JO - International Journal of Advanced Computer Science and Applications
JF - International Journal of Advanced Computer Science and Applications
IS - 11
ER -