Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification

Class imbalance is one of the most significant difficulties in modern machine learning. This is because of the inherent bias of standard classifiers toward favoring majority instances while often ignoring minority instances. Interpolation-based oversampling techniques are among the most popular solu...


Bibliographic Details
Published in: Journal of King Saud University - Computer and Information Sciences
Main Author: Wang Y.; Rosli M.M.; Musa N.; Wang L.
Format: Article
Language: English
Published: King Saud bin Abdulaziz University 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85212189270&doi=10.1016%2fj.jksuci.2024.102253&partnerID=40&md5=1ce58b30e34c016e1273671a86217847
ID: 2-s2.0-85212189270
DOI: 10.1016/j.jksuci.2024.102253
ISSN: 13191578
Access Type: All Open Access; Gold Open Access
Abstract: Class imbalance is one of the most significant difficulties in modern machine learning. This is because of the inherent bias of standard classifiers toward favoring majority instances while often ignoring minority instances. Interpolation-based oversampling techniques are among the most popular solutions for generating synthetic minority samples to correct imbalanced class distributions. However, synthetic minority samples risk overlapping with the majority-class samples. Inappropriate interpolation of minority samples during oversampling can also result in overgeneralization. To overcome these drawbacks, we propose a Clustering-based and Adaptive Position-aware Interpolation Oversampling algorithm (CAPAIO) for imbalanced binary dataset classification. CAPAIO initially employs an improved density-based clustering algorithm to group minority instances into inland, borderline, and trapped samples. It then adaptively determines the size of each subcluster and allocates weights to minority samples, guiding the synthesis of minority samples based on these weights. Finally, distinct interpolation oversampling algorithms are individually performed on these three categories of minority samples. The experimental results demonstrate the effectiveness of the proposed CAPAIO on most datasets compared with eleven other oversampling algorithms. © 2024 The Author(s)
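For readers unfamiliar with the interpolation-based oversampling that the abstract refers to, the following is a minimal sketch of generic SMOTE-style interpolation with optional per-sample weights. It is not the CAPAIO algorithm and does not reproduce the paper's clustering or weighting scheme; the function name `interpolate_minority` and the parameters `k`, `weights`, and `seed` are hypothetical choices made for illustration only.

```python
import numpy as np

def interpolate_minority(X_min, n_new, k=5, weights=None, seed=0):
    """Generate n_new synthetic points by SMOTE-style linear interpolation.

    Illustrative only: a generic baseline, not the method from the record above.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances among minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Seed-selection probabilities: uniform unless weights are supplied
    # (assumed here to be non-negative with a positive sum).
    if weights is None:
        p = np.full(n, 1.0 / n)
    else:
        p = np.asarray(weights, dtype=float)
        p = p / p.sum()

    seeds = rng.choice(n, size=n_new, p=p)
    mates = neighbours[seeds, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation factor in [0, 1)

    # Each synthetic point lies on the segment between a seed and a neighbour.
    return X_min[seeds] + lam * (X_min[mates] - X_min[seeds])
```

Because every synthetic point is a convex combination of two existing minority samples, plain interpolation of this kind can place points inside majority-class regions, which is the overlap risk the abstract describes.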