Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification
Class imbalance is one of the most significant difficulties in modern machine learning. This is because of the inherent bias of standard classifiers toward favoring majority instances while often ignoring minority instances. Interpolation-based oversampling techniques are among the most popular solu...
Published in: | JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES |
---|---|
Main Authors: | Wang, Yujiang; Rosli, Marshima Mohd; Musa, Norzilah; Wang, Lei |
Format: | Article |
Language: | English |
Published: | SPRINGERNATURE 2024 |
Subjects: | Computer Science |
Online Access: | https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001410486900001 |
author |
Wang Yujiang; Rosli Marshima Mohd; Musa Norzilah; Wang Lei |
---|---|
spellingShingle |
Wang Yujiang; Rosli Marshima Mohd; Musa Norzilah; Wang Lei Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification Computer Science |
author_facet |
Wang Yujiang; Rosli Marshima Mohd; Musa Norzilah; Wang Lei |
author_sort |
Wang |
spelling |
Wang, Yujiang; Rosli, Marshima Mohd; Musa, Norzilah; Wang, Lei Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES English Article Class imbalance is one of the most significant difficulties in modern machine learning. This is because of the inherent bias of standard classifiers toward favoring majority instances while often ignoring minority instances. Interpolation-based oversampling techniques are among the most popular solutions for generating synthetic minority samples to correct imbalanced class distributions. However, synthetic minority samples have a risk of overlapping with the majority-class samples. Inappropriate interpolation of minority samples during oversampling can also result in overgeneralization. To overcome these drawbacks, we propose a Clustering-based and Adaptive Position-aware Interpolation Oversampling algorithm (CAPAIO) for imbalanced binary dataset classification. CAPAIO initially employs an improved density-based clustering algorithm to group minority instances into inland, borderline, and trapped samples. It then adaptively determines the size of each subcluster and allocates weights to minority samples, guiding the synthesis of minority samples based on these weights. Finally, distinct interpolation oversampling algorithms are individually performed on these three categories of minority samples. The experimental results demonstrate the effectiveness of the proposed CAPAIO in most datasets compared with eleven other oversampling algorithms. SPRINGERNATURE 1319-1578 2213-1248 2024 36 10 10.1016/j.jksuci.2024.102253 Computer Science gold WOS:001410486900001 https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001410486900001 |
title |
Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification |
title_short |
Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification |
title_full |
Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification |
title_fullStr |
Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification |
title_full_unstemmed |
Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification |
title_sort |
Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification |
container_title |
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES |
language |
English |
format |
Article |
description |
Class imbalance is one of the most significant difficulties in modern machine learning. This is because of the inherent bias of standard classifiers toward favoring majority instances while often ignoring minority instances. Interpolation-based oversampling techniques are among the most popular solutions for generating synthetic minority samples to correct imbalanced class distributions. However, synthetic minority samples have a risk of overlapping with the majority-class samples. Inappropriate interpolation of minority samples during oversampling can also result in overgeneralization. To overcome these drawbacks, we propose a Clustering-based and Adaptive Position-aware Interpolation Oversampling algorithm (CAPAIO) for imbalanced binary dataset classification. CAPAIO initially employs an improved density-based clustering algorithm to group minority instances into inland, borderline, and trapped samples. It then adaptively determines the size of each subcluster and allocates weights to minority samples, guiding the synthesis of minority samples based on these weights. Finally, distinct interpolation oversampling algorithms are individually performed on these three categories of minority samples. The experimental results demonstrate the effectiveness of the proposed CAPAIO in most datasets compared with eleven other oversampling algorithms. |
publisher |
SPRINGERNATURE |
issn |
1319-1578 2213-1248 |
publishDate |
2024 |
container_volume |
36 |
container_issue |
10 |
doi_str_mv |
10.1016/j.jksuci.2024.102253 |
topic |
Computer Science |
topic_facet |
Computer Science |
accesstype |
gold |
id |
WOS:001410486900001 |
url |
https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001410486900001 |
record_format |
wos |
collection |
Web of Science (WoS) |
_version_ |
1825722598675185664 |
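
Note on the description field: the abstract refers to interpolation-based oversampling without showing the basic operation. The sketch below illustrates only the generic SMOTE-style step, in which a synthetic sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is not CAPAIO: the clustering into inland, borderline, and trapped samples, the adaptive weights, and the per-category interpolators are not specified in this record and are not reproduced here. The function names `squared_distance` and `interpolate_minority` and the parameters `n_new`, `k`, and `seed` are hypothetical, chosen for illustration.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def interpolate_minority(
    minority: List[List[float]], n_new: int, k: int = 3, seed: int = 0
) -> List[List[float]]:
    """Generic SMOTE-style interpolation (an assumption, not the CAPAIO method).

    Each synthetic point is base + lam * (neighbour - base), with lam drawn
    uniformly from [0, 1), so it lies on the segment between two minority samples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        lam = rng.random()
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in interpolate_minority(minority_samples, n_new=3):
        print(point)
```

Because every synthetic point lies between two existing minority samples, plain interpolation of this kind can place points inside majority-class regions, which is the overlap risk the abstract names as motivation for treating the three categories of minority samples differently.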