Handling imbalanced dataset using SVM and k-NN approach

Bibliographic details
Container / Database:	AIP Conference Proceedings
Main Author:	Wah Y.B.; Rahman H.A.A.; He H.; Bulgiba A.
Format:	Conference paper
Language:	English
Published:	American Institute of Physics Inc. 2016
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84984550446&doi=10.1063%2f1.4954536&partnerID=40&md5=1831061d4fefe8f88c4cc686c646a113

id	2-s2.0-84984550446
author	Wah Y.B.; Rahman H.A.A.; He H.; Bulgiba A.
title	Handling imbalanced dataset using SVM and k-NN approach
publishDate	2016
container_title	AIP Conference Proceedings
container_volume	1750
doi_str_mv	10.1063/1.4954536
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84984550446&doi=10.1063%2f1.4954536&partnerID=40&md5=1831061d4fefe8f88c4cc686c646a113
description	Data mining classification methods are affected when the data is imbalanced, that is, when, for a two-class dependent variable, one class is larger than the other. Many new methods have been developed to handle imbalanced datasets. For binary classification, the Support Vector Machine (SVM) is one of the methods reported to give higher predictive accuracy than other techniques such as Logistic Regression and Discriminant Analysis. The strengths of SVM are the robustness of its algorithm and its ability to integrate kernel-based learning, which results in a more flexible analysis and an optimized solution. Another popular way to handle imbalanced data is random sampling, such as random undersampling, random oversampling and synthetic sampling. Applying Nearest Neighbours techniques within the sampling approach has been seen as more advantageous than other methods, as it can handle both structured and non-structured data. Some studies that implement an ensemble of SVM and Nearest Neighbours report good results. This paper discusses various methods for handling imbalanced data and illustrates the use of SVM and k-Nearest Neighbours (k-NN) on a real dataset. © 2016 Author(s).
publisher	American Institute of Physics Inc.
issn	0094243X
language	English
format	Conference paper
record_format	scopus
collection	Scopus
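The description names random undersampling, random oversampling and nearest-neighbour-based synthetic sampling without showing how they work. The following is a minimal, dependency-free Python sketch of those ideas, written for illustration only and not taken from the paper. The function names (`split_by_class`, `random_undersample`, `random_oversample`, `smote_like`, `knn_predict`) are hypothetical; SMOTE-style interpolation between a minority point and one of its k nearest minority neighbours is assumed here as the example of synthetic sampling; the toy data are invented. The SVM step and the paper's real dataset are not reproduced.

```python
import math
import random
from collections import Counter

# Illustrative sketch only: hypothetical helpers, not the paper's code.
# Points are tuples of floats; y must contain exactly two distinct labels.


def split_by_class(X, y):
    """Group points by label; return (minority_label, majority_label, groups)."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    groups = {label: [] for label in counts}
    for point, label in zip(X, y):
        groups[label].append(point)
    minority, majority = sorted(counts, key=counts.get)
    return minority, majority, groups


def random_undersample(X, y, rng):
    """Randomly drop majority points until both classes have equal size."""
    minority, majority, groups = split_by_class(X, y)
    kept = rng.sample(groups[majority], len(groups[minority]))
    X_out = groups[minority] + kept
    y_out = [minority] * len(groups[minority]) + [majority] * len(kept)
    return X_out, y_out


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority points until the classes are balanced."""
    minority, majority, groups = split_by_class(X, y)
    n_extra = len(groups[majority]) - len(groups[minority])
    extra = [rng.choice(groups[minority]) for _ in range(n_extra)]
    X_out = groups[majority] + groups[minority] + extra
    y_out = [majority] * len(groups[majority]) + [minority] * (len(groups[minority]) + n_extra)
    return X_out, y_out


def smote_like(X, y, k, rng):
    """SMOTE-style synthetic oversampling: interpolate between a minority point
    and one of its k nearest minority-class neighbours (Euclidean distance)."""
    minority, majority, groups = split_by_class(X, y)
    pts = groups[minority]
    if len(pts) < 2:
        raise ValueError("need at least two minority points to interpolate")
    synthetic = []
    for _ in range(len(groups[majority]) - len(pts)):
        i = rng.randrange(len(pts))
        base = pts[i]
        # k nearest minority neighbours of `base`, excluding base itself (by index).
        others = [p for j, p in enumerate(pts) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    X_out = groups[majority] + pts + synthetic
    y_out = [majority] * len(groups[majority]) + [minority] * (len(pts) + len(synthetic))
    return X_out, y_out


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(X_train, y_train), key=lambda pair: math.dist(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Invented toy data: 6 majority ("neg") points and 3 minority ("pos") points.
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.0, 0.4), (0.4, 0.0),
         (1.0, 1.0), (1.1, 0.9), (0.9, 1.2)]
    y = ["neg"] * 6 + ["pos"] * 3
    for name, resample in [("undersample", random_undersample),
                           ("oversample", random_oversample)]:
        X_r, y_r = resample(X, y, rng)
        print(name, dict(Counter(y_r)))
    X_s, y_s = smote_like(X, y, 2, rng)
    print("smote_like", dict(Counter(y_s)))
    print("k-NN prediction for (1.0, 1.05):", knn_predict(X_s, y_s, (1.0, 1.05)))
```

In practice the resampled data would then be passed to a classifier such as an SVM; libraries like scikit-learn (`SVC`, `KNeighborsClassifier`) and imbalanced-learn (`RandomUnderSampler`, `RandomOverSampler`, `SMOTE`) provide tested versions of these steps. Resampling is normally applied to the training split only, so that the test set keeps the original class distribution.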
