Comparisons of ADABOOST, KNN, SVM and logistic regression in classification of imbalanced dataset

Bibliographic Details
Published in:	Communications in Computer and Information Science
Main Author:	Rahman H.A.A.; Wah Y.B.; He H.; Bulgiba A.
Format:	Conference paper
Language:	English
Published:	Springer Verlag 2015
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84946092788&doi=10.1007%2f978-981-287-936-3_6&partnerID=40&md5=c6e2d7a002f123ebbbc0d3de35a84e64

Description
Summary:	Data mining classification techniques are affected by imbalances between the classes of a response variable. The difficulty of handling imbalanced data has led to an influx of methods that resolve the imbalance at either the data level or the algorithmic level. The R programming language is one of the many tools available for data mining. This paper compares some classification algorithms in R on an imbalanced medical data set. The classifiers ADABOOST, KNN, SVM-RBF and logistic regression were applied to the original, random oversampling and random undersampling data sets. Results show that ADABOOST, KNN and SVM-RBF exhibit over-fitting when applied to the original data set. No over-fitting occurs for the random oversampling data set, where SVM-RBF has the highest accuracy (Training: 91.5%, Testing: 90.6%), sensitivity (Training: 91.0%, Testing: 91.0%), specificity (Training: 92.0%, Testing: 90.2%) and precision (Training: 91.9%, Testing: 90.5%) on both the training and testing data sets. For random undersampling, only ADABOOST and logistic regression show no over-fitting. Logistic regression is the most stable classifier, exhibiting consistent training and testing results. © Springer Science+Business Media Singapore 2015.
ISSN:	18650929
DOI:	10.1007/978-981-287-936-3_6
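
The summary refers to random oversampling, random undersampling and four evaluation measures without defining them. The sketch below illustrates the usual textbook definitions in plain Python. It is not the authors' code (the paper used R), and the function names `random_oversample`, `random_undersample` and `binary_metrics` are hypothetical helpers introduced only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):  # sample without replacement
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def _ratio(num, den):
    # Return NaN instead of raising when a measure is undefined.
    return num / den if den else float("nan")


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and precision from the
    confusion-matrix counts of a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "precision": _ratio(tp, tp + fp),
    }
```

In a comparison like the one described, resampling would normally be applied to the training portion only, and over-fitting would be judged by the gap between these measures on the training and testing data.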

Similar Items