Exploring feature selection and support vector machine in text categorization

Bibliographic Details
Published in: Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013
Authors: Abdul-Rahman S.; Mutalib S.; Khanafi N.A.; Ali A.M.
Format: Conference paper
Language: English
Published: 2013
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84900359670&doi=10.1109%2fCSE.2013.160&partnerID=40&md5=90bd593e0696660a12d71ad4c336ef96
DOI: 10.1109/CSE.2013.160
With the growing number of text documents on the Internet, it is difficult for users to search, find, manage and organize information quickly. Text documents are normally classified manually, which is time-consuming. Text categorization is the process of assigning text documents to a fixed set of predefined categories. The high dimensionality of text documents makes them difficult to categorize because they contain noise and irrelevant data. This paper explored several feature selection methods, such as Information Gain, Gain Ratio, Chi-Square, Mutual Information and Document Frequency, that can be used to reduce the high dimensionality of the feature space in text documents. The study then performed text categorization using Support Vector Machines. The results showed that Support Vector Machines performed well and ran very fast on both the training and testing datasets. © 2013 IEEE.
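
As a rough illustration of three of the feature selection measures named in the abstract (Document Frequency, Information Gain and Chi-Square), the following Python sketch scores terms on a tiny labelled corpus. It is not the authors' implementation: the corpus, the function names and the binary term-presence representation are all assumptions made for this example, and the Support Vector Machine classification step is left out.

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical data, not taken from the paper).
DOCS = [
    ("the match ended with a late goal", "sport"),
    ("the team won the league match", "sport"),
    ("a goal in the final match", "sport"),
    ("the bank raised the interest rate", "finance"),
    ("the market fell as the rate rose", "finance"),
    ("the bank reported a market loss", "finance"),
]


def tokenize(text):
    """Return the set of distinct lowercase terms in a document."""
    return set(text.lower().split())


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for text, _ in docs:
        df.update(tokenize(text))
    return df


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, term):
    """Reduction in class entropy from knowing whether `term` is present."""
    labels = sorted({label for _, label in docs})

    def label_counts(subset):
        return [subset.count(label) for label in labels]

    with_term = [lab for text, lab in docs if term in tokenize(text)]
    without_term = [lab for text, lab in docs if term not in tokenize(text)]
    prior = entropy(label_counts([lab for _, lab in docs]))
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:
            conditional += len(subset) / len(docs) * entropy(label_counts(subset))
    return prior - conditional


def chi_square(docs, term, category):
    """Chi-square statistic from the 2x2 term/category contingency table."""
    a = b = c = d = 0
    for text, label in docs:
        present = term in tokenize(text)
        if present and label == category:
            a += 1  # term present, in category
        elif present:
            b += 1  # term present, other category
        elif label == category:
            c += 1  # term absent, in category
        else:
            d += 1  # term absent, other category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_top_k(docs, k):
    """Keep the k terms with the highest information gain (ties by name)."""
    vocab = document_frequency(docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, t), t))[:k]
```

In this toy corpus a term such as "the" occurs in every document, so it has the highest document frequency but carries no class information, while "match" occurs in every sport document and no finance document. Terms selected this way would form the reduced feature space passed to a classifier such as a Support Vector Machine.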

