Exploring feature selection and support vector machine in text categorization

With the growing number of text documents in the Internet, it is difficult for users to search, find, manage and organize information quickly. Normally, text documents are classified manually and it is time-consuming. Text categorization is a process of assigning text documents into a set of fixed p...

Bibliographic Details
Published in:	Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013
Main Author:	Abdul-Rahman S.; Mutalib S.; Khanafi N.A.; Ali A.M.
Format:	Conference paper
Language:	English
Published:	2013
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84900359670&doi=10.1109%2fCSE.2013.160&partnerID=40&md5=90bd593e0696660a12d71ad4c336ef96

id	2-s2.0-84900359670
author	Abdul-Rahman S.; Mutalib S.; Khanafi N.A.; Ali A.M.
title	Exploring feature selection and support vector machine in text categorization
publishDate	2013
container_title	Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013
doi_str_mv	10.1109/CSE.2013.160
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84900359670&doi=10.1109%2fCSE.2013.160&partnerID=40&md5=90bd593e0696660a12d71ad4c336ef96
description	With the growing number of text documents in the Internet, it is difficult for users to search, find, manage and organize information quickly. Normally, text documents are classified manually and it is time-consuming. Text categorization is a process of assigning text documents into a set of fixed predefined categories. The high dimensionality of text documents made it difficult to categorize because text documents contain noise and useless data. This paper explored several methods of feature selection that can be used to reduce high dimensionality of feature space in text documents such as Information Gain, Gain Ratio, CHI-Squares, Mutual Information and Document frequency. Next, the study adopted text categorization using Support Vector Machines. The results showed that Support Vector Machines perform well and very fast both in training and testing datasets. © 2013 IEEE.
language	English
format	Conference paper
record_format	scopus
collection	Scopus
_version_	1809677610550034432

Similar Items