Exploring feature selection and support vector machine in text categorization

Bibliographic Details
Published in: Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013
Main Author: Abdul-Rahman S.; Mutalib S.; Khanafi N.A.; Ali A.M.
Format: Conference paper
Language: English
Published: 2013
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84900359670&doi=10.1109%2fCSE.2013.160&partnerID=40&md5=90bd593e0696660a12d71ad4c336ef96
Description
Summary: With the growing number of text documents on the Internet, it is difficult for users to search, find, manage and organize information quickly. Text documents are normally classified manually, which is time-consuming. Text categorization is the process of assigning text documents to a fixed set of predefined categories. The high dimensionality of text documents makes them difficult to categorize, because they contain noise and irrelevant data. This paper explores several feature selection methods that can be used to reduce the high dimensionality of the feature space in text documents: Information Gain, Gain Ratio, Chi-Square, Mutual Information and Document Frequency. The study then performs text categorization using Support Vector Machines. The results show that Support Vector Machines perform well and run very fast on both the training and testing datasets. © 2013 IEEE.
DOI: 10.1109/CSE.2013.160
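
Illustration: two of the feature selection methods named in the summary, Document Frequency and Chi-Square, can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy corpus, the labels, and the function names (`document_frequency`, `chi_square`, `select_top_k`) are all hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical toy corpus of (tokens, label) pairs; not the paper's dataset.
DOCS = [
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("monday project meeting notes".split(), "ham"),
]

def document_frequency(docs):
    """Count how many documents contain each term."""
    df = Counter()
    for tokens, _ in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df

def chi_square(docs, term, label):
    """Chi-square statistic of a term against one category (2x2 table)."""
    a = sum(1 for t, y in docs if term in t and y == label)      # term, in class
    b = sum(1 for t, y in docs if term in t and y != label)      # term, other class
    c = sum(1 for t, y in docs if term not in t and y == label)  # no term, in class
    d = sum(1 for t, y in docs if term not in t and y != label)  # no term, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom

def select_top_k(docs, label, k):
    """Keep the k terms with the highest chi-square score (ties: alphabetical)."""
    vocab = sorted({tok for tokens, _ in docs for tok in tokens})
    scored = [(chi_square(docs, term, label), term) for term in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]

if __name__ == "__main__":
    print(document_frequency(DOCS).most_common(3))
    print(select_top_k(DOCS, "spam", 5))
```

With two categories the Chi-Square score is symmetric, so terms that are strongly tied to either class rank highly. In a full pipeline of the kind the summary describes, the selected terms would define the reduced feature space passed to a classifier such as a Support Vector Machine; that step is not shown here.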