Exploring feature selection and support vector machine in text categorization
With the growing number of text documents on the Internet, it is difficult for users to search, find, manage and organize information quickly. Text documents are normally classified manually, which is time-consuming. Text categorization is a process of assigning text documents into a set of fixed p...
Published in: | Proceedings - 16th IEEE International Conference on Computational Science and Engineering, CSE 2013 |
---|---|
Main Author: | Abdul-Rahman S. |
Format: | Conference paper |
Language: | English |
Published: | 2013 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84900359670&doi=10.1109%2fCSE.2013.160&partnerID=40&md5=90bd593e0696660a12d71ad4c336ef96 |
id |
2-s2.0-84900359670 |
---|---|
author |
Abdul-Rahman S.; Mutalib S.; Khanafi N.A.; Ali A.M. |
title |
Exploring feature selection and support vector machine in text categorization |
doi_str_mv |
10.1109/CSE.2013.160 |
description |
With the growing number of text documents on the Internet, it is difficult for users to search, find, manage and organize information quickly. Text documents are normally classified manually, which is time-consuming. Text categorization is the process of assigning text documents to a fixed set of predefined categories. The high dimensionality of text documents makes them difficult to categorize because they contain noise and irrelevant data. This paper explored several feature selection methods that can be used to reduce the high dimensionality of the feature space in text documents, such as Information Gain, Gain Ratio, Chi-square, Mutual Information and Document Frequency. Next, the study performed text categorization using Support Vector Machines. The results showed that Support Vector Machines perform well and are very fast on both the training and testing datasets. © 2013 IEEE. |
record_format |
scopus |
collection |
Scopus |
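
The abstract names Document Frequency, Chi-square and Information Gain among the feature selection methods explored. As a rough illustration of how such scores rank terms before classification, here is a minimal sketch using only the Python standard library. It is not the authors' implementation: the toy corpus, the function names and the whitespace tokenizer are all assumptions made for this example, and the Support Vector Machine step is left out.

```python
import math
from collections import Counter

# Toy corpus invented for illustration; it is not data from the paper.
DOCS = [
    ("the team won the match", "sport"),
    ("the striker scored a goal in the match", "sport"),
    ("the coach praised the team", "sport"),
    ("the bank raised interest rates", "finance"),
    ("the market fell as rates rose", "finance"),
    ("the bank reported a profit", "finance"),
]


def tokenize(text):
    """Naive whitespace tokenizer (an assumption); returns the set of terms."""
    return set(text.lower().split())


def document_frequency(docs):
    """Count the number of documents in which each term occurs."""
    df = Counter()
    for text, _ in docs:
        df.update(tokenize(text))
    return df


def chi_square(docs, term, label):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    a = b = c = d = 0
    for text, y in docs:
        has_term = term in tokenize(text)
        if has_term and y == label:
            a += 1  # in class, contains term
        elif has_term:
            b += 1  # other class, contains term
        elif y == label:
            c += 1  # in class, lacks term
        else:
            d += 1  # other class, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def entropy(counts):
    """Shannon entropy (base 2) of a collection of class counts."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k)


def information_gain(docs, term):
    """Reduction in class entropy from knowing whether the term is present."""
    n = len(docs)
    gain = entropy(Counter(y for _, y in docs).values())
    with_term = Counter(y for t, y in docs if term in tokenize(t))
    without_term = Counter(y for t, y in docs if term not in tokenize(t))
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:
            gain -= (size / n) * entropy(part.values())
    return gain


def top_k_terms(docs, k):
    """Keep the k terms with the highest information gain (ties broken by name)."""
    vocab = document_frequency(docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, t), t))[:k]


if __name__ == "__main__":
    for term in top_k_terms(DOCS, 4):
        print(term, round(information_gain(DOCS, term), 4))
```

In this sketch a term such as "the", which occurs in every document, scores zero under both Chi-square and Information Gain, so it would be dropped even though its document frequency is the highest. The retained terms would then form the reduced feature space passed to a classifier.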