Chinese paper classification based on pre-trained language model and hybrid deep learning method
With the explosive growth in the number of published papers, researchers must filter papers by category to improve retrieval efficiency. The features of data can be learned through the complex network structures of deep learning models, without the need for manual feature definition and extraction in advance, resulting in better processing performance on large datasets.
Published in: | IAES International Journal of Artificial Intelligence |
---|---|
Main Author: | Luo X. |
Format: | Article |
Language: | English |
Published: | Institute of Advanced Engineering and Science, 2025 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85211108747&doi=10.11591%2fijai.v14.i1.pp641-649&partnerID=40&md5=71fa316d517a3df79a6437377a912a95 |
id | 2-s2.0-85211108747 |
---|---|
spelling | 2-s2.0-85211108747 Luo X.; Mutalib S.; Syed Aris S.R. Chinese paper classification based on pre-trained language model and hybrid deep learning method 2025 IAES International Journal of Artificial Intelligence 14 1 10.11591/ijai.v14.i1.pp641-649 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85211108747&doi=10.11591%2fijai.v14.i1.pp641-649&partnerID=40&md5=71fa316d517a3df79a6437377a912a95 With the explosive growth in the number of published papers, researchers must filter papers by category to improve retrieval efficiency. The features of data can be learned through the complex network structures of deep learning models, without the need for manual feature definition and extraction in advance, resulting in better processing performance on large datasets. In our study, the pre-trained language model bidirectional encoder representations from transformers (BERT) and other deep learning models were applied to paper classification. A large-scale Chinese scientific literature dataset was used, including abstracts, keywords, titles, disciplines, and categories from 396k papers. Currently, there is little in-depth research on the role of titles, abstracts, and keywords in classification or on how they are best used in combination. To address this issue, we evaluated classification results obtained with different methods of concatenating the title, abstract, and keywords into model input, and compared the effects of single-sentence and sentence-pair input methods. We also adopted an ensemble learning approach that integrates the results of models processing titles, keywords, and abstracts independently, in order to find the best combination. Finally, we studied combinations of different types of models, such as BERT with convolutional neural networks (CNN), and measured performance by accuracy, weighted average precision, weighted average recall, and weighted average F1 score. © 2025, Institute of Advanced Engineering and Science. All rights reserved. Institute of Advanced Engineering and Science 20894872 English Article |
author | Luo X.; Mutalib S.; Syed Aris S.R. |
spellingShingle | Luo X.; Mutalib S.; Syed Aris S.R. Chinese paper classification based on pre-trained language model and hybrid deep learning method |
author_facet | Luo X.; Mutalib S.; Syed Aris S.R. |
author_sort | Luo X.; Mutalib S.; Syed Aris S.R. |
title | Chinese paper classification based on pre-trained language model and hybrid deep learning method |
title_short | Chinese paper classification based on pre-trained language model and hybrid deep learning method |
title_full | Chinese paper classification based on pre-trained language model and hybrid deep learning method |
title_fullStr | Chinese paper classification based on pre-trained language model and hybrid deep learning method |
title_full_unstemmed | Chinese paper classification based on pre-trained language model and hybrid deep learning method |
title_sort | Chinese paper classification based on pre-trained language model and hybrid deep learning method |
publishDate | 2025 |
container_title | IAES International Journal of Artificial Intelligence |
container_volume | 14 |
container_issue | 1 |
doi_str_mv | 10.11591/ijai.v14.i1.pp641-649 |
url | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85211108747&doi=10.11591%2fijai.v14.i1.pp641-649&partnerID=40&md5=71fa316d517a3df79a6437377a912a95 |
description | With the explosive growth in the number of published papers, researchers must filter papers by category to improve retrieval efficiency. The features of data can be learned through the complex network structures of deep learning models, without the need for manual feature definition and extraction in advance, resulting in better processing performance on large datasets. In our study, the pre-trained language model bidirectional encoder representations from transformers (BERT) and other deep learning models were applied to paper classification. A large-scale Chinese scientific literature dataset was used, including abstracts, keywords, titles, disciplines, and categories from 396k papers. Currently, there is little in-depth research on the role of titles, abstracts, and keywords in classification or on how they are best used in combination. To address this issue, we evaluated classification results obtained with different methods of concatenating the title, abstract, and keywords into model input, and compared the effects of single-sentence and sentence-pair input methods. We also adopted an ensemble learning approach that integrates the results of models processing titles, keywords, and abstracts independently, in order to find the best combination. Finally, we studied combinations of different types of models, such as BERT with convolutional neural networks (CNN), and measured performance by accuracy, weighted average precision, weighted average recall, and weighted average F1 score. © 2025, Institute of Advanced Engineering and Science. All rights reserved. |
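The abstract compares feeding the title, keywords, and abstract to BERT either as one concatenated sequence or as a sentence pair. A minimal sketch of how such inputs could be assembled; the field names, separators, and segment split below are illustrative assumptions, not the paper's actual scheme:

```python
# Sketch: two ways to build classifier input from a paper record.
# (Separator choices and the A/B segment split are assumptions.)

def single_sentence_input(title, keywords, abstract, sep=" "):
    """Concatenate title, keywords, and abstract into one text segment."""
    return sep.join([title, "; ".join(keywords), abstract])

def sentence_pair_input(title, keywords, abstract):
    """Return (segment_a, segment_b) for a BERT-style sentence-pair input:
    here, title plus keywords as segment A and the abstract as segment B."""
    segment_a = title + " " + "; ".join(keywords)
    segment_b = abstract
    return segment_a, segment_b

# Placeholder Chinese paper record (invented example):
paper = {
    "title": "基于BERT的论文分类方法",
    "keywords": ["BERT", "文本分类"],
    "abstract": "本文研究中文科技论文的自动分类。",
}

text = single_sentence_input(paper["title"], paper["keywords"], paper["abstract"])
seg_a, seg_b = sentence_pair_input(paper["title"], paper["keywords"], paper["abstract"])
```

A BERT tokenizer would then encode the single string directly, or accept the two segments as a text pair so they are separated by `[SEP]` and distinguished by segment embeddings.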
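The abstract also mentions ensembling models trained separately on titles, keywords, and abstracts, and reporting support-weighted metrics. A small self-contained sketch of both ideas; soft voting is one plausible ensembling choice (the paper's exact integration method is not stated here), and all numbers are invented:

```python
# Sketch: (1) soft-voting ensemble over per-field models,
# (2) support-weighted average F1 across classes.

def soft_vote(prob_lists):
    """Average per-class probabilities from several models, pick the argmax."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)

def weighted_f1(y_true, y_pred, n_classes):
    """Average of per-class F1 scores, weighted by each class's support."""
    total = len(y_true)
    score = 0.0
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        support = tp + fn
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += f1 * support / total
    return score

# Invented probabilities from title-, keyword-, and abstract-based models
# for one paper over three candidate categories:
probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
predicted = soft_vote(probs)  # -> 1, the class with highest averaged probability
```

Weighted precision and recall follow the same pattern, replacing the per-class F1 with per-class precision or recall before weighting by support.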
publisher | Institute of Advanced Engineering and Science |
issn | 2089-4872 |
language | English |
format | Article |
accesstype | |
record_format | scopus |
collection | Scopus |
_version_ | 1820775427665297408 |