Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning

Effective software defect prediction is a crucial aspect of software quality assurance, enabling the identification of defective modules before the testing phase. This study aims to propose a comprehensive five-stage framework for software defect prediction, addressing the current challenges in the...


Bibliographic Details
Published in:	PEERJ COMPUTER SCIENCE
Main Authors:	Ali, Misbah; Mazhar, Tehseen; Al-Rasheed, Amal; Shahzad, Tariq; Ghadi, Yazeed Yasin; Khan, Muhammad Amir
Format:	Article
Language:	English
Published:	PEERJ INC 2024
Subjects:	Computer Science
Online Access:	https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001174202200001

author	Ali Misbah; Mazhar Tehseen; Al-Rasheed Amal; Shahzad Tariq; Ghadi Yazeed Yasin; Khan Muhammad Amir
author_facet	Ali Misbah; Mazhar Tehseen; Al-Rasheed Amal; Shahzad Tariq; Ghadi Yazeed Yasin; Khan Muhammad Amir
author_sort	Ali
title	Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning
container_title	PEERJ COMPUTER SCIENCE
language	English
format	Article
description	Effective software defect prediction is a crucial aspect of software quality assurance, enabling the identification of defective modules before the testing phase. This study proposes a comprehensive five-stage framework for software defect prediction, addressing the current challenges in the field. The first stage involves selecting a cleaned version of NASA's defect datasets, including CM1, JM1, MC2, MW1, PC1, PC3, and PC4, ensuring the data's integrity. In the second stage, a feature selection technique based on a genetic algorithm is applied to identify the optimal subset of features. In the third stage, three heterogeneous binary classifiers, namely random forest, support vector machine, and naive Bayes, are implemented as base classifiers. Through iterative tuning, each classifier is optimized to achieve its highest individual accuracy. In the fourth stage, an ensemble machine-learning technique known as voting is applied as a master classifier, leveraging the collective decision-making power of the base classifiers. The final stage evaluates the performance of the proposed framework using five widely recognized performance evaluation measures: precision, recall, accuracy, F-measure, and area under the curve. Experimental results demonstrate that the proposed framework outperforms state-of-the-art ensemble and base classifiers employed in software defect prediction and achieves a maximum accuracy of 95.1%, showing its effectiveness in accurately identifying software defects. The framework's efficiency is also evaluated by measuring execution times. Notably, it reduces execution times during the training and testing phases by an average of 51.52% and 52.31%, respectively. This reduction contributes to a more computationally economical solution for accurate software defect prediction.
publisher	PEERJ INC
issn	2376-5992
publishDate	2024
container_volume	10
container_issue
doi_str_mv	10.7717/peerj-cs.1860
topic	Computer Science
topic_facet	Computer Science
accesstype	gold
id	WOS:001174202200001
url	https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001174202200001
record_format	wos
collection	Web of Science (WoS)
_version_	1809678796480053248
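
Illustrative sketch (not part of the record): the description above names a voting ensemble over three base classifiers and an evaluation using precision, recall, accuracy, and F-measure. The snippet below shows one way those two stages could look, assuming hard (majority) voting, which the abstract does not specify. The label vectors are invented toy data, not results from the paper, and the names majority_vote and evaluate are hypothetical. Area under the curve is left out because hard votes produce no ranking scores.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Hard voting: for each module, return the label chosen by most classifiers.

    With three binary classifiers there is always a strict majority, so no
    tie-breaking rule is needed in this sketch.
    """
    voted = []
    for labels in zip(*predictions_per_classifier):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def evaluate(y_true, y_pred, positive=1):
    """Compute precision, recall, accuracy, and F-measure for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Toy data (made up for illustration): 1 = defective module, 0 = clean module.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
rf_pred = [1, 0, 1, 0, 0, 0, 1, 1]   # stand-in for random forest output
svm_pred = [1, 0, 0, 1, 0, 1, 1, 1]  # stand-in for support vector machine output
nb_pred = [0, 0, 1, 1, 0, 0, 0, 0]   # stand-in for naive Bayes output

voted = majority_vote([rf_pred, svm_pred, nb_pred])
metrics = evaluate(y_true, voted)
print(voted)
print(metrics)
```

In this toy case each stand-in classifier makes two or three mistakes on its own, while the majority vote makes one, which is the effect a voting master classifier is meant to exploit.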

Similar Items