Obesity Predictor Identification: Comparison of Correlation Based Feature Selection Method and Wrapper Method on Nutrition Dataset

Bibliographic Details
Published in:	Journal of Advanced Research in Applied Sciences and Engineering Technology
Main Author:	Daud N.; Noordin N.; Lokman A.
Format:	Article
Language:	English
Published:	Semarak Ilmu Publishing 2025
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85201563120&doi=10.37934%2faraset.49.1.129138&partnerID=40&md5=ea5bb59727260e9dad351a6b37e9fb6a

id	2-s2.0-85201563120
author	Daud N.; Noordin N.; Lokman A.
title	Obesity Predictor Identification: Comparison of Correlation Based Feature Selection Method and Wrapper Method on Nutrition Dataset
publishDate	2025
container_title	Journal of Advanced Research in Applied Sciences and Engineering Technology
container_volume	49
container_issue	1
doi_str_mv	10.37934/araset.49.1.129138
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85201563120&doi=10.37934%2faraset.49.1.129138&partnerID=40&md5=ea5bb59727260e9dad351a6b37e9fb6a
description	The prevalence of obesity among Malaysians is estimated from the BMI prevalence data of the National Health and Morbidity Survey (NHMS). However, the NHMS nutrition data have not been used to predict national obesity prevalence, because they were collected solely to document the food consumption patterns of the base population in an analysis report. To address this gap, this study uses 15 nutrition variables derived from grocery data to predict obesity. To identify the appropriate nutrition variables, the study explored 8238 rows of raw grocery data (grocery receipts) collected from 35 households; the 15 nutrition variables were generated during the data conversion and data transformation steps of the data pre-processing phase. The study predicts the percentage of selected nutrition (macronutrient) variables that could lead to obesity in individuals. Its purpose is to find alternative data (grocery data) that can be used to predict obesity, and to test the relevance of that data by evaluating the predictive accuracy achieved with data mining techniques. To simplify the prediction model, the dataset variables were filtered using the automated feature selection methods of the WEKA machine learning tool, version 3.8. The aim of feature selection was to identify the nutrition variables with the greatest impact on prediction accuracy, measured by the area under the ROC curve (AUC).
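The abstract evaluates the prediction models by AUC. As a minimal sketch (not the authors' WEKA workflow), AUC can be computed directly from predicted scores as the probability that a randomly chosen positive (obese) case is scored higher than a randomly chosen negative case, with ties counted as half. The function name `auc_score` and the 0/1 label encoding are assumptions for illustration.

```python
def auc_score(labels, scores):
    """Rank-based AUC (Mann-Whitney form): fraction of positive/negative
    pairs in which the positive example receives the higher score.
    Assumes labels are 0 (negative) or 1 (positive)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties get half credit
    return wins / (len(pos) * len(neg))


# Toy scores (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which keeps the definition visible; a production version would sort the scores once instead.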
The generated nutrition dataset was subjected to a subset evaluation method known as correlation-based feature selection (CFS) and to wrapper methods, which incorporate a learning algorithm into the attribute selection process. Several subsets were extracted during the feature selection phase and served as candidate predictor sets for developing obesity prediction models with different classification algorithms. In this evaluation, CFS performed best, outperforming the three wrapper methods tested, and selected the calorie_intake and foodpyramid_level3% variables as the appropriate predictors for this study. These results may strengthen confidence in using household grocery data to predict obesity and open new avenues for research into nutrition and health prediction. © 2025, Semarak Ilmu Publishing. All rights reserved.
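CFS scores a feature subset by balancing relevance against redundancy. The standard CFS merit of a subset of k features is k * r_cf / sqrt(k + k(k - 1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature intercorrelation. WEKA implements this evaluator as CfsSubsetEval, typically paired with a best-first search. The sketch below is a simplification under stated assumptions: it uses absolute Pearson correlation as the association measure (WEKA may use other measures for nominal attributes) and a plain greedy forward search. Function names and the toy data are hypothetical, not from the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, features, target):
    """CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff), using |Pearson r| (an assumption)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Relevance: mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    # Redundancy: mean absolute feature-feature correlation (zero for one feature).
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_cfs(features, target):
    """Greedy forward search: add the feature that most raises the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(features)  # dict insertion order makes tie-breaking deterministic
    while remaining:
        cand, merit = max(((f, cfs_merit(selected + [f], features, target)) for f in remaining),
                          key=lambda t: t[1])
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected, best


# Hypothetical toy data: x1 tracks the class, x1_copy duplicates x1, x2 is unrelated noise.
target = [0, 0, 0, 1, 1, 1]
features = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x1_copy": [1, 2, 3, 4, 5, 6],
    "x2": [1, 2, 1, 1, 2, 1],
}
subset, merit = forward_cfs(features, target)
print(subset, round(merit, 3))  # ['x1'] 0.878
```

Because x1_copy is perfectly correlated with x1, the redundancy term cancels its added relevance and the search stops after one feature. This redundancy penalty is what lets CFS return compact predictor sets.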
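Wrapper methods, by contrast, score each candidate subset by training and testing a learning algorithm on it. The abstract does not say which classifiers the three wrapper runs used, so this sketch substitutes a simple nearest-centroid classifier scored by leave-one-out accuracy; WEKA's WrapperSubsetEval instead uses a configurable base classifier evaluated by internal cross-validation. All function names and the toy data are hypothetical.

```python
import math


def column_points(features, subset, rows):
    """Build one point per row index, using only the features in `subset`."""
    return [[features[f][i] for f in subset] for i in rows]


def nearest_centroid(points, labels, query):
    """Predict the class whose training centroid is closest (Euclidean) to `query`."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(labels)):
        members = [p for p, y in zip(points, labels) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        d = math.dist(centroid, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def loo_accuracy(features, target, subset):
    """Leave-one-out accuracy of the nearest-centroid classifier on `subset`."""
    n = len(target)
    correct = 0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        pred = nearest_centroid(column_points(features, subset, train),
                                [target[j] for j in train],
                                column_points(features, subset, [i])[0])
        if pred == target[i]:
            correct += 1
    return correct / n


def forward_wrapper(features, target):
    """Greedy forward selection in which the classifier itself scores each subset."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        cand, score = max(((f, loo_accuracy(features, target, selected + [f]))
                           for f in remaining), key=lambda t: t[1])
        if score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best


# Hypothetical toy data: x1 separates the classes, x2 is noise.
target = [0, 0, 0, 1, 1, 1]
features = {"x1": [1, 2, 3, 4, 5, 6], "x2": [1, 2, 1, 1, 2, 1]}
print(forward_wrapper(features, target))  # (['x1'], 1.0)
```

Each wrapper step retrains the classifier once per held-out row and per candidate feature, which is why wrapper methods are usually far more expensive than a filter such as CFS.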
publisher	Semarak Ilmu Publishing
issn	2462-1943
language	English
format	Article
accesstype	All Open Access; Hybrid Gold Open Access
record_format	scopus
collection	Scopus