Obesity Predictor Identification: Comparison of Correlation Based Feature Selection Method and Wrapper Method on Nutrition Dataset

Bibliographic Details
Published in: Journal of Advanced Research in Applied Sciences and Engineering Technology
Main Author: Daud N.; Noordin N.; Lokman A.
Format: Article
Language: English
Published: Semarak Ilmu Publishing 2025
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85201563120&doi=10.37934%2faraset.49.1.129138&partnerID=40&md5=ea5bb59727260e9dad351a6b37e9fb6a
id 2-s2.0-85201563120
author Daud N.; Noordin N.; Lokman A.
title Obesity Predictor Identification: Comparison of Correlation Based Feature Selection Method and Wrapper Method on Nutrition Dataset
publishDate 2025
container_title Journal of Advanced Research in Applied Sciences and Engineering Technology
container_volume 49
container_issue 1
doi_str_mv 10.37934/araset.49.1.129138
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85201563120&doi=10.37934%2faraset.49.1.129138&partnerID=40&md5=ea5bb59727260e9dad351a6b37e9fb6a
description The prevalence of obesity among Malaysians is estimated as a percentage from BMI data collected in the National Health and Morbidity Survey (NHMS). However, NHMS nutrition data have not been used to predict national obesity prevalence, because they were collected solely to document the food consumption patterns of the base population in an analysis report. To address this gap, this study uses 15 nutrition variables derived from grocery data to predict obesity. To identify the appropriate nutrition variables, 8238 rows of raw grocery data (grocery receipts) collected from 35 households were explored. The 15 nutrition variables were generated during the data conversion and data transformation steps of the pre-processing phase. The study predicts the percentage of selected nutrition variables, specifically macronutrient variables, that could lead to obesity in individuals. Its purpose is to find alternative data (grocery data) that can be used to predict obesity, and to test the relevance of that data by evaluating prediction accuracy with data mining techniques. To simplify the prediction model, the dataset variables were filtered using the automated feature selection methods in version 3.8 of the WEKA machine learning tool. The objective of feature selection was to identify the nutrition variables with the most significant impact on developing accurate prediction models, with model accuracy evaluated using the area under the curve (AUC) score.
The generated nutrition dataset was subjected to the subset method known as correlation-based feature selection (CFS) and to wrapper methods, which include a learning algorithm in the attribute selection process. Several subsets were extracted during the feature selection phase and served as potential input datasets (predictors) for developing obesity prediction models using different classification algorithms. In the feature selection evaluation, the CFS method outperformed the three wrapper methods tested, resulting in the selection of the calorie_intake and foodpyramid_level3% variables as the appropriate predictors for this study. These results can enhance the reliability of using household grocery data to predict obesity and open new avenues for research into nutrition and health prediction. © 2025, Semarak Ilmu Publishing. All rights reserved.
publisher Semarak Ilmu Publishing
issn 24621943
language English
format Article
accesstype All Open Access; Hybrid Gold Open Access
record_format scopus
collection Scopus
_version_ 1814778497014431744
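
The description above names correlation-based feature selection (CFS) as the preferred method. As a rough illustration of the idea, not the authors' WEKA configuration, the sketch below scores a feature subset with the commonly cited CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, and then runs a greedy forward search. It assumes absolute Pearson correlation as the correlation measure (WEKA's implementation may differ in detail), and the feature names and values are synthetic and hypothetical, not taken from the study's grocery dataset.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cfs_merit(features, target, subset):
    """CFS merit of a subset: rewards class correlation, penalises redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = mean(abs(pearson(features[name], target)) for name in subset)
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = mean(abs(pearson(features[a], features[b])) for a, b in pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_search(features, target):
    """Greedy forward search: add the best feature until the merit stops improving."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, name = max(
            (cfs_merit(features, target, selected + [n]), n) for n in remaining
        )
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Synthetic toy data (hypothetical): one informative feature, one near-copy, one noise.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_signal": [1, 2, 3, 4, 5, 6, 7, 8],
    "f_redundant": [1, 2, 3, 5, 4, 6, 7, 8],
    "f_noise": [1, -1, 1, -1, 1, -1, 1, -1],
}

selected, merit = cfs_forward_search(features, target)
print(selected, round(merit, 4))
```

On this toy data the search keeps only the informative feature: adding the near-copy lowers the merit because the redundancy term in the denominator grows faster than the gain in class correlation. A wrapper method would instead score each candidate subset by training a classifier on it and measuring a metric such as AUC.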