Bootstrapping Simulation in Determining the Prognostic Factors of Lung Cancer Disease by Parametric Survival Analysis

Bibliographic Details
Published in: 2023 IEEE International Conference on Computing, ICOCO 2023
Main Author: Muhamad Jamil S.A.; Affendi Abdullah M.A.; Ibrahim N.; Mansor M.M.; Md Ghani N.A.
Format: Conference paper
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85184855389&doi=10.1109%2fICOCO59262.2023.10398056&partnerID=40&md5=1dccb55ed27db78ad16425fa619a0bd2
Description
Summary: Big data analytics focuses on extracting useful insights, trends and patterns from large, complex data. In biostatistics, the sample can be enlarged by resampling the data with bootstrapping techniques. Bootstrapping is a broad and growing family of methods: it is used not only to compute confidence intervals but also as a standard resampling method. Survival data, however, are often not normally distributed because of censored observations. A small sample size is another reason this study applies bootstrapping, to reduce bias. Bootstrapping is regarded as one of the best methods for handling skewed data. Using the bootstrapping method, this study therefore aims to identify the most significant prognostic factors of lung cancer that affect survival time in the presence of censored observations, using parametric survival analysis. For sample sizes of 100, 150, 250 and 600, the exponential distribution appeared to fit at every assigned sample size, whereas the Weibull and log-logistic distributions appeared to fit the data only at a sample size of 100. Race and two of the interaction terms in the model appeared to be the most significant prognostic factors affecting the survival time of lung cancer patients. © 2023 IEEE.
DOI: 10.1109/ICOCO59262.2023.10398056
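
Illustrative note: the summary combines bootstrap resampling with a parametric survival model fitted to censored data. The Python sketch below shows the general idea under simplifying assumptions and is not the authors' procedure. It fits an exponential model with no covariates (the study's model includes prognostic factors and interaction terms), it uses made-up follow-up times rather than the study's lung cancer data, and the function names exponential_rate_mle and bootstrap_rate_ci are hypothetical.

```python
import random


def exponential_rate_mle(times, events):
    """Maximum-likelihood hazard rate of an exponential model with right censoring.

    times  : follow-up time for each subject (positive numbers)
    events : 1 if the event was observed, 0 if the observation was censored
    The estimate is (number of observed events) / (total follow-up time).
    """
    return sum(events) / sum(times)


def bootstrap_rate_ci(times, events, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the exponential hazard rate."""
    rng = random.Random(seed)
    n = len(times)
    estimates = []
    for _ in range(n_resamples):
        # Resample subjects with replacement, keeping each time paired
        # with its own censoring indicator.
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_times = [times[i] for i in idx]
        resampled_events = [events[i] for i in idx]
        estimates.append(exponential_rate_mle(resampled_times, resampled_events))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up example data (assumption): 10 subjects, 3 of them censored.
toy_times = [5, 8, 12, 3, 20, 15, 7, 9, 30, 11]
toy_events = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

rate = exponential_rate_mle(toy_times, toy_events)
ci_lower, ci_upper = bootstrap_rate_ci(toy_times, toy_events)
print("estimated hazard rate:", rate)
print("95% percentile bootstrap interval:", (ci_lower, ci_upper))
```

Resampling whole subjects, rather than times and censoring indicators separately, keeps each censored observation tied to its own follow-up time, which is what lets the bootstrap reflect the censoring pattern in the data.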