Performance Measurement: Machine Learning as a Complement to DEA for Continuous Efficiency Estimation

Bibliographic Details
Published in: Malaysian Journal of Fundamental and Applied Sciences
Main Authors: Khoubrane Y.; Ramli N.A.; Khairi S.S.M.
Format: Article
Language: English
Published: Penerbit UTM Press 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85191798907&doi=10.11113%2fmjfas.v20n2.3310&partnerID=40&md5=9bcc4c803e975bad93d5d98ef9707cf6
Description
Summary: Data Envelopment Analysis (DEA) is a well-established non-parametric technique for assessing the efficiency of Decision-Making Units (DMUs). However, DEA cannot predict the efficiency of new DMUs without re-running the analysis on the entire dataset, which has motivated the integration of Machine Learning (ML) in previous studies. Yet such integration often lacks a thorough evaluation of ML's suitability as a replacement for the repeated DEA process. This paper presents the results of an empirical study employing eight ML models, two DEA variants, and a dataset of S&P500 companies. The findings demonstrate ML's remarkable precision in predicting efficiency values derived from a single DEA run, and comparable performance in predicting the efficiency of new DMUs, thus eliminating the need for repeated DEA. This highlights ML's robustness as a complementary tool for DEA in continuous efficiency estimation. Notably, boosting models within the Ensemble Learning category consistently outperformed the other models, underscoring their effectiveness for DEA efficiency prediction. In particular, CatBoost emerged as the top-performing model, followed by LightGBM in second position in most cases. When the analysis was extended to five enlarged datasets, the top-performing model exhibited superior R² values in the constant returns to scale (CRS) scenario. © Khoubrane.
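The workflow the abstract describes — run DEA once to obtain efficiency scores, then train a regressor so new DMUs can be scored without re-solving the DEA models — can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's S&P500 dataset: SciPy's `linprog` solves the input-oriented CCR (CRS) envelopment model, and scikit-learn's `GradientBoostingRegressor` stands in for the CatBoost and LightGBM models the study found best.

```python
# Sketch: one DEA run produces efficiency labels; a boosting regressor then
# predicts efficiency for new DMUs without any further linear programs.
# Synthetic data; GradientBoostingRegressor is a stand-in for CatBoost/LightGBM.
import numpy as np
from scipy.optimize import linprog
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_dmus, n_in, n_out = 40, 2, 1
X = rng.uniform(1.0, 10.0, size=(n_dmus, n_in))   # input quantities
Y = rng.uniform(1.0, 10.0, size=(n_dmus, n_out))  # output quantities

def ccr_efficiency(x0, y0, X, Y):
    """Input-oriented CCR envelopment LP for one DMU:
       min theta  s.t.  X^T @ lam <= theta * x0,  Y^T @ lam >= y0,  lam >= 0."""
    n = X.shape[0]
    c = np.r_[1.0, np.zeros(n)]                       # minimise theta
    A_in = np.hstack([-x0.reshape(-1, 1), X.T])       # lam·x_i - theta*x_i0 <= 0
    A_out = np.hstack([np.zeros((len(y0), 1)), -Y.T]) # -lam·y_r <= -y_r0
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(len(x0)), -y0],
                  bounds=[(None, None)] + [(0, None)] * n,
                  method="highs")
    return res.x[0]

# Single DEA run over all DMUs -> training labels in (0, 1].
eff = np.array([ccr_efficiency(X[j], Y[j], X, Y) for j in range(n_dmus)])

# Learn the mapping (inputs, outputs) -> efficiency score.
features = np.hstack([X, Y])
model = GradientBoostingRegressor(random_state=0).fit(features, eff)

# Score a new DMU with a model lookup instead of re-running DEA.
new_dmu = np.array([[5.0, 5.0, 5.0]])                 # [x1, x2, y1]
pred = model.predict(new_dmu)[0]
print(f"predicted efficiency: {pred:.3f}")
```

Under CRS the efficient frontier is a cone through the best-ratio DMUs, so the computed scores lie in (0, 1] with at least one DMU on the frontier; the regressor interpolates that surface for unseen units.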
ISSN: 2289-599X
DOI: 10.11113/mjfas.v20n2.3310