Improving transformer failure classification on imbalanced DGA data using data-level techniques and machine learning

Bibliographic Details
Published in: Energy Reports
Authors: Azmi P.A.R.; Yusoff M.; Sallehud-din M.T.M.
Format: Article
Language: English
Published: Elsevier Ltd, 2025
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85211464729&doi=10.1016%2fj.egyr.2024.12.006&partnerID=40&md5=a2ce560b66c09b702e817d88325bd90e
Description
Summary: This study addresses the challenge of imbalanced dissolved gas analysis (DGA) data in transformer failure classification by assessing the impact of data-level balancing techniques on machine learning performance. Five data-level strategies – Random Under-Sampling (RUS), Edited Nearest Neighbors (ENN), NearMiss (NM), Random Over-Sampling (ROS), and ADASYN – were applied to balance the dataset and improve classification outcomes. The dataset includes key gas concentrations (H2, CH4, C2H6, C2H4, and C2H2) and a target defect variable (act). Three machine learning algorithms – Support Vector Machine, Decision Tree, and Random Forest – were tested, with results showing that ENN combined with SVM achieved the highest classification performance: 88% accuracy, 89.89% precision, 88.00% recall, 86.64% F1-score, and a runtime of 0.21 s. This approach demonstrates the effectiveness of data-level techniques in improving transformer fault diagnosis, offering a robust path forward for enhancing electrical power system reliability. Future research should refine these techniques and explore their integration with optimized models to enhance the accuracy of the proposed technique. © 2024 The Author(s)
ISSN: 2352-4847
DOI: 10.1016/j.egyr.2024.12.006
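The summary names Edited Nearest Neighbors (ENN) as the balancing step that worked best with SVM. This record does not describe the authors' implementation, so the following is only a minimal pure-Python sketch of Wilson's edited-nearest-neighbours rule under stated assumptions: k = 3, Euclidean distance on already-scaled feature vectors, a majority-vote selection rule, and the smallest class always kept. The five columns stand in for H2, CH4, C2H6, C2H4 and C2H2, but every value and the labels `normal` and `arc` are hypothetical, not taken from the study's dataset; the function name `edited_nearest_neighbours` is likewise illustrative.

```python
from collections import Counter
import math


def edited_nearest_neighbours(X, y, k=3):
    """Sketch of Wilson's ENN cleaning rule (assumed variant: majority vote).

    A sample is dropped when the most common label among its k nearest
    neighbours differs from its own label. Samples of the smallest class
    are always kept, so only the larger classes are cleaned.
    """
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == minority:
            keep.append(i)
            continue
        # Euclidean distance from sample i to every other sample.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority_label == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical, already-scaled rows: [H2, CH4, C2H6, C2H4, C2H2].
X = [
    [0.1, 0.2, 0.1, 0.1, 0.0],
    [0.2, 0.1, 0.2, 0.1, 0.0],
    [0.1, 0.1, 0.1, 0.2, 0.0],
    [0.2, 0.2, 0.1, 0.1, 0.1],
    [0.9, 0.1, 0.1, 0.8, 0.9],  # "normal" row overlapping the arc cluster
    [1.0, 0.1, 0.1, 0.9, 1.0],
    [0.9, 0.2, 0.1, 0.9, 0.9],
    [1.0, 0.2, 0.2, 0.8, 1.0],
]
y = ["normal"] * 5 + ["arc"] * 3

X_res, y_res = edited_nearest_neighbours(X, y, k=3)
print(Counter(y), "->", Counter(y_res))
```

Here the one `normal` row whose three nearest neighbours are all `arc` rows is removed, leaving four `normal` and three `arc` samples. In practice, imbalanced-learn's `EditedNearestNeighbours` provides this step (its default selection rule is stricter, dropping a sample if any neighbour disagrees), and the cleaned data would then be passed to a classifier such as scikit-learn's `SVC`.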