Improving transformer failure classification on imbalanced DGA data using data-level techniques and machine learning

Bibliographic Details
Published in: ENERGY REPORTS
Main Authors: Azmi, Putri Azmira R.; Yusoff, Marina; Sallehud-din, Mohamad Taufik Mohd
Format: Article
Language: English
Published: ELSEVIER 2025
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001386454100001
Description: This study addresses the challenge of imbalanced dissolved gas analysis (DGA) data in transformer failure classification by assessing how data-level balancing techniques affect machine learning performance. Five data-level strategies were applied to balance the dataset and improve classification outcomes: Random Under-Sampling (RUS), Edited Nearest Neighbors (ENN), NearMiss (NM), Random Over-Sampling (ROS), and ADASYN. The dataset includes key gas concentrations (H2, CH4, C2H6, C2H4, and C2H2) and a target defect variable (act). Three machine learning algorithms were tested: Support Vector Machine (SVM), Decision Tree, and Random Forest. ENN combined with SVM achieved the highest classification performance: 88% accuracy, 89.89% precision, 88.00% recall, 86.64% F1-score, and a runtime of 0.21 s. These results demonstrate the effectiveness of data-level techniques in improving transformer fault diagnosis and, in turn, the reliability of electrical power systems. Future research should refine these techniques and explore their integration with optimized models to further improve classification accuracy.
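The record does not give the authors' implementation, so the following is only a minimal sketch of the general idea behind Edited Nearest Neighbors (ENN), the balancing step that performed best with SVM. It assumes a Wilson-style rule with k = 3 and cleaning restricted to non-minority classes; the function name, the two-feature toy data, and the labels `normal` and `fault` are hypothetical and stand in for the five gas concentrations and the `act` target. In practice a library implementation such as `EditedNearestNeighbours` in imbalanced-learn would more likely be used.

```python
from collections import Counter
import math


def edited_nearest_neighbours(X, y, k=3):
    """Wilson-style ENN sketch (assumed variant, not the paper's code).

    A sample from a non-minority class is dropped when the most common
    label among its k nearest neighbours differs from its own label.
    Minority-class samples are always kept.
    """
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == minority:
            keep.append(i)
            continue
        # Brute-force Euclidean distances to every other sample.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label = Counter(neighbour_labels).most_common(1)[0][0]
        if majority_label == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: two made-up features, not real DGA measurements.
X = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.1),  # "normal" cluster
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),                          # "fault" cluster
    (1.05, 1.05),  # labelled "normal" but sitting inside the fault cluster
]
y = ["normal"] * 5 + ["fault"] * 3 + ["normal"]

X_res, y_res = edited_nearest_neighbours(X, y, k=3)
print(Counter(y_res))
```

On this toy data the only sample removed is the `normal` point that lies inside the `fault` cluster. This illustrates that ENN cleans class overlap near the decision boundary and does not force equal class counts, so the cleaned set would then be passed to a classifier such as an SVM.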
ISSN: 2352-4847
Volume: 13
DOI: 10.1016/j.egyr.2024.12.006
Subjects: Energy & Fuels
Web of Science ID: WOS:001386454100001
Collection: Web of Science (WoS)