Improving transformer failure classification on imbalanced DGA data using data-level techniques and machine learning
Published in: | ENERGY REPORTS |
---|---|
Main Authors: | Azmi, Putri Azmira R.; Yusoff, Marina; Sallehud-din, Mohamad Taufik Mohd |
Format: | Article |
Language: | English |
Published: | ELSEVIER, 2025 |
Subjects: | Energy & Fuels |
Online Access: | https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001386454100001 |

author | Azmi, Putri Azmira R.; Yusoff, Marina; Sallehud-din, Mohamad Taufik Mohd |
---|---|
title | Improving transformer failure classification on imbalanced DGA data using data-level techniques and machine learning |
container_title | ENERGY REPORTS |
language | English |
format | Article |
description | This study addresses the challenge of imbalanced dissolved gas analysis (DGA) data in transformer failure classification by assessing the impact of data-level balancing techniques on machine learning performance. Five data-level strategies - Random Under-Sampling (RUS), Edited Nearest Neighbors (ENN), NearMiss (NM), Random Over-Sampling (ROS), and ADASYN - were applied to balance the dataset and improve classification outcomes. The dataset includes key gas concentrations (H2, CH4, C2H6, C2H4, and C2H2) and a target defect variable (act). Three machine learning algorithms - Support Vector Machine, Decision Tree, and Random Forest - were tested, with results showing that ENN combined with SVM achieved the highest classification performance: 88% accuracy, 89.89% precision, 88.00% recall, 86.64% F1-score, and a runtime of 0.21 s. This approach demonstrates the effectiveness of data-level techniques in improving transformer fault diagnosis, offering a robust path forward for enhancing electrical power system reliability. Future research should refine these techniques and explore their integration with optimized models to enhance the accuracy of the proposed technique. |
publisher | ELSEVIER |
issn | 2352-4847 |
publishDate | 2025 |
container_volume | 13 |
doi | 10.1016/j.egyr.2024.12.006 |
topic | Energy & Fuels |
id | WOS:001386454100001 |
url | https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001386454100001 |
record_format | wos |
collection | Web of Science (WoS) |
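
The abstract reports that Edited Nearest Neighbors (ENN) followed by an SVM gave the best results. This record does not say which implementation, neighbourhood size, or feature scaling the authors used. The block below is therefore only a minimal pure-Python sketch of the classic single-pass ENN editing rule under stated assumptions: k = 3 neighbours, Euclidean distance, and the smallest class left untouched. The function name `edited_nearest_neighbours` and the toy data are hypothetical. In a DGA setting each row of `X` would presumably hold the five gas concentrations (H2, CH4, C2H6, C2H4, C2H2) and `y` the defect label (`act`).

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def edited_nearest_neighbours(
    X: Sequence[Sequence[float]],
    y: Sequence[str],
    k: int = 3,  # assumption: the study's neighbourhood size is not stated
) -> Tuple[List[List[float]], List[str]]:
    """Single-pass Edited Nearest Neighbours (Wilson's editing rule).

    A sample is dropped when the plurality label of its k nearest
    neighbours (Euclidean distance, the sample itself excluded) differs
    from its own label. Assumption: samples of the smallest class are
    always kept, so the minority class is never thinned further.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")

    counts = Counter(y)
    minority = min(counts, key=counts.get)

    kept_X: List[List[float]] = []
    kept_y: List[str] = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            # Rank every other sample by distance; the index breaks ties.
            ranked = sorted(
                (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
            )
            neighbour_labels = [y[j] for _, j in ranked[:k]]
            # most_common keeps first-seen order on ties, so the label seen
            # nearest to sample i wins a tie.
            predicted = Counter(neighbour_labels).most_common(1)[0][0]
            if predicted != yi:
                continue  # neighbourhood disagrees: edit this sample out
        kept_X.append(list(xi))
        kept_y.append(yi)
    return kept_X, kept_y


if __name__ == "__main__":
    # Made-up 2-D toy data, not DGA measurements: "N" is the majority class,
    # "F" the minority class; the last "N" sits inside the "F" cluster.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
    y = ["N"] * 4 + ["F"] * 3 + ["N"]
    X_res, y_res = edited_nearest_neighbours(X, y, k=3)
    print(len(X_res), [11, 11] in X_res)  # → 7 False
```

Only the majority-class point surrounded by minority-class neighbours is removed, which is why ENN acts as a cleaning under-sampler rather than one that forces exact class balance. Because the rule relies on Euclidean distance, gas concentrations spanning very different ranges may need scaling before editing; whether the study did so is not stated in this record.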