Deep Reinforcement Learning-based resource allocation strategy for Energy Harvesting-Powered Cognitive Machine-to-Machine Networks

Bibliographic Details
Published in: Computer Communications
Main Authors: Xu Y.-H.; Tian Y.-B.; Searyoh P.K.; Yu G.; Yong Y.-T.
Format: Article
Language: English
Published: Elsevier B.V. 2020
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85087934968&doi=10.1016%2fj.comcom.2020.07.015&partnerID=40&md5=0faa4b65f4ccb305b9ad618b74505267
Description
Summary: Machine-to-Machine (M2M) communication is a promising technology that may realize the Internet of Things (IoT) in future networks. However, the massive number of devices and their concurrent access requirements cause performance degradation and enormous energy consumption. Energy Harvesting-Powered Cognitive M2M Networks (EH-CMNs) are an attractive solution: they can alleviate the escalating spectrum deficit while guaranteeing Quality of Service (QoS) and reducing energy consumption to achieve Green Communication (GC), and have therefore become an important research topic. In this paper, we investigate the resource allocation problem for EH-CMNs underlaying cellular uplinks. We aim to maximize the energy efficiency of EH-CMNs while accounting for the QoS of Human-to-Human (H2H) networks and the available energy in EH devices. In view of the characteristics of EH-CMNs, we formulate the problem as a decentralized Discrete-time and Finite-state Markov Decision Process (DFMDP), in which each device acts as an agent and learns effectively from the environment to make allocation decisions without complete, global network information. Owing to the complexity of the problem, we propose a Deep Reinforcement Learning (DRL)-based algorithm to solve it. Numerical results validate that the proposed scheme outperforms other schemes in terms of average energy efficiency with an acceptable convergence speed. © 2020 Elsevier B.V.
ISSN: 0140-3664
DOI: 10.1016/j.comcom.2020.07.015
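Illustrative sketch: the abstract describes each EH-powered M2M device as an independent agent that learns a spectrum-and-power allocation policy from local observations, trained with a DRL algorithm. The paper's actual network architecture, state features, action encoding, reward, and hyper-parameters are not given in this record, so the minimal deep Q-learning agent below (PyTorch) is an assumption-laden sketch of that kind of per-device learner, not the authors' method.

# Sketch only: state features, (channel x power-level) action encoding, layer
# sizes, and reward shape are all assumptions; the paper's exact design differs.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

N_CHANNELS = 4        # assumed number of shareable cellular uplink channels
N_POWER_LEVELS = 3    # assumed discrete transmit-power levels
N_ACTIONS = N_CHANNELS * N_POWER_LEVELS
STATE_DIM = 6         # assumed local observation size (battery level, channel gains, ...)

class QNetwork(nn.Module):
    """Per-device Q-network mapping a local observation to action values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)

class M2MAgent:
    """One EH-powered M2M device acting as an independent DRL agent."""
    def __init__(self, gamma=0.95, lr=1e-3, eps=0.1):
        self.q = QNetwork()
        self.target_q = QNetwork()                       # would be re-synced periodically
        self.target_q.load_state_dict(self.q.state_dict())
        self.opt = optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=10_000)               # experience replay memory
        self.gamma, self.eps = gamma, eps

    def act(self, state):
        # epsilon-greedy choice over joint (channel, power-level) actions
        if random.random() < self.eps:
            return random.randrange(N_ACTIONS)
        with torch.no_grad():
            return int(self.q(torch.as_tensor(state, dtype=torch.float32)).argmax())

    def store(self, s, a, r, s_next):
        # reward r would encode achieved energy efficiency, penalized when
        # H2H QoS or the device's harvested-energy budget is violated (assumed)
        self.buffer.append((s, a, r, s_next))

    def learn(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2 = map(lambda x: torch.as_tensor(x, dtype=torch.float32), zip(*batch))
        a = a.long()
        q_sa = self.q(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + self.gamma * self.target_q(s2).max(dim=1).values
        loss = nn.functional.mse_loss(q_sa, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

# Usage: each device runs its own agent on purely local observations,
# approximating the decentralized DFMDP formulation described in the abstract.
agent = M2MAgent()
action = agent.act([0.5] * STATE_DIM)   # pick a (channel, power-level) pair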