COMMON-SENSICAL INCENTIVE REWARD IN DEEP ACTOR-CRITIC REINFORCEMENT LEARNING FOR MOBILE ROBOT NAVIGATION

Bibliographic Details
Published in: INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL
Main Authors: Sendari, Siti; Muladi; Ardiyansyah, Firman; Setumin, Samsul; Mokhtar, Norrima Binti; Lin, Hsien-I; Hartono, Pitoyo
Format: Article
Language: English
Published: ICIC INT 2024
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001204092500009
Subjects: Computer Science
ISSN: 1349-4198; 1349-418X
Volume: 20
Issue: 2
DOI: 10.24507/ijicic.20.02.373
Web of Science ID: WOS:001204092500009

Abstract

Recently, various Deep Actor-Critic Reinforcement Learning (DAC-RL) algorithms have been widely used to train mobile robots to acquire navigation policies. However, they usually require a prohibitively long learning time to achieve good policies. To alleviate this problem, this research proposes a two-stage training mechanism infused with human common-sensical prior knowledge, named Two Stages DAC-RL with incentive reward. The actor-critic networks were first pre-trained in a simple environment to acquire a basic policy. The basic policy was then transferred to initialize the training of a new navigation policy in more complex environments. The study further reduced the RL learning burden by infusing human common-sensical prior knowledge in the form of incentive rewards given in situations that benefit the navigation task. The proposed algorithms were evaluated on navigation tasks in which the robot must efficiently reach designated goals. The tasks were made more challenging by requiring the robot to pass through corridors to reach the goal while avoiding obstacles. The results showed that the proposed algorithm worked efficiently for various start-goal positions across the corridors.
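The incentive-reward idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which situations count as "beneficial" or which reward values were used, so every constant, the `RobotState` fields, and the `is_beneficial` rule below are hypothetical assumptions chosen for illustration, not the authors' reward function. The sketch keeps a conventional sparse navigation reward (goal bonus, collision penalty, small per-step cost) and adds an incentive bonus only when a common-sense condition holds: the robot moved closer to the goal, is roughly facing it, and is clear of obstacles.

```python
import math
from dataclasses import dataclass

# Hypothetical illustration values; the paper's actual settings are not given in the abstract.
GOAL_RADIUS = 0.3          # metres: distance at which the goal counts as reached
SAFE_CLEARANCE = 0.5       # metres: obstacle distance considered safe
HEADING_TOL = math.pi / 6  # radians: tolerance for "facing the goal"
R_GOAL = 100.0             # terminal reward for reaching the goal
R_COLLISION = -100.0       # terminal penalty for a collision
R_STEP = -0.1              # per-step cost that favours short paths
R_INCENTIVE = 0.5          # hypothetical bonus granted in a "beneficial" situation


@dataclass
class RobotState:
    x: float
    y: float
    yaw: float               # heading in radians
    min_range: float         # closest obstacle distance reported by the range sensor
    collided: bool = False


def goal_distance(s: RobotState, goal: tuple[float, float]) -> float:
    """Euclidean distance from the robot to the goal."""
    return math.hypot(goal[0] - s.x, goal[1] - s.y)


def heading_error(s: RobotState, goal: tuple[float, float]) -> float:
    """Absolute angle in [0, pi] between the robot's heading and the bearing to the goal."""
    bearing = math.atan2(goal[1] - s.y, goal[0] - s.x)
    diff = bearing - s.yaw
    return abs(math.atan2(math.sin(diff), math.cos(diff)))  # wrap to [-pi, pi]


def is_beneficial(prev: RobotState, curr: RobotState, goal: tuple[float, float]) -> bool:
    """Hypothetical common-sense rule: moved closer, facing the goal, and clear of obstacles."""
    return (
        goal_distance(curr, goal) < goal_distance(prev, goal)
        and heading_error(curr, goal) < HEADING_TOL
        and curr.min_range > SAFE_CLEARANCE
    )


def reward(prev: RobotState, curr: RobotState, goal: tuple[float, float]) -> tuple[float, bool]:
    """Return (reward, episode_done): sparse terminal reward plus an incentive bonus."""
    if curr.collided:
        return R_COLLISION, True
    if goal_distance(curr, goal) < GOAL_RADIUS:
        return R_GOAL, True
    r = R_STEP
    if is_beneficial(prev, curr, goal):
        r += R_INCENTIVE
    return r, False


if __name__ == "__main__":
    goal = (2.0, 0.0)
    prev = RobotState(0.0, 0.0, 0.0, 1.0)
    # Moving 0.2 m straight toward the goal with clear space earns the incentive.
    print(reward(prev, RobotState(0.2, 0.0, 0.0, 1.0), goal))  # (0.4, False)
```

One design caveat: a bonus that is only ever added can be farmed by a policy that repeatedly steps toward and away from the goal. A potential-based alternative, adding γΦ(s′) − Φ(s) with, for example, Φ(s) equal to the negative distance to the goal, rewards net progress instead and is known to leave the optimal policy unchanged.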