COMMON-SENSICAL INCENTIVE REWARD IN DEEP ACTOR-CRITIC REINFORCEMENT LEARNING FOR MOBILE ROBOT NAVIGATION
Recently, various Deep Actor-Critic Reinforcement Learning (DAC-RL) algorithms have been widely used to train mobile robots to acquire navigational policies. However, they usually need a prohibitively long learning time to achieve good policies. This research proposes a two-stage training...
Published in: | INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL |
---|---|
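Main Authors: | Sendari, Siti; Muladi; Ardiyansyah, Firman; Setumin, Samsul; Mokhtar, Norrima Binti; Lin, Hsien-I; Hartono, Pitoyo |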
Format: | Article |
Language: | English |
Published: | ICIC INT 2024 |
Subjects: | Computer Science |
Online Access: | https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001204092500009 |
author |
Sendari Siti; Muladi; Ardiyansyah Firman; Setumin Samsul; Mokhtar Norrima Binti; Lin Hsien-I; Hartono Pitoyo |
---|---|
title |
COMMON-SENSICAL INCENTIVE REWARD IN DEEP ACTOR-CRITIC REINFORCEMENT LEARNING FOR MOBILE ROBOT NAVIGATION |
container_title |
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL |
language |
English |
format |
Article |
description |
Recently, various Deep Actor-Critic Reinforcement Learning (DAC-RL) algorithms have been widely used to train mobile robots to acquire navigational policies. However, they usually need a prohibitively long learning time to achieve good policies. This research proposes a two-stage training mechanism infused with human common-sensical prior knowledge, named Two Stages DAC-RL with incentive reward, to alleviate this problem. The actor-critic networks were pre-trained in a simple environment to acquire a basic policy. Afterward, the basic policy was transferred to initialize the training of a new navigational policy in more complex environments. This study also infused human common-sensical prior knowledge to further reduce the RL learning burden by giving incentive rewards in situations that are beneficial for the navigation task. The experiments tested the proposed algorithms on navigation tasks in which the robot should efficiently reach designated goals. The tasks were made more challenging by requiring the robot to cross some corridors to reach the goal while avoiding obstacles. The results showed that the proposed algorithm worked efficiently for various start-goal positions across the corridors. |
publisher |
ICIC INT |
issn |
1349-4198 1349-418X |
publishDate |
2024 |
container_volume |
20 |
container_issue |
2 |
doi_str_mv |
10.24507/ijicic.20.02.373 |
topic |
Computer Science |
id |
WOS:001204092500009 |
url |
https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001204092500009 |
record_format |
wos |
collection |
Web of Science (WoS) |
_version_ |
1809678908005548032 |
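
The two ideas summarized in the description above (an incentive reward added in beneficial situations, and a pre-trained policy used to initialize later training) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record does not state which situations earn an incentive, the reward magnitudes, or how the networks are stored, so `StepInfo`, `base_reward`, `incentive_reward`, `init_from_pretrained`, and all numeric values are hypothetical and are not the authors' implementation.

```python
import copy
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step signals; these names do not come from the paper."""
    dist_to_goal_prev: float   # distance to goal before the action
    dist_to_goal_now: float    # distance to goal after the action
    min_obstacle_dist: float   # closest obstacle reading after the action
    reached_goal: bool
    collided: bool


def base_reward(info: StepInfo) -> float:
    """Sparse task reward (assumed values): goal, collision, small step cost."""
    if info.reached_goal:
        return 1.0
    if info.collided:
        return -1.0
    return -0.01


def incentive_reward(info: StepInfo, bonus: float = 0.05,
                     safe_dist: float = 0.3) -> float:
    """Assumed 'common-sense' bonus: moving closer to the goal while staying
    at a safe distance from obstacles. The real criteria are not in the record."""
    moved_closer = info.dist_to_goal_now < info.dist_to_goal_prev
    is_safe = info.min_obstacle_dist >= safe_dist
    return bonus if (moved_closer and is_safe) else 0.0


def total_reward(info: StepInfo) -> float:
    """Reward seen by the learner: task reward plus incentive."""
    return base_reward(info) + incentive_reward(info)


def init_from_pretrained(pretrained_params: dict) -> dict:
    """Stage two starts from a copy of the stage-one parameters,
    so further training does not overwrite the basic policy."""
    return copy.deepcopy(pretrained_params)


# Stage one: parameters learned in a simple environment (placeholder values).
stage1_params = {"actor": [0.1, -0.2], "critic": [0.3]}
# Stage two: training in a complex environment begins from those parameters.
stage2_params = init_from_pretrained(stage1_params)

step = StepInfo(dist_to_goal_prev=2.0, dist_to_goal_now=1.8,
                min_obstacle_dist=0.5, reached_goal=False, collided=False)
reward = total_reward(step)
```

In this sketch the incentive is kept small relative to the goal reward so that it guides exploration without replacing the task objective; that ratio is a design assumption, not a reported setting.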