ADAPTIVE REWARD SHAPING METHOD UNDER DYNAMIC OBJECT UNCERTAINTY
DOI: 10.31673/2412-4338.2026.019003
Abstract
The study proposes and substantiates a method of adaptive reward formation for the navigation of autonomous mobile robots in complex dynamic social environments, where the presence of people creates a high level of socio-dynamic uncertainty. The relevance of the study is determined by the need for the safe integration of autonomous mobile robots into human-populated spaces, where they must act not only effectively but also in a socially acceptable manner.
The disadvantage of existing approaches based on deep reinforcement learning (DRL) is the use of reward functions with fixed weight coefficients. Such an approach does not allow the robot to adapt flexibly to changes in the environment: prioritising goal attainment increases the risk of collisions, while prioritising safety often leads to the "frozen robot" problem and overly conservative behaviour, which reduces the overall efficiency of the system.
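As a minimal illustration of such a fixed-weight reward (the component names and constants below are hypothetical and not taken from the cited works):

```python
# Hypothetical sketch of the fixed-weight reward criticized above: the
# trade-off between goal progress, safety and comfort is frozen at design
# time and cannot respond to changes in the environment.
def fixed_weight_reward(r_goal, r_safety, r_comfort,
                        w_goal=1.0, w_safety=0.5, w_comfort=0.3):
    """Weighted sum with constants chosen once, before training."""
    return w_goal * r_goal + w_safety * r_safety + w_comfort * r_comfort
```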
The proposed approach addresses this problem by integrating the proximal policy optimisation (PPO) architecture with a probabilistic trajectory prediction module. The prediction module is based on a recurrent LSTM neural network, which encodes the temporal dependencies of agent motion, and a mixture density network (MDN), which makes it possible to model the multimodality of human behaviour. The output layer of the MDN directly generates the parameters of a mixture of normal distributions.
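The abstract does not give implementation details of the prediction module; the following is a minimal PyTorch-style sketch of an LSTM encoder with an MDN head that outputs mixture-of-Gaussians parameters, where all layer sizes, names and the diagonal-covariance assumption are illustrative:

```python
import torch
import torch.nn as nn

class LSTMMDNPredictor(nn.Module):
    """LSTM encoder of an observed agent trajectory followed by an MDN head
    that outputs the parameters of a mixture of 2-D normal distributions
    over the agent's future position (all sizes are illustrative)."""
    def __init__(self, obs_dim=2, hidden_dim=64, n_mixtures=5):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.n_mixtures = n_mixtures
        # For each mixture component: weight logit, 2-D mean, 2-D std (diagonal).
        self.mdn_head = nn.Linear(hidden_dim, n_mixtures * 5)

    def forward(self, trajectory):                 # trajectory: (batch, time, 2)
        _, (h_n, _) = self.encoder(trajectory)
        params = self.mdn_head(h_n[-1])            # (batch, n_mixtures * 5)
        params = params.view(-1, self.n_mixtures, 5)
        log_pi = torch.log_softmax(params[..., 0], dim=-1)  # mixture weights
        mu = params[..., 1:3]                                # component means
        sigma = torch.exp(params[..., 3:5])                  # positive stds
        return log_pi, mu, sigma
```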
The core of the method is a mechanism of dynamic adaptive weighting of the reward function components. The system automatically adjusts the balance between competing objectives: in situations with high uncertainty in the predicted behaviour of human agents, the weighting coefficients for safety and social comfort grow non-linearly, forcing the agent to act more cautiously; conversely, when the intentions of human agents are predictable, the system raises the priority of movement efficiency.
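The adaptation rule itself is described here only qualitatively; one possible sketch, assuming the uncertainty measure is a normalised score in [0, 1] (for example derived from the entropy of the MDN mixture) and using purely illustrative constants and functional form, is:

```python
import math

def adaptive_weights(uncertainty, w_goal_base=1.0, w_safety_base=0.5,
                     w_comfort_base=0.3, k=3.0):
    """Illustrative non-linear re-weighting: as prediction uncertainty grows,
    the safety and comfort weights rise sharply while the efficiency weight
    is reduced; all constants and the exponential form are assumptions."""
    boost = math.exp(k * uncertainty) - 1.0        # non-linear growth with uncertainty
    w_safety = w_safety_base * (1.0 + boost)
    w_comfort = w_comfort_base * (1.0 + boost)
    w_goal = w_goal_base / (1.0 + boost)           # de-prioritise speed when unsure
    return w_goal, w_safety, w_comfort

def adaptive_reward(r_goal, r_safety, r_comfort, uncertainty):
    """Compose the shaped reward from the dynamically adapted weights."""
    w_goal, w_safety, w_comfort = adaptive_weights(uncertainty)
    return w_goal * r_goal + w_safety * r_safety + w_comfort * r_comfort
```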
Experimental validation of the method confirmed the effectiveness of the proposed architecture.
Keywords: information technology, machine learning methods, reinforcement learning methods, autonomous mobile robots, mobile robot navigation.
References
- Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., & Alahi, A. (2018). Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr.2018.00240
- Xiang, W., Yin, H., Wang, H., & Jin, X. (2024). SocialCVAE: Predicting Pedestrian Trajectory via Interaction Conditioned Latents. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6216–6224. https://doi.org/10.1609/aaai.v38i6.28439
- Kosaraju, V., Sadeghian, A., Martín-Martín, R., Reid, I., Rezatofighi, H., & Savarese, S. (2019). Social-BiGAT: Multimodal trajectory forecasting using Bicycle-GAN and graph attention networks. Advances in Neural Information Processing Systems, 32. https://proceedings.neurips.cc/paper/2019/file/d09bf41544a3365a46c9077ebb5e35c3-Paper.pdf
- Li, K., Xu, Y., Wang, J., & Meng, M. Q. H. (2019). SARL∗: Deep Reinforcement Learning based Human-Aware Navigation for Mobile Robot in Indoor Environments. In 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE. https://doi.org/10.1109/robio49542.2019.8961764
- Cao, M., Xu, X., Yang, Y., Li, J., Jin, T., Wang, P., Hung, T.-Y., Lin, G., & Xie, L. (2025). Learning Dynamic Weight Adjustment for Spatial-Temporal Trajectory Planning in Crowd Navigation. In 2025 IEEE International Conference on Robotics and Automation (ICRA) (pp. 8196–8202). IEEE. https://doi.org/10.1109/icra55743.2025.11128766
- He, J., Zhao, D., Liu, T., Zou, Q., & Xie, J. (2025). Research on Adaptive Reward Optimization Method for Robot Navigation in Complex Dynamic Environment. Computers, Materials & Continua, 1–10. https://doi.org/10.32604/cmc.2025.065205
- Alshammari, A. B. (2025). Dynamic Rewards in Reinforcement Learning for Robotic Navigation. Engineering, Technology & Applied Science Research, 15(4), 25766–25771. https://doi.org/10.48084/etasr.11986
- Choi, S., Lee, K., Lim, S., & Oh, S. (2018). Uncertainty-aware learning from demonstration using mixture density networks with sampling-free variance modeling. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 6915–6922. https://doi.org/10.48550/arXiv.1709.02249
- Hanenko, L., & Zhebka, V. (2025). Model of social-adaptive navigation of mobile robot using reinforcement learning methods. Electronic professional scientific publication "Cybersecurity: Education, Science, Technology", 1(29), 559–570. https://doi.org/10.28925/2663-4023.2025.29.907
- Hanenko, L., & Bushma, O. (2025). Method of training autonomous mobile robots based on DRL and curriculum learning. Electronic professional scientific publication "Cybersecurity: Education, Science, Technology", 2(30), 568–582. https://doi.org/10.28925/2663-4023.2025.30.994