ADAPTIVE REWARD SHAPING METHOD UNDER DYNAMIC OBJECT UNCERTAINTY
DOI: 10.31673/2412-4338.2026.019003
Abstract
The study proposes and substantiates a method of adaptive reward formation for the navigation of autonomous mobile robots in complex dynamic social environments, where the presence of people creates a high level of socio-dynamic uncertainty. The relevance of the study is determined by the need for the safe integration of autonomous mobile robots into human-populated spaces, where they must act not only effectively but also in a socially acceptable manner.
The disadvantage of existing approaches based on deep reinforcement learning (DRL) is the use of reward functions with fixed weight coefficients. Such an approach does not allow the robot to adapt flexibly to changes in the environment: prioritising goal attainment increases the risk of collisions, while prioritising safety often leads to the "frozen robot" problem and overly conservative behaviour, which reduces the overall efficiency of the system.
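As a minimal illustration of such a fixed-weight reward (the component names and constants below are hypothetical and not taken from the cited works):

```python
# Hypothetical sketch of the fixed-weight reward criticized above: the
# trade-off between goal progress, safety and comfort is frozen at design
# time and cannot respond to changes in the environment.
def fixed_weight_reward(r_goal, r_safety, r_comfort,
                        w_goal=1.0, w_safety=0.5, w_comfort=0.3):
    """Weighted sum with constants chosen once, before training."""
    return w_goal * r_goal + w_safety * r_safety + w_comfort * r_comfort
```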
The proposed approach addresses this problem by integrating the proximal policy optimisation (PPO) architecture with a probabilistic trajectory prediction module. The prediction module is based on a recurrent LSTM neural network, which encodes the temporal dependencies of agent motion, and a mixture density network (MDN), which makes it possible to model the multimodality of human behaviour. The output layer of the MDN directly generates the parameters of a mixture of normal distributions.
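The abstract does not give implementation details of the prediction module; the following is a minimal PyTorch-style sketch of an LSTM encoder with an MDN head that outputs mixture-of-Gaussians parameters, where all layer sizes, names and the diagonal-covariance assumption are illustrative:

```python
import torch
import torch.nn as nn

class LSTMMDNPredictor(nn.Module):
    """LSTM encoder of an observed agent trajectory followed by an MDN head
    that outputs the parameters of a mixture of 2-D normal distributions
    over the agent's future position (all sizes are illustrative)."""
    def __init__(self, obs_dim=2, hidden_dim=64, n_mixtures=5):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.n_mixtures = n_mixtures
        # For each mixture component: weight logit, 2-D mean, 2-D std (diagonal).
        self.mdn_head = nn.Linear(hidden_dim, n_mixtures * 5)

    def forward(self, trajectory):                 # trajectory: (batch, time, 2)
        _, (h_n, _) = self.encoder(trajectory)
        params = self.mdn_head(h_n[-1])            # (batch, n_mixtures * 5)
        params = params.view(-1, self.n_mixtures, 5)
        log_pi = torch.log_softmax(params[..., 0], dim=-1)  # mixture weights
        mu = params[..., 1:3]                                # component means
        sigma = torch.exp(params[..., 3:5])                  # positive stds
        return log_pi, mu, sigma
```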
The core of the method is a mechanism of dynamic adaptive weighting of the reward function components. The system automatically adjusts the balance between competing objectives: in situations with high uncertainty in the predicted behaviour of human agents, the weighting coefficients for safety and social comfort grow non-linearly, forcing the agent to act more cautiously; conversely, when the intentions of human agents are predictable, the system raises the priority of movement efficiency.
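The adaptation rule itself is described here only qualitatively; one possible sketch, assuming the uncertainty measure is a normalised score in [0, 1] (for example derived from the entropy of the MDN mixture) and using purely illustrative constants and functional form, is:

```python
import math

def adaptive_weights(uncertainty, w_goal_base=1.0, w_safety_base=0.5,
                     w_comfort_base=0.3, k=3.0):
    """Illustrative non-linear re-weighting: as prediction uncertainty grows,
    the safety and comfort weights rise sharply while the efficiency weight
    is reduced; all constants and the exponential form are assumptions."""
    boost = math.exp(k * uncertainty) - 1.0        # non-linear growth with uncertainty
    w_safety = w_safety_base * (1.0 + boost)
    w_comfort = w_comfort_base * (1.0 + boost)
    w_goal = w_goal_base / (1.0 + boost)           # de-prioritise speed when unsure
    return w_goal, w_safety, w_comfort

def adaptive_reward(r_goal, r_safety, r_comfort, uncertainty):
    """Compose the shaped reward from the dynamically adapted weights."""
    w_goal, w_safety, w_comfort = adaptive_weights(uncertainty)
    return w_goal * r_goal + w_safety * r_safety + w_comfort * r_comfort
```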
Experimental validation of the method confirmed the effectiveness of the proposed architecture.
Keywords: information technology, machine learning methods, reinforcement learning methods, autonomous mobile robots, mobile robot navigation.
References
- Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., & Alahi, A. (2018). Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr.2018.00240
- Xiang, W., Yin, H., Wang, H., & Jin, X. (2024). SocialCVAE: Predicting Pedestrian Trajectory via Interaction Conditioned Latents. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6216–6224. https://doi.org/10.1609/aaai.v38i6.28439
- Kosaraju, V., Sadeghian, A., Martín-Martín, R., Reid, I., Rezatofighi, H., & Savarese, S. (2019). Social-BiGAT: Multimodal trajectory forecasting using Bicycle-GAN and graph attention networks. Advances in Neural Information Processing Systems, 32. https://proceedings.neurips.cc/paper/2019/file/d09bf41544a3365a46c9077ebb5e35c3-Paper.pdf
- Li, K., Xu, Y., Wang, J., & Meng, M. Q. H. (2019). SARL∗: Deep Reinforcement Learning based Human-Aware Navigation for Mobile Robot in Indoor Environments. In 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE. https://doi.org/10.1109/robio49542.2019.8961764
- Cao, M., Xu, X., Yang, Y., Li, J., Jin, T., Wang, P., Hung, T.-Y., Lin, G., & Xie, L. (2025). Learning Dynamic Weight Adjustment for Spatial-Temporal Trajectory Planning in Crowd Navigation. In 2025 IEEE International Conference on Robotics and Automation (ICRA) (pp. 8196–8202). IEEE. https://doi.org/10.1109/icra55743.2025.11128766
- He, J., Zhao, D., Liu, T., Zou, Q., & Xie, J. (2025). Research on Adaptive Reward Optimization Method for Robot Navigation in Complex Dynamic Environment. Computers, Materials & Continua, 1–10. https://doi.org/10.32604/cmc.2025.065205
- Alshammari, A. B. (2025). Dynamic Rewards in Reinforcement Learning for Robotic Navigation. Engineering, Technology & Applied Science Research, 15(4), 25766–25771. https://doi.org/10.48084/etasr.11986
- Choi, S., Lee, K., Lim, S., & Oh, S. (2018). Uncertainty-aware learning from demonstration using mixture density networks with sampling-free variance modeling. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 6915–6922. https://doi.org/10.48550/arXiv.1709.02249
- Hanenko, L., & Zhebka, V. (2025). Model of social-adaptive navigation of mobile robot using reinforcement learning methods. Electronic professional scientific publication "Cybersecurity: Education, Science, Technology", 1(29), 559–570. https://doi.org/10.28925/2663-4023.2025.29.907
- Hanenko, L., & Bushma, O. (2025). Method of training autonomous mobile robots based on DRL and curriculum learning. Electronic professional scientific publication "Cybersecurity: Education, Science, Technology", 2(30), 568–582. https://doi.org/10.28925/2663-4023.2025.30.994