Multi-robot cooperative planning by consensus Q-learning


Multi-robot cooperation entails planning by multiple robots toward a common objective, where each robot (agent) acts on the environment based on the sensory information it receives from that environment. Multi-robot cooperation employing equilibrium-based reinforcement learning is optimal in the sense of system resource (time and/or energy) utilization, because the robots adapt to the environment beforehand. Unfortunately, robots cannot enjoy this benefit of reinforcement learning in the presence of multiple types of equilibria (here, Nash equilibrium or correlated equilibrium). In such cases, the robots need a strategy for selecting the optimal equilibrium at each step of learning. The paper proposes consensus-based multi-agent Q-learning to address the bottleneck of selecting the optimal equilibrium among multiple types. An analysis reveals that a consensus (joint action) is a coordination-type pure-strategy Nash equilibrium as well as a pure-strategy correlated equilibrium. The experimental section shows the superiority of the proposed consensus-based multi-agent Q-learning algorithm over traditional reference algorithms in terms of average reward collected. In addition, the proposed consensus-based planning algorithm is verified using the multi-robot stick-carrying problem as a benchmark.
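
The equilibrium-selection problem described above can be illustrated with a small sketch. It is not the paper's algorithm: the Q-values, the action names, and the helper functions `is_pure_nash` and `consensus_actions` are hypothetical, and "consensus" is assumed here to mean a joint action that maximizes every agent's Q-value at the same time, which may differ from the paper's formal definition. Under that assumption, the example shows a two-robot coordination game with two pure-strategy Nash equilibria, of which only one is a consensus.

```python
from itertools import product

# Hypothetical action sets for two robots at a single state.
ACTIONS = [("L", "R"), ("L", "R")]

# Hypothetical Q-values for a coordination game: both robots receive the
# same value, and matching actions pay more than mismatched ones.
_shared = {("L", "L"): 10.0, ("R", "R"): 5.0, ("L", "R"): 0.0, ("R", "L"): 0.0}
Q = [dict(_shared), dict(_shared)]  # Q[i][joint_action] for agent i


def is_pure_nash(q, actions, joint):
    """Return True if no agent gains by unilaterally deviating from `joint`."""
    for i in range(len(actions)):
        for alt in actions[i]:
            deviated = joint[:i] + (alt,) + joint[i + 1:]
            if q[i][deviated] > q[i][joint]:
                return False
    return True


def consensus_actions(q, actions):
    """Joint actions that maximize every agent's Q-value simultaneously.

    This is an assumed reading of "consensus", used only for illustration.
    """
    joints = list(product(*actions))
    n = len(actions)
    best = [max(q[i][j] for j in joints) for i in range(n)]
    return [j for j in joints if all(q[i][j] == best[i] for i in range(n))]


nash = [j for j in product(*ACTIONS) if is_pure_nash(Q, ACTIONS, j)]
consensus = consensus_actions(Q, ACTIONS)
print("pure-strategy Nash equilibria:", nash)
print("consensus joint actions:", consensus)
```

Any joint action returned by `consensus_actions` also passes `is_pure_nash`, because no agent can exceed its own global maximum by deviating. The converse does not hold, which is why a selection rule is needed when several equilibria exist.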

Publication Title

Proceedings of the International Joint Conference on Neural Networks