Theoretical learning goal selection for non-communicative multi-agent cooperation

Fumito Uwano, Keiki Takadama

Research output: Contribution to journal > Article > peer-review


This paper extends PMRL, a non-communicative, theoretical method for two agents, and proposes PLA, a method that forces agents to learn cooperative behavior for any number of agents. In addition, the paper gives a theoretical explanation that, under PLA, all agents achieve all purposes without spending the longest time. Concretely, PLA forces each agent to avoid the more difficult purposes, those that take a long time to reach, by limiting the purposes it can achieve, and it forces the agents to learn a cooperative policy by having each achieve the appropriate purpose among its limited purposes. The experimental results show that (1) PLA enables the agents to learn a cooperative policy in two grid-world problems with three and five agents, and (2) PLA can force all agents to achieve all purposes in these problems in the minimum time.
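The target that the abstract describes, every agent reaching a distinct purpose while the slowest agent finishes as early as possible, can be illustrated with a small brute-force search. This is only a sketch of the objective and not the PLA method itself: PLA is non-communicative and learned, whereas this code is centralized and enumerates all assignments. The function name `min_makespan_assignment` and the step counts are hypothetical and do not come from the paper.

```python
from itertools import permutations


def min_makespan_assignment(steps):
    """Return the goal assignment that minimizes the slowest agent's time.

    steps[i][g] is the (hypothetical) number of steps agent i needs to
    reach goal g. Each agent is assigned exactly one distinct goal.
    """
    n = len(steps)
    best_goals, best_time = None, float("inf")
    for goals in permutations(range(n)):
        # Time until all goals are reached is set by the slowest agent.
        finish = max(steps[i][g] for i, g in enumerate(goals))
        if finish < best_time:
            best_goals, best_time = goals, finish
    return best_goals, best_time


# Illustrative step counts for three agents and three goals (made up).
steps = [
    [2, 5, 9],
    [4, 3, 8],
    [7, 6, 4],
]
assignment, finish_time = min_makespan_assignment(steps)
print(assignment, finish_time)
```

In this made-up example, no agent is sent to a goal that would take it 8 or 9 steps, which mirrors the idea of steering agents away from the purposes that take longest to reach.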

Original language: English
Pages (from-to): 75-84
Number of pages: 10
Journal: IEEJ Transactions on Electronics, Information and Systems
Issue number: 1
Publication status: Published - 2020
Externally published: Yes


  • Multi-agent system
  • Reinforcement learning
  • Reward management

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
