Nonconvergence to saddle boundary points under perturbed reinforcement learning

G. Chasparis, J. Shamma, A. Rantzer. Nonconvergence to saddle boundary points under perturbed reinforcement learning. International Journal of Game Theory, volume 44, number 3, pages 667-699, DOI 10.1007/s00182-014-0449-3, August 2015.

Authors
  • Georgios Chasparis
  • Jeff Shamma
  • Anders Rantzer
Type: Article
Journal: International Journal of Game Theory
Number: 3
Volume: 44
DOI: 10.1007/s00182-014-0449-3
ISSN: 0020-7276 (Print), 1432-1270 (Online)
Month: August
Year: 2015
Pages: 667-699
Abstract

For several reinforcement learning models in strategic-form games, convergence to action profiles that are not Nash equilibria may occur with positive probability under certain conditions on the payoff function. In this paper, we explore how an alternative reinforcement learning model, where the strategy of each agent is perturbed by a strategy-dependent perturbation (or mutations) function, may exclude convergence to non-Nash pure strategy profiles. This approach extends prior analysis on reinforcement learning in games that addresses the issue of convergence to saddle boundary points. It further provides a framework under which the effect of mutations can be analyzed in the context of reinforcement learning.