Accidental exploration through value predictors

Publication date: 2018

Schedae Informaticae, 2018, Volume 27, pp. 107 - 127

https://doi.org/10.4467/20838476SI.18.009.10414

Authors

Tomasz Kisielewski
Faculty of Mathematics and Computer Science, Jagiellonian University, Krakow, Poland
Damian Leśniak
Faculty of Mathematics and Computer Science, Jagiellonian University, Krakow, Poland

Abstract

Infinite trajectory length is an almost universal assumption in the theoretical foundations of reinforcement learning. In practice, learning occurs on finite trajectories. In this paper we examine a specific result of this disparity, namely a strong bias of the time-bounded every-visit Monte Carlo value estimator. This bias manifests as a vastly different learning dynamic for algorithms that use value predictors, including encouraging or discouraging exploration. We investigate these claims theoretically for a one-dimensional random walk, and empirically on a number of simple environments. We use GAE as an algorithm involving a value predictor and evolution strategies as a reference point.
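
The kind of bias the abstract describes can be illustrated with a minimal sketch. It is not taken from the paper. It assumes a symmetric one-dimensional random walk starting at 0, a constant reward of 1 per step, a discount factor of 0.95, and a hard time limit of 50 steps with no bootstrapping at the cut-off. The function name `truncated_every_visit_mc` and all parameter values are hypothetical choices for illustration. Under these assumptions the infinite-horizon value of every state is 1 / (1 - gamma), so any deviation of the estimates from that number comes from truncation alone.

```python
import random
from collections import defaultdict


def truncated_every_visit_mc(num_episodes=2000, horizon=50, gamma=0.95,
                             reward=1.0, seed=0):
    """Every-visit Monte Carlo value estimates on time-bounded random walks.

    Illustrative assumptions: the walk starts at 0, moves -1 or +1 with
    equal probability, and pays `reward` on every step. Each episode is
    cut off after `horizon` steps and the return is NOT bootstrapped.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)  # sum of observed returns per state
    counts = defaultdict(int)    # number of visits per state

    for _ in range(num_episodes):
        state = 0
        visited = []
        for _ in range(horizon):
            visited.append(state)
            state += rng.choice((-1, 1))

        # Walk backwards so each visit gets the discounted sum of the
        # rewards that remain before the time limit.
        g = 0.0
        for s in reversed(visited):
            g = reward + gamma * g
            totals[s] += g
            counts[s] += 1

    return {s: totals[s] / counts[s] for s in totals}


estimates = truncated_every_visit_mc()
true_value = 1.0 / (1.0 - 0.95)  # infinite-horizon value of every state

print("infinite-horizon value:", true_value)
for s in (0, 5, 10):
    if s in estimates:
        print("state", s, "estimate:", estimates[s])
```

In this sketch every estimate falls below the infinite-horizon value, and states far from the origin, which the walk can only reach late in an episode, receive lower estimates than the starting state. A policy trained against such a predictor would see distant states as less valuable even though the reward is identical everywhere, which is one way a truncation bias could discourage exploration. With negative rewards the effect would point the other way.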

Information

Article type: Original article

Article status: Open

Licence: CC BY-NC-ND

Percentage share of authors:

Tomasz Kisielewski (Author) - 50%
Damian Leśniak (Author) - 50%

Article corrections:

-

Publication languages:

English
