Event

Romain Laroche, researcher from MS Maluuba

Friday, December 1, 2017, 10:30 to 11:30
Room AA6214, Pav. André-Aisenstadt, CA

Safe Policy Improvement with Baseline Bootstrapping

A common goal in Reinforcement Learning is to derive a good strategy from a limited batch of data. In this paper, we propose a new strategy for computing a safe policy, one guaranteed to perform at least as well as a given baseline strategy. We argue that the assumptions made in previous work are too strong for real-world applications, and we propose new algorithms that require those assumptions to hold only on a subset of the state-action pairs. While significantly relaxing the assumptions, our algorithms achieve the same accuracy guarantees as previous work and are also much more computationally efficient. We also show that the algorithms can be adapted to model-free Reinforcement Learning.
