Research output per year
Research output per year
Research output: Contribution to journal › Article › Research › peer-review
We give a simple optimistic algorithm for which it is easy to derive regret bounds of Õ(√t mixSAT) after T steps in uniformly ergodic Markov decision processes with S states, A actions, and mixing time parameter t mix. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.
Original language | English |
---|---|
Pages (from-to) | 115-128 |
Number of pages | 14 |
Journal | The journal of artificial intelligence research |
Volume | 67.2020 |
Issue number | 1 |
DOIs | |
Publication status | Published - 23 Jan 2020 |
Research output: Contribution to conference › Poster › Research › peer-review
Ronald Ortner (Speaker)
Activity: Talk or presentation › Oral presentation