Research output per year
Research output: Contribution to journal › Article › Research › peer-review
We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\mathrm{mix}} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\mathrm{mix}}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.
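To make the scaling of the stated bound concrete, the sketch below evaluates only its growth rate $\sqrt{t_{\mathrm{mix}} S A T}$. It assumes that the constants and logarithmic factors hidden by $\tilde{O}$ can be dropped, so it illustrates the order of the bound and is not the bound itself. It also does not implement the paper's optimistic algorithm. The function name `regret_bound_order` and the example parameter values are hypothetical, chosen only for illustration.

```python
import math


def regret_bound_order(t_mix: float, S: int, A: int, T: int) -> float:
    """Growth rate sqrt(t_mix * S * A * T) of the stated regret bound.

    Hypothetical helper: constants and log factors hidden by the
    tilde-O notation are deliberately omitted.
    """
    if min(t_mix, S, A, T) <= 0:
        raise ValueError("all parameters must be positive")
    return math.sqrt(t_mix * S * A * T)


# Hypothetical example values, not taken from the paper.
t_mix, S, A = 10.0, 20, 5
for T in (10_000, 40_000, 160_000):
    bound = regret_bound_order(t_mix, S, A, T)
    # bound / T shrinks like 1/sqrt(T): the regret grows sublinearly in T.
    print(T, round(bound, 1), round(bound / T, 4))
```

Quadrupling $T$ only doubles the bound, so the average regret per step tends to zero as $T$ grows.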
| Original language | English |
|---|---|
| Pages (from-to) | 115-128 |
| Number of pages | 14 |
| Journal | Journal of Artificial Intelligence Research |
| Volume | 67 |
| Issue number | 1 |
| Publication status | Published - 23 Jan 2020 |
Research output: Contribution to conference › Poster › Research › peer-review
Ortner, R. (Speaker)
Activity: Talk or presentation › Oral presentation