Abstract
In interactive problems, an agent often has to make decisions based only on previous observations, because obtaining training data is either hard or impossible. In such cases, Reinforcement Learning (a field of Machine Learning alongside Supervised and Unsupervised Learning) offers multiple ways to solve the problem. In general, one has to consider whether the case at hand requires an optimal solution or whether something can be gained by accepting "good enough" solutions (known as satisficing). We evaluate the SAT-RL and SAT-UCRL algorithms by comparing them to the established UCRL2 algorithm. The former two algorithms represent the idea that "good enough" solutions are acceptable, while the latter aims for an optimal solution. For this, we craft various Markov Decision Processes (MDPs) based on a storage allocation problem. We tested a total of 8 MDPs grouped into 4 types and measured the classical as well as the σ regret. During these tests, we focused especially on the effect of different satisfaction levels for the SAT-RL and SAT-UCRL algorithms. The results varied greatly depending on the type of MDP tested. For some types of MDPs, the SAT-RL and SAT-UCRL algorithms showed an improvement over UCRL2. However, in the cases where we crafted specific MDPs, UCRL2 did considerably better, and other types favour UCRL2 as well. In general, SAT-UCRL fared better than SAT-RL in cases where UCRL2 fared better as well, as one would expect. Especially in cases where SAT-RL had difficulties, such as those where σ is close to or above the optimal per-step reward, SAT-UCRL mitigated the high regret of the exploration policy of SAT-RL. SAT-UCRL offered a good solution because it does not run a pure exploration episode but also considers previously gained information.
| Translated title of the contribution | Reinforcement Learning zur Satisfizierung eines Lagerhalteproblems |
|---|---|
| Original language | English |
| Qualification | Dipl.-Ing. |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 27 Jun 2025 |
| Publication status | Published - 2025 |
Bibliographical note
no embargo
Keywords
- Reinforcement Learning
- SAT-RL
- UCRL
- UCRL2
- SAT-UCRL
- small storage space