Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Thomas Michel, Hossein Hajiabolhassan, Ronald Ortner

Research output: Contribution to journal › Article › Research › peer-review

Abstract

This paper considers the objective of satisficing in multi-armed bandit problems. Instead of aiming to find an optimal arm, the learner is content with an arm whose reward is above a given satisfaction level. We provide algorithms and analysis for the realizable case, when such a satisficing arm exists, as well as for the general case, in which it may not. Introducing the notion of satisficing regret, our main result shows that in the general case it is possible to obtain constant satisficing regret whenever a satisficing arm exists (thereby correcting a contrary claim in the literature), while standard logarithmic regret bounds can be re-established otherwise. Experiments illustrate that our algorithm is not only superior to standard algorithms in the satisficing setting, but also works well in the classic bandit setting.
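The abstract leaves the satisficing objective informal. As an illustration only (not the paper's algorithm), the following sketch shows one natural strategy for this objective: explore UCB-style, but commit permanently to the first arm whose lower confidence bound clears the satisfaction level. The function name satisficing_lcb, the reward oracle pull_arm, and all parameter choices are assumptions introduced for this sketch.

import math
import random

def satisficing_lcb(pull_arm, n_arms, level, horizon):
    # Illustrative sketch, not the paper's algorithm: UCB-style
    # exploration that commits to the first arm whose lower
    # confidence bound exceeds the satisfaction level.
    counts = [0] * n_arms
    means = [0.0] * n_arms
    committed = None
    for t in range(1, horizon + 1):
        if committed is None:
            if t <= n_arms:
                arm = t - 1  # round-robin: pull every arm once
            else:
                conf = [math.sqrt(2.0 * math.log(t) / counts[a])
                        for a in range(n_arms)]
                good = [a for a in range(n_arms)
                        if means[a] - conf[a] >= level]
                if good:
                    committed = good[0]  # confidently satisficing: stop exploring
                    arm = committed
                else:
                    arm = max(range(n_arms),
                              key=lambda a: means[a] + conf[a])  # UCB choice
        else:
            arm = committed
        r = pull_arm(arm)  # stochastic reward in [0, 1] (assumed)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return committed

# Example: Bernoulli arms with means 0.3, 0.55, 0.7 and level 0.5;
# the learner should settle on arm 1 or 2, both of which satisfice.
random.seed(0)
arm_means = [0.3, 0.55, 0.7]
pull = lambda a: 1.0 if random.random() < arm_means[a] else 0.0
print(satisficing_lcb(pull, len(arm_means), level=0.5, horizon=5000))

Under this reading, constant satisficing regret is plausible because a committed learner stops paying exploration cost once any arm above the level is identified; the paper's actual algorithm and its definition of satisficing regret are given in the full text.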
Original language: English
Number of pages: 19
Journal: Transactions on Machine Learning Research
Volume: 2023
Issue number: August
Publication status: Published - 7 Jun 2023