Please help transcribe this video using our simple transcription tool. You need to be logged in to do so.
We present here two algorithms based on Bayesian modelling of the MAB, that are proved to be optimal in term of frequentist regret. The first, Bayes-UCB uses a Bayesian Upper Confidence Bound based on quantiles of the posterior distribution as an index. The second is Thompson Sampling, a randomized Bayesian algorithm dating back to 1933 whose first finite-time regret analysis was proposed at COLT (2012) and for which we now have an optimal finite-time analysis.
Questions and AnswersYou need to be logged in to be able to post here.