Click in the text-area below, and then press Enter key to start playing the video. You will be asked to press Enter again to pause the video and type-in your transcript.


  • [{{time_string(subtitle.start_time)}} - {{time_string(subtitle.end_time)}}] {{subtitle.text}}


This tutorial intends to be an introduction to stochastic and adversarial multi-armed bandit algorithms and to survey some of the recent advances. In the multi-armed bandit problem, at each stage, an agent (or decision maker) chooses one action (or arm), and receives a reward from it. The agent aims at maximizing his rewards. Since he does not know the process generating the rewards, he needs to explore (try) the different actions and yet, exploit (concentrate its draws on) the seemingly most rewarding arms. The bandit problem has been increasingly popular in the machine learning community. It is the simplest setting where one encounters the exploration-exploitation dilemma. It has a wide range of applications including advertizement [1, 6], economics [2, 12], games [7] and optimization [10, 5, 9, 3], model selection and machine learning algorithms itself [13, 4]. It can be a central building block of larger systems, like in evolutionary programming [8] and reinforcement learning [14], in particular in large state space Markovian Decision Problems [11].

Questions and Answers

You need to be logged in to be able to post here.