Description
This tutorial is intended as an introduction to stochastic and adversarial multi-armed bandit algorithms and as a survey of some of the recent advances. In the multi-armed bandit problem, at each stage, an agent (or decision maker) chooses one action (or arm) and receives a reward from it. The agent aims at maximizing its rewards. Since it does not know the process generating the rewards, it needs to explore (try) the different actions and yet exploit (concentrate its draws on) the seemingly most rewarding arms.
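As a concrete illustration of this exploration-exploitation trade-off, here is a minimal sketch of the classical UCB1 strategy on simulated Bernoulli arms. This is not the tutorial's own material; the arm means, horizon, and function name below are illustrative assumptions.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given true means.

    arm_means: true success probabilities of each arm (unknown to the agent).
    horizon:   total number of pulls.
    Returns the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of times each arm was pulled
    sums = [0.0] * n_arms   # sum of rewards observed for each arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialize
        else:
            # Exploit the empirical mean, plus an exploration bonus
            # that shrinks as an arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

# Example: three arms; UCB1 should concentrate its pulls on the 0.8 arm.
print(ucb1([0.3, 0.5, 0.8], horizon=10_000))
```

The exploration bonus sqrt(2 log t / n_a) is large for rarely-pulled arms and decays as evidence accumulates, which is one standard way to resolve the dilemma described above.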
The bandit problem has become increasingly popular in the machine learning community. It is the simplest setting in which one encounters the exploration-exploitation dilemma. It has a wide range of applications, including advertisement [1, 6], economics [2, 12], games [7], optimization [10, 5, 9, 3], model selection, and machine learning algorithms themselves [13, 4]. It can be a central building block of larger systems, as in evolutionary programming [8] and reinforcement learning [14], in particular in large state-space Markov decision problems [11].
Questions and Answers
Q (posted May 7, 2016, 7:34 a.m.):
The video doesn't seem to be working. Can it be fixed?