Please help transcribe this video using our simple transcription tool. You need to be logged in to do so.
The Dueling Bandits Problem is an online learning framework in which actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). It models settings where absolute rewards are difficult to elicit but pairwise preferences are readily available. In this paper, we extend the Dueling Bandits Problem to a relaxed setting where preference magnitudes can violate transitivity. We present the first algorithm for this more general Dueling Bandits Problem and provide theoretical guarantees in both the online and the PAC settings. Furthermore, we show that the new algorithm has stronger guarantees than existing results even in the original Dueling Bandits Problem, which we validate empirically.
Questions and AnswersYou need to be logged in to be able to post here.