Please help transcribe this video using our simple transcription tool. You need to be logged in to do so.


The knowledge gradient is a policy for efficiently learning the best of a set of choices by maximizing the marginal value of information, a form of steepest ascent for a belief model. The policy has no tunable parameters, and has been adapted to both online (bandit) and offline (ranking and selection) problems. Versions of the knowledge gradient have been derived for both discrete and continuous alternatives, and a variety of Bayesian belief models including lookup tables with correlated beliefs, parametric models and two forms of nonparametric models. We have also derived the knowledge gradient for a robust (min-max) objective. We report on comparisons against popular policies such as Boltzmann, epsilon-greedy, interval estimation and UCB policies. We will discuss applications in drug discovery (a problem with 90,000 alternatives) and the tuning of parameters for black-box simulations (vector-valued continuous parameters). Additional information may be found at

Questions and Answers

You need to be logged in to be able to post here.